Robert Westerlund

A developer's blog on code, technology and tools in the Web, .NET and other development areas.

Removing UTF8 BOM (Byte Order Mark) using PowerShell

The Byte Order Mark (BOM) is a Unicode character placed at the very start of a file or stream to identify its endianness. It also lets an application reading the file detect the encoding automatically and interpret the content correctly.
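To make the BOM more concrete, here is a small sketch that prints the BOM bytes .NET uses for a few encodings. It relies only on the standard `GetPreamble()` method; the variable names are just for illustration.

```powershell
# GetPreamble() returns the BOM bytes that .NET writes for an encoding.
$encodings = @{
    'UTF-8'     = [System.Text.Encoding]::UTF8
    'UTF-16 LE' = [System.Text.Encoding]::Unicode
    'UTF-16 BE' = [System.Text.Encoding]::BigEndianUnicode
}
$preambles = @{}
foreach ($name in $encodings.Keys)
{
    $bytes = $encodings[$name].GetPreamble()
    # Format each byte as two hexadecimal digits separated by spaces.
    $preambles[$name] = ($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
    Write-Output ('{0}: {1}' -f $name, $preambles[$name])
}
```

For UTF-8 the bytes are EF BB BF, which is 239, 187, 191 in decimal. Those are the three values the function below looks for.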

However, in some cases the BOM can cause problems, since for UTF8 it adds three extra bytes at the beginning of the file. To make removing these bytes easier, I wrote a PowerShell function which does that. For each file passed to it, the function checks whether the file starts with the UTF8 BOM (I only needed UTF8 BOM removal, but it should be easy to add identification of other BOMs). If it does, the function copies the content of the file, except the BOM, to a temp file. Finally, it overwrites the original file with the temp file, effectively removing the leading UTF8 BOM from the file.
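As a rough sketch of what identifying other BOMs could look like, the helper below compares the start of a byte array against a few well-known BOM signatures. `Get-BomName` is a hypothetical name and is not part of the function in this post. The byte values are the standard Unicode BOMs.

```powershell
# Hypothetical helper: returns the name of the BOM at the start of a byte array,
# or $null when no known BOM is found.
function Get-BomName
{
    PARAM(
        [Parameter(Mandatory = $true)]
        [AllowEmptyCollection()]
        [byte[]]$Bytes
    )

    # Longer signatures come first, since the UTF-32 LE BOM starts with the UTF-16 LE BOM.
    $signatures = @(
        @{ Name = 'UTF-32 LE'; Bytes = [byte[]](0xFF, 0xFE, 0x00, 0x00) },
        @{ Name = 'UTF-32 BE'; Bytes = [byte[]](0x00, 0x00, 0xFE, 0xFF) },
        @{ Name = 'UTF-8';     Bytes = [byte[]](0xEF, 0xBB, 0xBF) },
        @{ Name = 'UTF-16 LE'; Bytes = [byte[]](0xFF, 0xFE) },
        @{ Name = 'UTF-16 BE'; Bytes = [byte[]](0xFE, 0xFF) }
    )

    foreach ($signature in $signatures)
    {
        $bom = $signature.Bytes
        if ($Bytes.Length -lt $bom.Length) { continue }

        $isMatch = $true
        for ($i = 0; $i -lt $bom.Length; $i++)
        {
            if ($Bytes[$i] -ne $bom[$i]) { $isMatch = $false; break }
        }
        if ($isMatch) { return $signature.Name }
    }
    return $null
}

# Example: a UTF-8 BOM followed by the letter A.
Get-BomName -Bytes ([byte[]](0xEF, 0xBB, 0xBF, 0x41))
```

Note that a UTF-16 LE file whose first character is NUL would be reported as UTF-32 LE by this sketch, so treat the result as a hint rather than a guarantee.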

The function also supports the -WhatIf flag, which can be used to run the command without letting it change anything. Running the command with the -WhatIf flag makes it output which of the files passed to it contain the UTF8 BOM and thus would be processed.
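The -WhatIf support comes from PowerShell's standard `SupportsShouldProcess` mechanism. The sketch below shows the pattern in isolation. `Clear-DemoFile` is a hypothetical function made up for this illustration.

```powershell
# Hypothetical function, used only to illustrate the -WhatIf behaviour.
function Clear-DemoFile
{
    [CmdletBinding(SupportsShouldProcess = $true)]
    PARAM(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [System.IO.FileInfo]$File
    )
    PROCESS
    {
        # ShouldProcess returns $false when -WhatIf is given, after printing a
        # "What if: ..." message, so the change below is skipped.
        if ($PSCmdlet.ShouldProcess($File.FullName, 'Clearing content'))
        {
            [System.IO.File]::WriteAllText($File.FullName, '')
        }
    }
}
```

Everything that modifies the file sits inside the `ShouldProcess` check, so a -WhatIf run only reports what would have happened.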

As with most code found on the internet, I leave the code here without guarantees; do with it what you will. Note also that the function does not work in PowerShell version 2, since it uses .NET methods which do not exist in older versions of .NET. It works fine in PowerShell version 4, and I haven't tested it in other versions.
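The method most likely to be missing on older versions is `Stream.CopyTo`, which was added in .NET Framework 4. If support for an older runtime were needed, a manual copy loop along these lines could stand in for it. `Copy-StreamManually` is a hypothetical helper, and this sketch has not been tested on PowerShell version 2.

```powershell
# Hypothetical helper: copies the rest of $Source to $Destination without Stream.CopyTo.
function Copy-StreamManually
{
    PARAM(
        [Parameter(Mandatory = $true)]
        [System.IO.Stream]$Source,
        [Parameter(Mandatory = $true)]
        [System.IO.Stream]$Destination
    )

    $buffer = New-Object System.Byte[] 4096
    # Read returns 0 once the end of the source stream has been reached.
    while (($count = $Source.Read($buffer, 0, $buffer.Length)) -gt 0)
    {
        $Destination.Write($buffer, 0, $count)
    }
}
```

Like `CopyTo`, the loop starts at the current position of the source stream, so bytes that have already been read (such as the BOM) are not copied.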

<#
.Synopsis
   The function removes the UTF8 BOM from files passed to the function.
.PARAMETER File
    The file which should have the UTF8 BOM removed.
.EXAMPLE
   ls c:\myFiles -filter *.txt -Recurse | Remove-Utf8BOM -WhatIf

   Pipes all .txt files below the c:\myFiles folder to the function.

   The -WhatIf flag ensures that the function does not make any changes to any file, instead only outputting information regarding which files it would remove the UTF8 BOM from if run without the -WhatIf flag.
.EXAMPLE
   ls c:\myFiles -filter *.txt -Recurse | Remove-Utf8BOM -Verbose

   Pipes all .txt files below the c:\myFiles folder to the function.

   The -Verbose flag makes the function output the information about which files had their UTF8 BOM removed.
.INPUTS
   The files which should have the UTF8 BOM removed.
.NOTES
   Author: Robert Westerlund
   Date:   2014-12-27
#>
function Remove-Utf8BOM
{
    [CmdletBinding(SupportsShouldProcess = $true)]
    PARAM(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [System.IO.FileInfo]$File
    )
    BEGIN
    {
        $byteBuffer = New-Object System.Byte[] 3
    }
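    PROCESS
    {
        $reader = $File.OpenRead()
        try
        {
            # Read the first three bytes and compare them to the UTF8 BOM (0xEF 0xBB 0xBF).
            $bytesRead = $reader.Read($byteBuffer, 0, 3)
            if ($bytesRead -eq 3 -and
                $byteBuffer[0] -eq 239 -and
                $byteBuffer[1] -eq 187 -and
                $byteBuffer[2] -eq 191)
            {
                if ($PSCmdlet.ShouldProcess($File.FullName, 'Removing UTF8 BOM'))
                {
                    # The reader is positioned just after the BOM, so CopyTo copies the rest of the file.
                    $tempFile = [System.IO.Path]::GetTempFileName()
                    $writer = [System.IO.File]::OpenWrite($tempFile)
                    try { $reader.CopyTo($writer) }
                    finally { $writer.Dispose() }
                    # Release the original file before overwriting it.
                    $reader.Dispose()
                    Move-Item -Path $tempFile -Destination $File.FullName -Force
                }
            }
        }
        finally
        {
            # Always close the reader, also when -WhatIf skips the removal or an error occurs.
            $reader.Dispose()
        }
    }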
}