Robert Westerlund

A developer's blog on code, technology and tools in the Web, .NET and other development areas.


Removing UTF8 BOM (Byte Order Mark) using PowerShell

The Byte Order Mark (BOM) is a Unicode character (U+FEFF) placed at the start of a file or stream to signal its byte order (endianness). Because each Unicode encoding serializes the BOM to a different byte sequence, an application reading the file can also use it to identify the encoding and interpret the file correctly.

However, in some cases the BOM can cause problems, since it adds extra bytes (three, for UTF8) at the beginning of the file. To make removing these bytes easier, I wrote a PowerShell function. For each file passed to it, the function checks whether the file starts with the UTF8 BOM (I only needed UTF8 BOM removal, but it should be easy to add identification of other BOMs) and, if it does, copies the content of the file, except the BOM, to a temp file. Finally, it overwrites the original file with the temp file, effectively removing the leading UTF8 BOM from the file.
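
As a rough illustration of how other BOMs could be identified, here is a minimal sketch. The Get-BomName helper is hypothetical and not part of the function in this post; it only compares the leading bytes of an array against the well-known BOM signatures.

```powershell
# Hypothetical helper (an assumption, not part of Remove-Utf8BOM):
# names the BOM, if any, found at the start of a byte array.
function Get-BomName
{
    PARAM(
        [Parameter(Mandatory = $true)]
        [AllowEmptyCollection()]
        [byte[]]$Bytes
    )

    # Longer signatures come first, because the UTF-32 LE BOM begins with
    # the same two bytes as the UTF-16 LE BOM. Caveat: a UTF-16 LE file whose
    # first character is U+0000 is indistinguishable from UTF-32 LE here.
    $signatures = @(
        @{ Name = 'UTF-32 LE'; Bytes = [byte[]](0xFF, 0xFE, 0x00, 0x00) },
        @{ Name = 'UTF-32 BE'; Bytes = [byte[]](0x00, 0x00, 0xFE, 0xFF) },
        @{ Name = 'UTF-8';     Bytes = [byte[]](0xEF, 0xBB, 0xBF) },
        @{ Name = 'UTF-16 LE'; Bytes = [byte[]](0xFF, 0xFE) },
        @{ Name = 'UTF-16 BE'; Bytes = [byte[]](0xFE, 0xFF) }
    )

    foreach ($signature in $signatures)
    {
        $expected = $signature.Bytes
        if ($Bytes.Length -lt $expected.Length) { continue }

        $isMatch = $true
        for ($i = 0; $i -lt $expected.Length; $i++)
        {
            if ($Bytes[$i] -ne $expected[$i]) { $isMatch = $false; break }
        }
        if ($isMatch) { return $signature.Name }
    }
    # No signature matched: emit nothing, so the caller sees $null.
}

Get-BomName -Bytes ([byte[]](0xEF, 0xBB, 0xBF, 0x68, 0x69))   # UTF-8
```

Only the first four bytes of a file are needed for this check, so there is no need to read the whole file before calling the helper.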

The function also supports the -WhatIf flag, which runs the command without changing anything. With -WhatIf, the function only outputs information about which of the files passed to it contain the UTF8 BOM and would therefore be processed.

As with most code found on the internet, I leave the code here without guarantees; do with it what you will. Note also that the function does not work in PowerShell version 2, since it uses .NET methods which do not exist in older versions of .NET. It works fine in PowerShell version 4; I haven't tested it in other versions.

<#
.Synopsis
   The function removes the UTF8 BOM from files passed to the function.
.PARAMETER File
    The file which should have the UTF8 BOM removed.
.EXAMPLE
   ls c:\myFiles -filter *.txt -Recurse | Remove-Utf8BOM -WhatIf

   Pipes all .txt files below the c:\myFiles folder to the function.

   The -WhatIf flag ensures that the function does not make any changes to any file, instead only outputting information regarding which files it would remove the UTF8 BOM from if run without the -WhatIf flag.
.EXAMPLE
   ls c:\myFiles -filter *.txt -Recurse | Remove-Utf8BOM -Verbose

   Pipes all .txt files below the c:\myFiles folder to the function.

   The -Verbose flag makes the function output the information about which files had their UTF8 BOM removed.
.INPUTS
   The files which should have the UTF8 BOM removed.
.NOTES
   Author: Robert Westerlund
   Date:   2014-12-27
#>
function Remove-Utf8BOM
{
    [CmdletBinding(SupportsShouldProcess = $true)]
    PARAM(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [System.IO.FileInfo]$File
    )
    BEGIN
    {
        $byteBuffer = New-Object System.Byte[] 3
    }
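    PROCESS
    {
        $reader = $File.OpenRead()
        $bytesRead = $reader.Read($byteBuffer, 0, 3)
        # The UTF8 BOM is the byte sequence EF BB BF (239 187 191).
        if ($bytesRead -eq 3 -and
            $byteBuffer[0] -eq 239 -and
            $byteBuffer[1] -eq 187 -and
            $byteBuffer[2] -eq 191)
        {
            if ($PSCmdlet.ShouldProcess($File.FullName, 'Removing UTF8 BOM'))
            {
                # The reader is positioned just past the BOM, so CopyTo
                # copies everything except the BOM to the temp file.
                $tempFile = [System.IO.Path]::GetTempFileName()
                $writer = [System.IO.File]::OpenWrite($tempFile)
                $reader.CopyTo($writer)
                $writer.Dispose()
                $reader.Dispose()
                Move-Item -Path $tempFile -Destination $File.FullName -Force
            }
            else
            {
                # -WhatIf or a declined -Confirm: still release the file handle.
                $reader.Dispose()
            }
        }
        else
        {
            $reader.Dispose()
        }
    }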
}
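
To try the function on something disposable first, a scratch file with a UTF8 BOM can be created and inspected. This is only a sketch: the file name bom-sample.txt is arbitrary, and the last step assumes that Remove-Utf8BOM from this post has already been loaded into the session (it is skipped otherwise).

```powershell
# Scratch file in the temp folder; the name is an arbitrary choice.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-sample.txt'

# Passing $true to the UTF8Encoding constructor asks .NET to emit the BOM.
$utf8WithBom = New-Object System.Text.UTF8Encoding $true
[System.IO.File]::WriteAllText($path, 'hello', $utf8WithBom)

# Three BOM bytes followed by the five bytes of 'hello'.
$before = [System.IO.File]::ReadAllBytes($path)
'{0} bytes, starts with {1:X2} {2:X2} {3:X2}' -f $before.Length, $before[0], $before[1], $before[2]

# Assumption: Remove-Utf8BOM is defined in the current session.
if (Get-Command Remove-Utf8BOM -ErrorAction SilentlyContinue)
{
    Get-Item -LiteralPath $path | Remove-Utf8BOM
    $after = [System.IO.File]::ReadAllBytes($path)
    '{0} bytes after removal' -f $after.Length
}
```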

Comments (2)

Hello,

Can you please explain how to pass a single file to the function?
I have a single file at C:\Program Files\Zabbix\2.4.3\zabbix_agentd.conf and I need to remove the BOM.

Robert Westerlund

@Rolands:

Regarding passing a single file to the function, it works just the same way as it would to pass several files. The current implementation only takes FileInfo objects (one could consider adding a parameter set with a string parameter to allow passing file names instead of file info objects), so you need to get a FileInfo object for that single file.

One way to achieve this is to do it the same way as with many files, while ensuring that the Get-ChildItem cmdlet does not retrieve more than one item.

With your example, this would be:
ls 'c:\Program Files\Zabbix\2.4.3\zabbix_agentd.conf' | Remove-Utf8BOM
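
If you would rather start from path strings, one option is a small helper that turns them into FileInfo objects before they reach the function. The sketch below is only an illustration: Get-FileInfoFromPath is a hypothetical name, and it is not part of the function in the post.

```powershell
# Hypothetical helper (an assumption, not part of Remove-Utf8BOM):
# turns path strings into FileInfo objects.
function Get-FileInfoFromPath
{
    [CmdletBinding()]
    PARAM(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [string[]]$Path
    )
    PROCESS
    {
        foreach ($item in $Path)
        {
            # -LiteralPath avoids wildcard expansion of characters such as [ and ].
            # Directories are filtered out, since only files can carry a BOM.
            Get-Item -LiteralPath $item |
                Where-Object { $_ -is [System.IO.FileInfo] }
        }
    }
}
```

With both functions loaded, the call would then look like this: 'c:\Program Files\Zabbix\2.4.3\zabbix_agentd.conf' | Get-FileInfoFromPath | Remove-Utf8BOM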

I hope it helps!
