Robert Westerlund

A developer's blog on code, technology and tools in the Web, .NET and other development areas.


Converting Fixed Column Length Text Files into Objects using PowerShell

UPDATE 2014-03-24: Note that this version is written for PowerShell version 4 and will not work in PowerShell version 2.

I recently needed to work with a file containing data structured in fixed column width lines. The specification given to me could, for example, say that the first column, which is 9 characters, contains the First Name of a person, the second column, 10 characters, is the Last Name, the third column, 14 characters, is the country and finally the last two columns on each line is an abbreviation of the persons first and last name.

An example data file, using person information from different Wikipedia articles, could be something like the following

Ada    Lovelace  UK            al
Alfred Nobel     Sweden        an
GalileoGalilei   Italy         gg
Sophie KowalevskiRussian Empiresk
Thomas Edison    USA           te
Wang   Zhenyi    China         wz

Since I had several files with several different format, some of which had a first line which contained column headers, I wanted a reusable function for this parsing. The problem seems common enough to have a function built-in, but I did not know of such a function, so I created the ConvertFrom-FixedLengths function. If someone could point me to such a function, or a better approach, please let me know in the comments to this post.

Using the ConvertFrom-FixedLengths

The actual function script can be found last in this blog post, but here's a few usage samples.

Piping Get-Content data into the function

The scenario I wanted was just to pipe the result of Get-Content from the sample file above into the function. Given that we have a c:\data.txt file, containing the sample content above, just run the following command:

PS C:\> Get-Content .\data.txt | ConvertFrom-FixedLengths 7,10,14,2 -Trim | Format-Table

The Trim switch tells the function to remove any leading and trailing spaces on the property values. This command ended up providing the following output:

Column1                Column2                Column3                Column4
-------                -------                -------                -------
Ada                    Lovelace               UK                     al
Alfred                 Nobel                  Sweden                 an
Galileo                Galilei                Italy                  gg
Sophie                 Kowalevski             Russian Empire         sk
Thomas                 Edison                 USA                    te
Wang                   Zhenyi                 China                  wz

Using the first line to specify headers

In some of the files I had, the first line was a header line, so I wanted the resulting objects property names to be named after those headers. So, to test this, just create some sample data where the first line is a header line, and pipe that data into the function:

PS C:\> $peopleOrderedByFirstNameWithHeaders = @"
FirstNameLastName  Country       FL
Ada      Lovelace  UK            al
Alfred   Nobel     Sweden        an
Galileo  Galilei   Italy         gg
Sophie   KowalevskiRussian Empiresk
Thomas   Edison    USA           te
Wang     Zhenyi    China         wz
"@ -split [Environment]::NewLine

PS C:\> $peopleOrderedByFirstNameWithHeaders |
    ConvertFrom-FixedLengths 9,10,14,2 -Trim -HeadersFromFirstLine | 
FirstName              LastName               Country                FL
---------              --------               -------                --
Ada                    Lovelace               UK                     al
Alfred                 Nobel                  Sweden                 an
Galileo                Galilei                Italy                  gg
Sophie                 Kowalevski             Russian Empire         sk
Thomas                 Edison                 USA                    te
Wang                   Zhenyi                 China                  wz

The ConvertFrom-FixedLengths function

Here's the actual function I ended up with, toghether with comment based help for the function:

   Converts data from an array of fixed length column structured strings to objects with a property for each column.
   Expects one or more strings of fixed length column data and parses these strings into one object for each matching string. 
   The resulting object will have one property for each column in the data. The property names will be Column1, Column2, ..., ColumnN if the HeadersFromFirstLine switch is not used.
.PARAMETER InputObject
   Specifies the array of strings which is the data to be used. In an entry in this array does not fullfill the structure specified by the ColumnWidths parameter it will be ignored. For each string, the whole string, from start to end, must match the format in order for the string to be included.
.PARAMETER ColumnWidths
   Specifies the width for each of the columns in the input data. Each entry in the array passed in determines the number of characters expected in the column of the corresponding index.
   Specifies whether or not padding (leading or trailing) spaces should be removed from the property values.
.PARAMETER HeadersFromFirstLine
   Specifies whether or not the first entry in the InputObject array should be removed from the result list and instead be used to set the property names for the resulting objects.
   Piping an array of strings to this function will set the InputObject parameter. For additional help, see Get-Help ConvertFrom-FixedLengths -Parameter InputObject
   One object for each entry in the InputObject array which fits the format described by the ColumnWidths parameter. 

   The properties of the objects, if the HeadersFromFirstLine switch was not provided, will be named Column1, Column2, Column3, ..., ColumnN.

   If the HeadersFromFirstLine switch is provided, the properties will be named based on the names specified in the first line. If the first line does not conform to the format specified by the ColumnWidths property, the first line will be ignored and the properties will still be named Column1, Column2, ..., ColumnN.
   Author: Robert Westerlund
   Date:   2014-03-23  

    PS C:\> Get-Content .\data.txt | ConvertFrom-FixedLengths 9,10,14,2 | Format-Table

    Just pipe the result of the Get-Content cmdlet to this function and provide the column widths.


    PS C:\> Get-Content .\data.txt | ConvertFrom-FixedLengths 9,10,14,2 -Trim | Format-Table
    To remove padding spaces in the data, use the Trim switch.

    PS C:\> Get-Content .\data.txt | ConvertFrom-FixedLengths 9,10,14,2 -Trim -HeadersFromFirstLine | Format-Table

    If the first line in the data file is a header line use the HeadersFromFirstLine switch


    PS C:\>

    Below is an example where the data is a string array created in memory (sample names and locations taken from different Wikipedia articles).

    PS C:\> $peopleOrderedByFirstName = @"
    >> Ada    Lovelace  UK            al
    >> Alfred Nobel     Sweden        an
    >> GalileoGalilei   Italy         gg
    >> Sophie KowalevskiRussian Empiresk
    >> Thomas Edison    USA           te
    >> Wang   Zhenyi    China         wz
    >> "@ -split [Environment]::NewLine
    PS C:\> $peopleOrderedByFirstName | 
    >> ConvertFrom-FixedLengths 7,10,14,2 -Trim | 
    >> Format-Table


    PS C:\>

    This sample uses the first line of the data to set the property names for the resulting objects (sample names and locations taken from different Wikipedia articles).

    PS C:\> $peopleOrderedByFirstNameWithHeaders = @"
    >> FirstNameLastName  Country       FL
    >> Ada      Lovelace  UK            al
    >> Alfred   Nobel     Sweden        an
    >> Galileo  Galilei   Italy         gg
    >> Sophie   KowalevskiRussian Empiresk
    >> Thomas   Edison    USA           te
    >> Wang     Zhenyi    China         wz
    >> "@ -split [Environment]::NewLine

    PS C:\> $peopleOrderedByFirstNameWithHeaders |
    >> ConvertFrom-FixedLengths 9,10,14,2 -Trim -HeadersFromFirstLine | 
    >> Format-Table

function ConvertFrom-FixedLengths
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [Parameter(Mandatory = $true, Position = 0)]
        #Ensure that no columns are of width 0 or lower.
        $valuesLowerThanOrEqualToZero = $ColumnWidths | Where { $_ -le 0 }
        if ($valuesLowerThanOrEqualToZero.Length -gt 0)
            throw "The column width must be a value higher than 0. The value(s) '$($valuesLowerThanOrEqualToZero -join "', '")' is not valid for column width"

        #Build the regex string for parsing the columns
        $regex = '^'
        foreach($width in $ColumnWidths)
            $regex += "(.{$width})"
        $regex += '$'
        Write-Verbose "Using the regex '$regex' for parsing the string"

        #Create an array holding the header names we want to use, so we can get headers from the first line
        $haveHeadersBeenRetrieved = $false
        $columnWidthLength = $ColumnWidths.Length
        #We use the start index 1 since it feels more natural for property name
        $headers = (1..$columnWidthLength) | Foreach { "Column$_" }
        #Build a script block which will be used to create PSObjects from the input data
        $NewObjectScript = {

            $groups = $MatchInfo.Matches.Groups
            $properties = [ordered]@{}
            for($index = 0; $index -lt $columnWidthLength; $index ++)
                #The group at index 0 is the whole matching string, so the first capture group is at 1
                $value = $groups[$index + 1].Value
                if ($Trim)
                    $value = $value.Trim()
                $properties.($headers[$index]) = $value 

            Write-Output (New-Object PSCustomObject -Property $properties)
        if ($HeadersFromFirstLine -and -not $haveHeadersBeenRetrieved)
            #Remove the first line from the input object and use the regular expression to retrieve the headers from that line
            $InputObject | 
                Select -First 1 |
                Select-String -Pattern $regex |
                Foreach {
                    $groups = $_.Matches.Groups
                    $headers = (1..$columnWidthLength) | Foreach { 
            #Verify that no two headers are the same
            $duplicateHeaders = $headers | Where { $current = $_; @($Headers | Where {$_ -eq $current}).Length -gt 1 } | Get-Unique -AsString
            if ($duplicateHeaders)
                throw "Cannot use the same header for two columns. Duplicate header(s) found: $($duplicateHeaders -join ', ')"
            Write-Verbose "Using the following header(s) for property names: $($headers -join ', ')"

            #Ensure that we don't do this again and that the rest of the processing doesn't use the header line
            $haveHeadersBeenRetrieved = $true
            $InputObject = $InputObject | Select -Skip 1

        #Process the actual strings based on the regex created in the BEGIN scriptblock
        $InputObject | 
            Select-String -Pattern $regex |
            Foreach {& $NewObjectScript $_ }

Working with Git on remote computer using a PSDrive

I have a web site which I update using a git repository. I make my changes, push the changes to a git service provider (e.g. GitHub) and then pull the latest version on the server. However, I don't want to have it automatically pull any changes, but I want to manually trigger a pull at the times I want the site to update.

I wanted to avoid Remote Desktop and also wanted to stay in my favorite console; the PowerShell console. I couldn't do it by using ordinary PowerShell remoting, since the git pull will ask me for username and password, which won't work via a PowerShell session. What I ended up doing was setting up a server share (this could be the root admin share, if wanted) as a PSDrive and then run my git command from the PSDrive.

PS C:\> New-PSDrive -Name MyWebServer -PSProvider FileSystem -Root \\WebServer01\MyGitRootShare -Credential (Get-Credential MYDOMAIN\MyUserName)
PS C:\> cd MyWebServer:
PS MyWebServer:\>git pull origin master
Username for '': MyUserName
Password for '':

Using a PSDrive (mapped network drive) PowerShell was perfectly fine displaying the interactive prompt from Git, since it was executing git locally and not via remoting.

Git seemed slower working via a PSDrive than working locally, but that was expected and the difference wasn't noticable enough for me to care.

Creating a Composite DSC Configuration with Parameters

While answering a question on StackOverflow I realized that I wanted to write a longer guide on the matter, which might not be a good fit for the Question and Answer format of StackOverflow. So, why not make a blog post of it?

The question was about how to create a composite Desired State Configuration (DSC) resource which has parameters. It might be valuable to know that I'm no expert on Desired State Configuration and tried many different things before I tried one that works. There might be other ways to achieve the goal, but this is a description of at least one way. In the examples, I will use BaseConfig for config name and MyParameter for parameter name.

Parameters to a Composite Configuration which is an ordinary module – not suggested

When a Composite Configuration is created as an ordinary module (directly in the C:\Program Files\WindowsPowerShell\Modules\MyModuleName, for example), it seems like it behaves like an ordinary Cmdlet. This means, to pass parameters to the composite configuration I had to do:

Import-DscResource -ModuleName BaseConfig 

Node localhost 
    BaseConfig Common -MyParameter "My Parameter Value" 

Worth noting is that I don't know if this would behave as expected for a DSC resource in other matters or not.

Also, if you create your composite configuration as a base module instead of a DSCResource within a module you will have to explicitly import the module since DSC won't do it automatically for you.

A Composite Configuration which supports parameters the right way

Since C:\Program Files\WindowsPowerShell\Modules is a long path and also is the folder in which we will should create our composite resource for it to be available for all users, I will shorten the path in the scripts below to display as C:\...\Modules and then subfolders are displayed as they are normally

In order to create a composite configuration which supports parameters, I had to make a dummy container module. These could, of course, contain implementation, but I only created an empty psm1 file and a psd1 file pointing to that psm1 file, using the following commands.

PS C:\...\Modules> md MyDscResources
PS C:\...\Modules> cd .\MyDscResources
PS C:\...\Modules\MyDscResources> "" > MyDscResources.psm1
PS C:\...\Modules\MyDscResources> New-ModuleManifest -Path .\MyDscResources.psd1 -RootModule MyDscResources.psm1

After making the empty container module, create a DSC resource folder (the name of the folder must be DSCResources) and then a folder for the composite configuration.

PS C:\...\Modules\MyDscResources> md DSCResources
PS C:\...\Modules\MyDscResources> cd .\DSCResources
PS C:\...\Modules\MyDscResources\DSCResources> md BaseConfig
PS C:\...\Modules\MyDscResources\DSCResources> cd .\BaseConfig

When we have the folder for our DSC Resource Composite Configuration, we should create the .schema.psm1 file, which should contain the composite configuration, as well as a .psd1 file which points to the module file. Note that the .schema.psm1 part of the filename must be matched exactly, since the PSDesiredStateConfiguration module actually looks specifically for this hard coded file extension for composite resources. Enter the following script into the BaseConfig.schema.psm1 file:

Configuration BaseConfig
    PARAM (
    File RootFile1
        DestinationPath = "C:\FileAtRoot1.txt"
        Ensure = "Present"
        Contents = $FileContents
    File RootFile2
        DestinationPath = "C:\FileAtRoot2.txt"
        Ensure = "Present"
        Contents = $FileContents

Then create a module manifest for the composite configuration:

PS C:\...\Modules\MyDscResources\DSCResources\BaseConfig> New-ModuleManifest -Path .\BaseConfig.psd1 -RootModule BaseConfig.schema.psm1

That's all which is needed to create the composite configuration.

Test the Composite Configuration

In order to test that our composite configuration works as expected, we need to create a configuration which uses the composite configuration we have created. The folder location is no longer important, so let's create the configuration in a C:\temp folder

PS C:\Program Files\WindowsPowerShell\Modules\MyDscResources\DSCResources\BaseConfig> cd\
PS C:\> md temp
PS C:\> cd temp
PS C:\temp> notepad ConfigurationTest.ps1

Into this ConfigurationTest.ps1 file, enter the following script:

Configuration ConfigurationTest
    Import-DscResource -Name BaseConfig
    Node localhost
        BaseConfig Commoon
            FileContents = "This is the file contents we want"


Now that we have our composite configuration and also a configuration which uses this, let's execute the configuration script to create the MOF files and then use the Start-DscConfiguration cmdlet to let DSC execute the configuration:

PS C:\temp> .\ConfigurationTest.ps1
PS C:\temp> Start-DscConfiguration ConfigurationTest

Once DSC has completed executing the configuration we should have the expected files, C:\FileAtRoot1.txt and \FileAtRoot2.txt which both have the parameter value as contents.