Sorting and deleting XML document elements using PowerShell
I'm trying to organize an XML document that contains driver information. Here's an example of what I'm working with:
<?xml version="1.0" encoding="utf-8"?>
<IncludeFragment xmlns:p="http://schemas.microsoft.com/someschema">
  <FFUDriver>
    <Component>
      <Package>
        <p:PackageName>Intel.Display.Driver</p:PackageName>
        <p:PackageFeedName>Feed</p:PackageFeedName>
        <p:Version>10.24.0.1638</p:Version>
        <p:Flavor>release</p:Flavor>
      </Package>
    </Component>
  </FFUDriver>
  <FFUDriver>
    <Component>
      <Package>
        <p:PackageName>Intel.Audio.Driver</p:PackageName>
        <p:PackageFeedName>Feed</p:PackageFeedName>
        <p:Flavor>release</p:Flavor>
        <p:Version>10.24.0.1638</p:Version>
        <p:CabName>Intel.Audio.cab</p:CabName>
      </Package>
    </Component>
  </FFUDriver>
</IncludeFragment>
I need to sort each Package's child elements in the following order: PackageName, PackageFeedName, Version, Flavor.
Some Packages already have their elements in the proper order and some don't, as in my example XML. Each Package also needs to be sorted alphabetically by PackageName. I'm new to working with XML in PowerShell and I can't for the life of me figure out how to accomplish this.
The other requirement is to find and delete all the <CabName> elements. I sort of figured that out. Unfortunately, the code I have below deletes all the child elements of a <Package> element if one of its child elements is <CabName>. I can't seem to figure out the syntax to select and delete only <CabName>.
$Path = 'C:\Drivers.xml'
$xml = New-Object -TypeName XML
$xml.Load($Path)
$xml.SelectNodes('//Package[CabName]') | ForEach-Object {
    $_.ParentNode.RemoveChild($_)
}
$xml.Save('C:\Test.xml')
UPDATE: With the help of Ansgar Wiechers, here's the finished code. I updated my example XML data to include a namespace since some of the documents I work with contain them. The below code handles namespaces. I hope this helps anyone else with a similar problem/questions!
[CmdletBinding()]
Param
(
    [Parameter(Mandatory = $True, Position = 0)]
    [ValidateScript({
        $_ = $_ -replace '"', ""
        if (-Not (Test-Path -Path $_ -PathType Leaf))
        {
            Throw "`n `n$_ `n `nThe specified file or path does not exist. Check the file name and path, and then try again."
        }
        return $True
    })]
    [System.String]$XMLPath,
    [Parameter(Mandatory = $False, Position = 1)]
    [System.String]$nsPrefix = "p",
    [Parameter(Mandatory = $False, Position = 2)]
    [System.String]$nsURI = "http://schemas.microsoft.com/someschema"
)
# Remove quotes from full path name, if they are present
$XMLPath = $XMLPath -replace '"', ""
$xml = New-Object -TypeName XML
$xml.Load($XMLPath)
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace($nsPrefix, $nsURI)
# Delete all CabName elements
$xml.SelectNodes('//p:CabName', $ns) | ForEach-Object {
    $_.ParentNode.RemoveChild($_) | Out-Null
}
# Sort each Package element's child nodes based on custom order
$SortList = 'p:PackageName', 'p:PackageFeedName', 'p:Version', 'p:Flavor'
$xml.SelectNodes('//Package') | ForEach-Object {
    $parent = $_
    $SortList | ForEach-Object {
        $child = $parent.RemoveChild($parent.SelectSingleNode("./$_", $ns))
        $parent.AppendChild($child)
    }
} | Out-Null
# Sort each Package element in alphabetical order based on its child node PackageName
$PackageNameList = $xml.SelectNodes('//p:PackageName', $ns) | Select-Object -Expand '#text' | Sort-Object
$xml.SelectNodes('//IncludeFragment') | ForEach-Object {
    $parent = $_
    $PackageNameList | ForEach-Object {
        $child = $parent.RemoveChild($parent.SelectSingleNode("./FFUDriver[Component/Package/p:PackageName/text()='$_']", $ns))
        $parent.AppendChild($child)
    }
} | Out-Null
# -replace takes a regular expression, so escape the dot and anchor the match to the end of the path
$XMLPath = $XMLPath -replace '\.xml$', '_sorted.xml'
$xml.Save($XMLPath)
Write-Host "`nSorting complete. Sorted XML document saved under $XMLPath" -ForegroundColor Green
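To illustrate why the namespace manager is needed in the code above, here is a minimal sketch. The inline XML is a cut-down, hypothetical version of the example data, and the variable names ($doc, $withoutNs, $withNs) are made up for the illustration. An XPath name without a prefix only matches elements that are in no namespace, so the prefixed elements are only found when the prefix is registered and used in the query.

```powershell
# Hypothetical sample data, modeled on the example document
[xml]$doc = @'
<IncludeFragment xmlns:p="http://schemas.microsoft.com/someschema">
  <Package>
    <p:PackageName>Intel.Audio.Driver</p:PackageName>
  </Package>
</IncludeFragment>
'@

# Register the prefix so it can be used inside XPath expressions
$ns = New-Object System.Xml.XmlNamespaceManager($doc.NameTable)
$ns.AddNamespace('p', 'http://schemas.microsoft.com/someschema')

# Unprefixed name: only matches elements without a namespace
$withoutNs = $doc.SelectNodes('//PackageName')

# Prefixed name plus namespace manager: matches the p:PackageName element
$withNs = $doc.SelectNodes('//p:PackageName', $ns)

"Without namespace manager: $($withoutNs.Count) match(es)"
"With namespace manager:    $($withNs.Count) match(es)"
```

The prefix used in the XPath expression only has to match the one registered with AddNamespace; what identifies the namespace is the URI.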
Consistency and neatness. The XML document is frequently modified and reviewed manually on my team.
– SyncErr0r
Jun 29 at 15:02
2 Answers
The code you have deletes all <Package> nodes that have a child element <CabName>, not just all child elements of such nodes. That's because //Package[CabName] matches all <Package> nodes that contain a <CabName> child node. What you actually want to match are all <CabName> nodes that have a <Package> parent node:
$xml.SelectNodes('//Package/CabName') | ForEach-Object {
    $_.ParentNode.RemoveChild($_) | Out-Null
}
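The difference between the two XPath forms can be checked on a small document. This is only a sketch: the inline XML is hypothetical sample data without a namespace, and the variable names are made up for the illustration. The predicate form selects the <Package> elements, while the path form selects the <CabName> elements themselves.

```powershell
# Hypothetical sample data, modeled on the question's XML (no namespace)
[xml]$doc = @'
<Drivers>
  <Driver>
    <Component>
      <Package>
        <PackageName>Intel.Audio.Driver</PackageName>
        <CabName>Intel.Audio.cab</CabName>
      </Package>
    </Component>
  </Driver>
</Drivers>
'@

# Predicate form: <Package> elements that contain a <CabName> child
$packages = $doc.SelectNodes('//Package[CabName]')

# Path form: the <CabName> elements that sit directly under a <Package>
$cabNames = $doc.SelectNodes('//Package/CabName')

$packages | ForEach-Object { "Predicate form selects: $($_.LocalName)" }
$cabNames | ForEach-Object { "Path form selects:      $($_.LocalName)" }

# Copy the matches into an array first, then remove them one by one
foreach ($node in @($cabNames)) {
    [void]$node.ParentNode.RemoveChild($node)
}

$doc.OuterXml
```

Copying the node list into an array before removing anything is a precaution, so the document is not modified while the list is still being enumerated.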
Also, the order of elements in XML normally shouldn't matter, so sorting them is rather pointless. However, if for some reason you must have the child nodes in a particular order, you can sort them by removing and appending them in the desired order.
# names of the child nodes in the desired order
$nodenames = 'PackageName', 'PackageFeedName', 'Version', 'Flavor'
$xml.SelectNodes('//Package') | ForEach-Object {
    $parent = $_
    $nodenames | ForEach-Object {
        $child = $parent.RemoveChild($parent.SelectSingleNode("./$_"))
        $parent.AppendChild($child)
    }
}
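A variation on the remove-and-append technique, as a sketch rather than a drop-in replacement: if a <Package> might lack one of the listed children, SelectSingleNode returns $null and RemoveChild would fail, so the version below skips missing nodes. The inline XML is hypothetical sample data. It also relies on the documented behavior that AppendChild moves a node that is already in the tree, so no separate RemoveChild call is made.

```powershell
# Hypothetical sample data: PackageFeedName is missing, Version/Flavor are swapped
[xml]$doc = @'
<Package>
  <PackageName>Intel.Audio.Driver</PackageName>
  <Flavor>release</Flavor>
  <Version>10.24.0.1638</Version>
</Package>
'@

# Names of the child nodes in the desired order
$nodenames = 'PackageName', 'PackageFeedName', 'Version', 'Flavor'

foreach ($parent in @($doc.SelectNodes('//Package'))) {
    foreach ($name in $nodenames) {
        $child = $parent.SelectSingleNode("./$name")
        if ($null -ne $child) {
            # Appending a node that is already in the tree moves it to the end
            [void]$parent.AppendChild($child)
        }
    }
}

# Collect the resulting child order for inspection
$order = @($doc.DocumentElement.ChildNodes | ForEach-Object { $_.LocalName })
$order -join ', '
```

Child elements that are not named in $nodenames are left alone and therefore end up in front of the sorted ones.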
If you also want the <Driver> nodes sorted by package name, you first need to build a sorted list of package names:
$xml.SelectNodes('//PackageName') | Select-Object -Expand '#text' | Sort-Object
and then use the same technique as above to remove and append the <Driver> nodes from/to the <Drivers> node. In this case you must use a filter pattern, though:
"./Driver[Component/Package/PackageName/text()='$_']"
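Put together, the sorting of the <Driver> nodes could look roughly like the sketch below. Note that it is a variation, not the exact technique described above: instead of building a list of names and looking each node up with a filter pattern, it sorts the node objects themselves with Sort-Object and re-appends them, which avoids embedding package names in an XPath string. The inline XML is hypothetical sample data without a namespace.

```powershell
# Hypothetical sample data, modeled on the question's XML (no namespace)
[xml]$doc = @'
<Drivers>
  <Driver><Component><Package><PackageName>Intel.Display.Driver</PackageName></Package></Component></Driver>
  <Driver><Component><Package><PackageName>Intel.Audio.Driver</PackageName></Package></Component></Driver>
</Drivers>
'@

# $parent must be the element that directly contains the <Driver> nodes
$parent = $doc.SelectSingleNode('/Drivers')

# Sort the <Driver> elements by the text of their nested <PackageName>
$sorted = @($parent.SelectNodes('./Driver')) |
    Sort-Object { $_.SelectSingleNode('./Component/Package/PackageName').InnerText }

# Re-append in sorted order; AppendChild moves each node to the end of $parent
foreach ($driver in $sorted) {
    [void]$parent.AppendChild($driver)
}

# Collect the package names in document order for inspection
$names = @($doc.SelectNodes('//PackageName') | ForEach-Object { $_.InnerText })
$names
```

The key point is the same as with the filter pattern: the nodes being moved are the <Driver> elements, and the parent they are appended to is <Drivers>, not <Package>.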
Fantastic response. Your code works and I'm starting to understand the syntax. Unfortunately I still can't seem to figure out the logic for sorting each Package by its PackageName element. I updated my post to include the non-functional code for the last section that's supposed to sort each Package element by its child node PackageName. Would you tell me what's wrong with it? Your help is much appreciated :)
– SyncErr0r
Jul 3 at 4:11
@SyncErr0r Please take a look at your XML data. Do the <Package> nodes have a child node <Driver>? If not, why would you expect ./Driver to work when operating on <Package> nodes? A . in an XPath expression represents the current node. For sorting the <Package> nodes, $parent must be the parent element of the <Driver> nodes. You cannot sort the <Package> nodes directly, because each of them is nested in other elements.
– Ansgar Wiechers
Jul 3 at 7:37
That's what I get for scripting while half asleep. I got everything working. I added the working code to my original post. I also updated the XML example to match the code. Thanks again for all your help :)
– SyncErr0r
Jul 6 at 22:51
XML converting is not needed for this job:
$xml = @"
<?xml version="1.0" encoding="utf-8"?>
<Drivers>
<Driver>
<Component>
<Package>
<PackageName>Intel.Display.Driver</PackageName>
<PackageFeedName>Feed</PackageFeedName>
<Version>10.24.0.1638</Version>
<Flavor>release</Flavor>
</Package>
</Component>
</Driver>
<Driver>
<Component>
<Package>
<PackageName>Intel.Audio.Driver</PackageName>
<PackageFeedName>Feed</PackageFeedName>
<Flavor>release</Flavor>
<Version>10.24.0.1638</Version>
<CabName>Intel.Audio.cab</CabName>
</Package>
</Component>
</Driver>
</Drivers>
"@
$XMLSorted = [System.Text.StringBuilder]::new()
$packageName = ''
$packageFeedName = ''
$version = ''
$flavor = ''
foreach( $line in @($xml -split [Environment]::NewLine) ) {
    if( $line -like '*<PackageName>*' ) {
        $packageName = $line
    }
    elseif( $line -like '*<PackageFeedName>*' ) {
        $packageFeedName = $line
    }
    elseif( $line -like '*<Version>*' ) {
        $version = $line
    }
    elseif( $line -like '*<Flavor>*' ) {
        $flavor = $line
    }
    elseif( $line -like '*<CabName>*' ) {
        # nothing to do
    }
    elseif( $line -like '*</Package>*' ) {
        [void]$XMLSorted.AppendLine( $packageName )
        [void]$XMLSorted.AppendLine( $packageFeedName )
        [void]$XMLSorted.AppendLine( $version )
        [void]$XMLSorted.AppendLine( $flavor )
        [void]$XMLSorted.AppendLine( $line )
    }
    else {
        [void]$XMLSorted.AppendLine( $line )
    }
}
#Result:
$XMLSorted.ToString()
Thanks for the quick response! Works great. I forgot to add that each Package needs to be sorted in alphabetical order based on PackageName.
– SyncErr0r
Jun 29 at 6:17
Do NOT use string operations for editing XML, particularly not if you already have an XML parser at your disposal. One XML tag could span more than one line, and several tags could be on the same line as well, in which case this code would produce incorrect results.
– Ansgar Wiechers
Jun 29 at 7:39
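The point about line layout can be illustrated with a small sketch. The two strings below hold the same hypothetical data, once spread over several lines and once on a single line, and Remove-CabName is a made-up helper name for this example. A line-based filter would treat the two layouts differently, whereas the parser-based removal gives the same result for both.

```powershell
# Two layouts of the same hypothetical data
$multiLine = @'
<Package>
  <PackageName>Intel.Audio.Driver</PackageName>
  <CabName>Intel.Audio.cab</CabName>
</Package>
'@
$singleLine = '<Package><PackageName>Intel.Audio.Driver</PackageName><CabName>Intel.Audio.cab</CabName></Package>'

# Hypothetical helper: parse the text, remove every <CabName>, return the XML
function Remove-CabName([string]$Text) {
    $doc = [xml]$Text
    foreach ($node in @($doc.SelectNodes('//CabName'))) {
        [void]$node.ParentNode.RemoveChild($node)
    }
    $doc.OuterXml
}

$a = Remove-CabName $multiLine
$b = Remove-CabName $singleLine

# Both layouts yield the same document once parsed and cleaned
$a -eq $b
```

With the single-line layout, a check such as -like '*<CabName>*' matches the one line that also holds <PackageName>, so skipping that line would drop the package name as well.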
Why do you think you need to sort the nested elements? In general order doesn't matter with XML elements.
– Ansgar Wiechers
Jun 29 at 7:41