Parsing an XML file in PowerShell using a node path stored in a variable

Posted on 2025-01-09 12:34:45

Hello dear fellow PowerShell users,

I'm trying to parse XML files whose structure can differ. Therefore, I want to access the node values based on a node path received from a variable.

Example

#XML file
$xml = [xml] @'
<node1>
    <node2>
        <node3>
            <node4>test1</node4>
        </node3>
    </node2>
</node1>
'@

Accessing the values directly works.

#access XML node directly -works-
$xml.node1.node2.node3.node4        # working <OK>

Accessing the values via a node path stored in a variable does not work.

#access XML node via path from variable -does not work-
$testnodepath = 'node1.node2.node3.node4'

$xml.$testnodepath                  # NOT working
$xml.$($testnodepath)               # NOT working

Is there a way to access the XML node values directly via receiving node information from a variable?

PS: I am aware that there is a way via SelectNodes, but I assume it is inefficient, since it is basically searching for keywords.

#Working - but inefficient
$testnodepath = 'node1/node2/node3/node4'
$xml.SelectNodes($testnodepath)

I need a very efficient way of parsing the XML file since I will need to parse huge XML files. Is there a way to directly access the node values in the form $xml.node1.node2.node3.node4 by receiving the node structure from a variable?

Comments (4)

伴梦长久 2025-01-16 12:34:45

You might use the ExecutionContext ExpandString for this:

$ExecutionContext.InvokeCommand.ExpandString("`$(`$xml.$testnodepath)")
test1

If the node path ($testnodepath) comes from outside (e.g. a parameter), you might want to prevent malicious code injection by stripping out any character that is not a word character or a dot (.):

$securenodepath = $testnodepath -Replace '[^\w\.]'
$ExecutionContext.InvokeCommand.ExpandString("`$(`$xml.$securenodepath)")
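
As a side note, stripping characters silently changes the path the caller asked for. A stricter variant, sketched below, rejects anything that is not a dot-separated list of word characters before handing it to ExpandString. The helper name Test-NodePath is made up for this illustration, and the pattern assumes element names consist of word characters only (no hyphens or namespace prefixes).

```powershell
# Hypothetical helper (not part of PowerShell): returns $true only for
# paths like 'a.b.c' made of word characters separated by single dots.
function Test-NodePath([string] $Path) {
    $Path -match '^\w+(\.\w+)*\z'
}

$xml = [xml] '<node1><node2><node3><node4>test1</node4></node3></node2></node1>'
$testnodepath = 'node1.node2.node3.node4'

# Refuse to expand anything that does not look like a plain property path.
if (-not (Test-NodePath $testnodepath)) {
    throw "Refusing to expand suspicious node path: '$testnodepath'"
}
$value = $ExecutionContext.InvokeCommand.ExpandString("`$(`$xml.$testnodepath)")
$value

# A path that tries to smuggle in extra statements fails the check.
$malicious = 'node1); Write-Host pwned; $('
Test-NodePath $malicious
```

Rejecting instead of stripping means a malformed path surfaces as an error rather than quietly resolving to a different node.
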

旧时光的容颜 2025-01-16 12:34:45

You can split the string containing the property path into individual names and then dereference them one by one:

# define path
$testnodepath = 'node1.node2.node3.node4'

# create a new variable, this will be our intermediary for keeping track of each node/level we've resolved so far
$target = $xml

# now we just loop through each node name in the path
foreach($nodeName in $testnodepath.Split('.')){
  # keep advancing down through the path, 1 node name at a time
  $target = $target.$nodeName
}

# this now resolves to the same value as `$xml.node1.node2.node3.node4`
$target
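
If you need this in several places, the loop can be wrapped in a small function. The following is only a sketch: the name Get-XmlValueByPath is made up, and it assumes strict mode is off, so a missing property yields $null instead of an error.

```powershell
# Hypothetical wrapper around the split-and-dereference loop.
function Get-XmlValueByPath {
    param(
        [Parameter(Mandatory)] $Xml,
        [Parameter(Mandatory)] [string] $NodePath
    )

    $target = $Xml
    foreach ($nodeName in $NodePath.Split('.')) {
        # Stop early if an earlier step did not resolve to anything.
        if ($null -eq $target) { return }
        $target = $target.$nodeName
    }
    $target
}

$xml = [xml] '<node1><node2><node3><node4>test1</node4></node3></node2></node1>'

$found   = Get-XmlValueByPath -Xml $xml -NodePath 'node1.node2.node3.node4'
$missing = Get-XmlValueByPath -Xml $xml -NodePath 'node1.nope.node3'
$found
```

Here $found holds the text of node4, while $missing is $null because the path contains a name that does not exist.
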

嘿嘿嘿 2025-01-16 12:34:45

I will need to parse huge XML files

The following presents a memory-friendly streaming approach that doesn't require loading the whole XML document (DOM) into memory, so you can parse really huge XML files even if they don't fit into memory. It should also improve parsing speed, as we can simply skip elements that we are not interested in. To accomplish this, we use System.Xml.XmlReader to process XML elements on the fly, while they are being read from the file.

I've wrapped the code in a reusable function:

Function Import-XmlElementText( [String] $FilePath, [String[]] $ElementPath ) {

    $stream = $reader = $null

    try {
        $stream = [IO.File]::OpenRead(( Convert-Path -LiteralPath $FilePath )) 
        $reader = [System.Xml.XmlReader]::Create( $stream )

        $curElemPath = ''  # The current location in the XML document

        # While XML nodes are read from the file
        while( $reader.Read() ) {
            switch( $reader.NodeType ) {
                ([System.Xml.XmlNodeType]::Element) {
                    if( -not $reader.IsEmptyElement ) {
                        # Start of a non-empty element -> add to current path
                        $curElemPath += '/' + $reader.Name
                    }
                }
                ([System.Xml.XmlNodeType]::Text) {
                    # Element text -> collect if path matches
                    if( $curElemPath -in $ElementPath ) {
                        [PSCustomObject]@{
                            Path  = $curElemPath
                            Value = $reader.Value
                        }
                    }
                }
                ([System.Xml.XmlNodeType]::EndElement) {
                    # End of element - remove current element from the path
                    $curElemPath = $curElemPath.Substring( 0, $curElemPath.LastIndexOf('/') ) 
                }
            }
        }
    }
    finally {
        if( $reader ) { $reader.Close() }
        if( $stream ) { $stream.Close() }
    }
}

Call it like this:

Import-XmlElementText -FilePath test.xml -ElementPath '/node1/node2a/node3a', '/node1/node2b'

Given this input XML:

<node1>
    <node2a>
        <node3a>test1</node3a>
        <node3b/>
        <node3c a='b'/>
        <node3d></node3d>
    </node2a>
    <node2b>test2</node2b>
</node1>

This output is produced:

Path                 Value
----                 -----
/node1/node2a/node3a test1
/node1/node2b        test2

The function actually outputs objects, which can be processed by pipeline commands as usual or stored in an array:

$foundElems = Import-XmlElementText -FilePath test.xml -ElementPath '/node1/node2a/node3a', '/node1/node2b'

$foundElems[1].Value  # Prints 'test2'

Notes:

  • Convert-Path is used to convert a PowerShell path (aka PSPath), which might be relative, to an absolute path that can be used by .NET functions. This is required because .NET uses a different current directory than PowerShell, and a PowerShell path can be in a form that .NET doesn't even understand (e.g. Microsoft.PowerShell.Core\FileSystem::C:\something.txt).
  • When encountering the start of an element, we have to skip empty elements such as <node/>, because for such elements we never enter the EndElement case branch. Adding them would render the current path ($curElemPath) invalid, as the element would not be removed from the current path again.
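
To see the second note in action, here is a small in-memory sketch that logs which node events XmlReader reports for self-closing and for explicitly closed elements. It reads from a string instead of a file, and the $events list is just an illustration device.

```powershell
# Self-closing <node3b/> versus explicitly closed <node3d></node3d>.
$xmlText = '<node1><node2a><node3a>test1</node3a><node3b/><node3d></node3d></node2a></node1>'

$reader = [System.Xml.XmlReader]::Create([System.IO.StringReader]::new($xmlText))
$events = [System.Collections.Generic.List[string]]::new()

try {
    while ($reader.Read()) {
        switch ($reader.NodeType) {
            ([System.Xml.XmlNodeType]::Element) {
                # IsEmptyElement is only $true for the <name/> form.
                $events.Add("Start $($reader.Name) empty=$($reader.IsEmptyElement)")
            }
            ([System.Xml.XmlNodeType]::EndElement) {
                $events.Add("End $($reader.Name)")
            }
        }
    }
}
finally {
    $reader.Close()
}

$events
```

The log contains a Start entry for node3b but no matching End entry, whereas node3d gets both. That is why the function only appends non-empty elements to the current path.
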

人间☆小暴躁 2025-01-16 12:34:45

I have a similar requirement to this, however, my requirement is to set values referencing nodes using a variable. We need this ability so that we can have one script which can reference different psd1 files and set the information correctly. Hard coding paths mean we need multiple scripts to do the same thing. As you can imagine this is a nightmare.

...
The following works.

[XML]$doc = Get-Content $my_xml_file
$xml_cfg = Import-LocalizedData -FileName xml_information.psd1   # assuming -FileName was intended here
$xml_path = "FinData.Header.Hdrinfo.From.CpnyId.Id.StoreId.Report.Id"
$doc.FinData.Header.Hdrinfo.From.CpnyId.Id.StoreId.Report.Id = $xml_cfg.from_id

However, this fails:
$doc.$xml_path = $xml_cfg.from_id

ERROR: "The property 'FinData.Header.Hdrinfo.From.CpnyId.Id.StoreId.Report.Id' cannot be found on this object. Verify that the property exists and can be set."

...

It is a real shame PowerShell cannot handle variable references to objects. Referencing objects using variables works fine in Perl, and these sorts of limitations prevent us from migrating all our code to PowerShell.
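
For what it's worth, the split-and-dereference idea from the earlier comment can probably be adapted for assignment as well: walk down to the parent of the last node, then assign through a single dynamic property name. The sketch below uses a made-up function name (Set-XmlValueByPath) and a shortened stand-in document rather than the real FinData file, and it assumes the target node is a text-only element.

```powershell
# Hypothetical helper: resolve every name except the last one,
# then set the last one as a property on its parent.
function Set-XmlValueByPath {
    param(
        [Parameter(Mandatory)] $Xml,
        [Parameter(Mandatory)] [string] $NodePath,
        [Parameter(Mandatory)] [string] $Value
    )

    $names  = $NodePath.Split('.')
    $parent = $Xml
    for ($i = 0; $i -lt $names.Length - 1; $i++) {
        $parent = $parent.($names[$i])
        if ($null -eq $parent) {
            throw "Node '$($names[$i])' not found while resolving '$NodePath'"
        }
    }

    $leafName = $names[-1]
    $parent.$leafName = $Value
}

# Shortened stand-in for the real document (structure is an assumption).
$doc = [xml] '<FinData><Header><Id>old</Id></Header></FinData>'
$xml_path = 'FinData.Header.Id'

Set-XmlValueByPath -Xml $doc -NodePath $xml_path -Value 'new'
$doc.FinData.Header.Id
```

A single name in a variable ($parent.$leafName) is resolved as a property, which is why this works while $doc.$xml_path with a dotted string does not.
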
