Parsing an XML file with PowerShell using a node path stored in a variable
Hello dear fellow PowerShell users,
I'm trying to parse XML files that can differ in structure. Therefore, I want to access node values based on a node path received from a variable.
Example
#XML file
$xml = [xml] @'
<node1>
<node2>
<node3>
<node4>test1</node4>
</node3>
</node2>
</node1>
'@
Accessing the values directly works.
#access XML node directly -works-
$xml.node1.node2.node3.node4 # working <OK>
Accessing the values via a node path stored in a variable does not work.
#access XML node via path from variable -does not work-
$testnodepath = 'node1.node2.node3.node4'
$xml.$testnodepath # NOT working
$xml.$($testnodepath) # NOT working
Is there a way to access the XML node values directly using a node path received from a variable?
PS: I am aware that there is a way via SelectNodes, but I assume that it is inefficient, since it basically searches for keywords.
#Working - but inefficient
$testnodepath = 'node1/node2/node3/node4'
$xml.SelectNodes($testnodepath)
I need a very efficient way of parsing the XML file since I will need to parse huge XML files. Is there a way to directly access the node values in the form $xml.node1.node2.node3.node4 by receiving the node structure from a variable?
Comments (4)
You might use the ExecutionContext ExpandString method for this:
If the node path ($testnodepath) comes from outside (e.g. a parameter), you might want to prevent any malicious code injection by stripping any character that is not a word character or a dot (.):
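The code for this answer did not survive on this page, so the following is only a minimal sketch of what the ExpandString approach might look like. The variable names $safePath and $value are assumptions made for illustration; $ExecutionContext.InvokeCommand.ExpandString is the built-in method the answer refers to. Note that the result is always returned as a string.

```powershell
$xml = [xml] @'
<node1><node2><node3><node4>test1</node4></node3></node2></node1>
'@

$testnodepath = 'node1.node2.node3.node4'

# Strip every character that is not a word character or a dot,
# so that an externally supplied path cannot inject code.
$safePath = $testnodepath -replace '[^\w\.]'

# Builds the literal text '$($xml.node1.node2.node3.node4)'
# and asks PowerShell to expand it like a double-quoted string.
$value = $ExecutionContext.InvokeCommand.ExpandString("`$(`$xml.$safePath)")
$value
```

Because the path is evaluated as PowerShell code, the sanitizing step matters whenever the path does not come from a trusted source.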
You can split the string containing the property path into individual names and then dereference them one by one:
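A minimal sketch of the split-and-dereference idea, assuming the dot-separated path from the question; the variable names $cursor and $name are chosen for illustration only:

```powershell
$xml = [xml] @'
<node1><node2><node3><node4>test1</node4></node3></node2></node1>
'@

$testnodepath = 'node1.node2.node3.node4'

# Start at the document and step down one property name at a time.
$cursor = $xml
foreach ($name in $testnodepath.Split('.')) {
    $cursor = $cursor.$name
}
$cursor
```

This works because PowerShell accepts a single property name held in a variable ($cursor.$name), even though it does not interpret a whole dotted path as a chain of property accesses.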
The following presents a memory-friendly streaming approach that doesn't require loading the whole XML document (DOM) into memory. So you can parse really huge XML files even if they don't fit into memory. It should also improve parsing speed, as we can simply skip elements that we are not interested in. To accomplish this, we use System.Xml.XmlReader to process XML elements on the fly, while they are read from the file.
I've wrapped the code in a reusable function:
Call it like this:
Given this input XML:
This output is produced:
Actually, the function outputs objects, which can be processed by pipeline commands as usual or stored in an array:
Notes:
Convert-Path is used to convert a PowerShell path (aka PSPath), which might be relative, to an absolute path that can be used by .NET functions. This is required because .NET uses a different current directory than PowerShell, and a PowerShell path can be in a form that .NET doesn't even understand (e.g. Microsoft.PowerShell.Core\FileSystem::C:\something.txt).
Empty elements such as <node/> have to be handled explicitly, because for such elements we don't enter the EndElement case branch, which would render the current path ($curElemPath) invalid (the element would not be removed from the current path again).
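The function, its call, and the sample input and output from this answer are missing on this page. The following is a rough sketch of how such a streaming function might look, not the answer's original code. The function name Get-XmlElementText, its parameters, the slash-separated path format, and the shape of the output objects are all assumptions; only System.Xml.XmlReader, Convert-Path, the EndElement branch, the empty-element handling, and $curElemPath come from the text above.

```powershell
function Get-XmlElementText {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string]   $Path,
        [Parameter(Mandatory)] [string[]] $ElementPath   # e.g. 'node1/node2/node3/node4'
    )

    # .NET does not share PowerShell's current directory, so resolve the path first.
    $fullPath = Convert-Path -LiteralPath $Path
    $reader   = [System.Xml.XmlReader]::Create($fullPath)
    try {
        # Names of the elements we are currently inside, outermost first.
        $curElemPath = [System.Collections.Generic.List[string]]::new()

        while ($reader.Read()) {
            switch ($reader.NodeType) {
                ([System.Xml.XmlNodeType]::Element) {
                    $curElemPath.Add($reader.Name)
                    # <node/> never produces an EndElement, so remove it right away.
                    if ($reader.IsEmptyElement) {
                        $curElemPath.RemoveAt($curElemPath.Count - 1)
                    }
                }
                ([System.Xml.XmlNodeType]::Text) {
                    $joined = $curElemPath -join '/'
                    if ($ElementPath -contains $joined) {
                        # Emit one object per matching text node.
                        [pscustomobject]@{ Path = $joined; Value = $reader.Value }
                    }
                }
                ([System.Xml.XmlNodeType]::EndElement) {
                    $curElemPath.RemoveAt($curElemPath.Count - 1)
                }
            }
        }
    }
    finally {
        $reader.Dispose()
    }
}

# Usage sketch: write a small sample file, then stream it.
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'xmlreader-sketch.xml'
Set-Content -LiteralPath $tmp -Value '<node1><node2><node3><node4>test1</node4><empty/></node3></node2></node1>'
$result = Get-XmlElementText -Path $tmp -ElementPath 'node1/node2/node3/node4'
$result
```

Only one node is held in memory at a time, so memory use stays roughly constant regardless of file size; the trade-off is that the file is read strictly forward and there is no random access to nodes.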
I have a similar requirement to this; however, my requirement is to set values, referencing the nodes using a variable. We need this ability so that we can have one script that can reference different psd1 files and set the information correctly. Hard-coding paths means we need multiple scripts to do the same thing. As you can imagine, this is a nightmare.
...
The following works.
However, this fails:
$doc.$xml_path = xml_cfg.from_id
...
It is a real shame PowerShell cannot handle variable references to objects. Referencing objects using variables works fine in Perl, and these sorts of limitations prevent us from migrating all our code to PowerShell.
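For what it is worth, setting a value through a path held in a variable seems possible for XML documents by turning the dotted path into an XPath expression. The sketch below assumes $doc is an XML document and reuses the sample structure from the question; the values 'test1' and 'test2' and the variables $xpath, $node, and $newValue are hypothetical, since the commenter's actual data is not shown.

```powershell
$doc = [xml] '<node1><node2><node3><node4>test1</node4></node3></node2></node1>'

$xml_path = 'node1.node2.node3.node4'
$newValue = 'test2'   # hypothetical replacement value

# Convert 'a.b.c' into the XPath '/a/b/c' and look up the node.
$xpath = '/' + ($xml_path -replace '\.', '/')
$node  = $doc.SelectSingleNode($xpath)
if ($null -eq $node) { throw "No node found at '$xpath'." }

# Assign through the node object instead of through $doc.$xml_path.
$node.InnerText = $newValue
$doc.node1.node2.node3.node4
```

The failing form $doc.$xml_path treats the whole string as one property name, which is why it does not resolve; going through SelectSingleNode avoids that.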