Merging multiple XML files from the command line

Posted on 2024-12-29 10:03:58

I have several XML files. They all have the same structure, but were split due to file size. So, let's say I have A.xml, B.xml, C.xml and D.xml and want to combine/merge them into combined.xml, using a command line tool.

A.xml

<products>
    <product id="1234"></product>
    ...
</products>

B.xml

<products>
  <product id="5678"></product>
  ...
</products>

etc.

Comments (5)

无人接听 2025-01-05 10:03:58

High-tech answer:

Save this Python script as xmlcombine.py:

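#!/usr/bin/env python3
import sys
from xml.etree import ElementTree

def run(files):
    first = None
    for filename in files:
        data = ElementTree.parse(filename).getroot()
        if first is None:
            # The first file's root becomes the container for everything.
            first = data
        else:
            # Append the children of each later root to the first root.
            first.extend(data)
    if first is not None:
        # encoding="unicode" returns a str; the default returns bytes,
        # which Python 3 would print as b'...'.
        print(ElementTree.tostring(first, encoding="unicode"))

if __name__ == "__main__":
    run(sys.argv[1:])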

To combine files, run:

python xmlcombine.py ?.xml > combined.xml

For further enhancement, consider using:

  • chmod +x xmlcombine.py:
    Lets you omit python on the command line (invoke the script as ./xmlcombine.py unless its directory is on your PATH)

  • xmlcombine.py !(combined).xml > combined.xml:
    Collects all XML files except the output, but requires bash's extglob option (shopt -s extglob)

  • xmlcombine.py *.xml | sponge combined.xml:
    Also includes the existing contents of combined.xml, but requires the sponge program (part of moreutils)

  • import lxml.etree as ElementTree:
    Uses a potentially faster XML parser
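
If neither extglob nor sponge is available, the same effect can be approximated inside the script. The following is only a sketch under stated assumptions: the function name combine and its two parameters are hypothetical and not part of the answer above. It skips the output file when collecting inputs and writes the result through a temporary file, so an existing combined.xml is never read and truncated at the same time.

```python
import glob
import os
import tempfile
from xml.etree import ElementTree


def combine(pattern, output):
    """Merge every file matching `pattern`, except `output`, into `output`.

    Hypothetical helper: returns the list of input files that were merged.
    """
    out_abs = os.path.abspath(output)
    # Skip the output file itself (the job of the extglob pattern).
    inputs = sorted(
        f for f in glob.glob(pattern) if os.path.abspath(f) != out_abs
    )
    root = None
    for filename in inputs:
        data = ElementTree.parse(filename).getroot()
        if root is None:
            root = data
        else:
            root.extend(data)
    if root is None:
        return inputs
    # Write to a temporary file first, then rename it over the output
    # (roughly the job of sponge). The .tmp suffix keeps it out of *.xml.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(out_abs), suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        ElementTree.ElementTree(root).write(
            handle, encoding="utf-8", xml_declaration=True
        )
    os.replace(tmp_path, out_abs)
    return inputs
```

A call such as combine("*.xml", "combined.xml") can then be repeated safely, because the previous output is excluded from the inputs on every run.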

小伙你站住 2025-01-05 10:03:58

xml_grep

http://search.cpan.org/dist/XML-Twig/tools/xml_grep/xml_grep

xml_grep --pretty_print indented --wrap products --descr '' --cond "product" *.xml > combined.xml

  • --wrap : encloses/wraps the XML result with the given tag (here: products)
  • --cond : the XML subtree to grep (here: product)

不离久伴 2025-01-05 10:03:58

Low-tech simple answer:

echo '<products>' > combined.xml
grep -vh '</\?products>\|<?xml' *.xml >> combined.xml
echo '</products>' >> combined.xml

Limitations:

  • The opening and closing tags need to be on their own line.
  • The files need to all have the same outer tags.
  • The outer tags must not have attributes.
  • The files must not have inner tags that match the outer tags.
  • Any current contents of combined.xml will be wiped out instead of getting included.

Each of these limitations can be worked around, but not all of them easily.
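
As an illustration of working around the first four limitations, a real XML parser does not care about line breaks, root attributes, or nested tags with the same name, and the "same outer tags" requirement can be checked explicitly. This is a minimal sketch, not part of the answer above; the function name merge_same_root is hypothetical, and it assumes that attributes on the roots of later files can simply be dropped.

```python
from xml.etree import ElementTree


def merge_same_root(sources):
    """Merge documents, refusing any whose root tag differs from the first.

    `sources` is an iterable of filenames or file objects.
    Assumption: attributes on the roots of later documents are discarded.
    """
    merged = None
    for source in sources:
        root = ElementTree.parse(source).getroot()
        if merged is None:
            merged = root
        elif root.tag != merged.tag:
            raise ValueError(
                "root <%s> does not match <%s>" % (root.tag, merged.tag)
            )
        else:
            merged.extend(root)
    return merged
```

The price is that every file is parsed fully into memory, which the grep approach avoids.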

故事↓在人 2025-01-05 10:03:58

Merging two trees involves identifying which elements are identical and which should be replaced. Unfortunately, this is not obvious: more semantics are involved than can be inferred from the source XML documents.

Consider the case where the first document has a middle level with several elements that share the same tag but differ in their attributes. The second document adds an attribute to one of those existing middle-level elements and also adds another child to it. One has to know the semantics to merge them correctly.

<params>
...
<param><name>hello</name><value>world</value></param>
...
</params>

add/merge:

<params>
   <param><name>hello</name><value>yellow submarine</value></param>
</params>
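
To make the point concrete, here is a sketch of one possible semantic for the example above, chosen purely for illustration: a param is identified by the text of its name child, and a param from the second document replaces the one with the same name in the first. Neither rule can be derived from the XML itself; both are assumptions, as is the function name merge_params.

```python
from xml.etree import ElementTree


def merge_params(base_xml, update_xml):
    """Merge two <params> documents given as strings.

    Assumed semantics (not implied by the XML): <name> is the key,
    and the later document wins when a key occurs in both.
    """
    base = ElementTree.fromstring(base_xml)
    update = ElementTree.fromstring(update_xml)
    by_name = {p.findtext("name"): p for p in base.findall("param")}
    for param in update.findall("param"):
        name = param.findtext("name")
        existing = by_name.get(name)
        if existing is None:
            # Unknown key: append as a new parameter.
            base.append(param)
        else:
            # Known key: replace the old element in place.
            index = list(base).index(existing)
            base[index] = param
        by_name[name] = param
    return base
```

With a different rule, for example keeping both values or preferring the first document, the same two inputs would produce a different result, which is why a generic merge tool cannot make this decision for you.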

回首观望 2025-01-05 10:03:58

Another very helpful tool is yq, which aims to be jq for YAML, TOML and XML.

It can be installed via pip; the XML-handling command is then called xq. Note that it is a wrapper around jq, so jq itself must also be installed.

pip install yq
xq .products ?.xml --xml-output --xml-root=products > combined.xml