Preserving namespace declarations on child elements when serializing with lxml
I have a few different XML documents that I'm trying to combine into one using lxml. The problem is that I need the result to preserve the namespace declarations on the root node of each sub-document. lxml seems to push any namespace declaration that is used more than once up to the root of the new document, which breaks my application (this is an acknowledged bug).
So for example, I have document A:
<dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">
<title>La difesa della razza: scienza, documentazione, polemica. anno 1:n. 1</title>
</dc>
and document B:
<mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
<titleInfo>
<nonSort>La</nonSort>
<title>difesa della razza</title>
<subTitle>scienza, documentazione, polemica</subTitle>
<partNumber>anno 1:n. 1</partNumber>
</titleInfo>
</mods>
I want to wrap them in an element that also uses xsi:schemaLocation, but I need the namespace declaration (xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance") to appear on all three nodes, like this:
<wrap xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.org" xmlns:dc="http://www.foo.org" xmlns:mods="http://www.bar.org">
<dc:dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">
<dc:title>La difesa della razza: scienza, documentazione, polemica. anno 1:n. 1</dc:title>
</dc:dc>
<mods:mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
<mods:titleInfo>
<mods:nonSort>La</mods:nonSort>
<mods:title>difesa della razza</mods:title>
<mods:subTitle>scienza, documentazione, polemica</mods:subTitle>
<mods:partNumber>anno 1:n. 1</mods:partNumber>
</mods:titleInfo>
</mods:mods>
</wrap>
However, when I append these two documents using Python/lxml:
wrap.append(dc)
wrap.append(mods)
I get the xmlns:xsi declaration pushed up to the highest-level node that uses it, which is unfortunately a problem for my application. The result looks like this:
<wrap xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.org" xmlns:dc="http://www.foo.org" xmlns:mods="http://www.bar.org">
<dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">
<dc:title>La difesa della razza: scienza, documentazione, polemica. anno 1:n. 1</dc:title>
</dc:dc>
<mods:mods xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
<mods:titleInfo>
<mods:nonSort>La</mods:nonSort>
<mods:title>difesa della razza</mods:title>
<mods:subTitle>scienza, documentazione, polemica</mods:subTitle>
<mods:partNumber>anno 1:n. 1</mods:partNumber>
</mods:titleInfo>
</mods:mods>
</wrap>
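The hoisting is easy to reproduce. This is a minimal sketch, assuming lxml is installed; the shortened document strings are stand-ins for documents A and B. When an element is appended into another tree, lxml drops any namespace declaration on that element whose URI is already declared in scope at the new position, so only the wrapper keeps xmlns:xsi. The output is still namespace-equivalent XML; it just no longer repeats the declaration.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Shortened stand-ins for documents A and B from the question.
DC_XML = (
    '<dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">'
    "<title>La difesa della razza</title></dc>"
)
MODS_XML = (
    '<mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.loc.gov/mods/v3 '
    'http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">'
    "<titleInfo><title>difesa della razza</title></titleInfo></mods>"
)


def build_by_append():
    """Wrap both documents with Element.append(), as in the question."""
    # The wrapper declares xsi itself because it carries its own schemaLocation.
    wrap = etree.Element("wrap", nsmap={"xsi": XSI})
    wrap.set("{%s}schemaLocation" % XSI, "http://www.example.org")
    for source in (DC_XML, MODS_XML):
        # append() moves each parsed root into the wrapper's document; lxml
        # drops declarations on the moved element whose URI is already in scope.
        wrap.append(etree.fromstring(source))
    return etree.tostring(wrap)


out = build_by_append()
print(out.decode())
# Only the wrapper's xmlns:xsi declaration survives serialization.
print(out.count(b"xmlns:xsi="))
```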
Any ideas how I can force the behavior I want?
Thanks
Answer:
You could try inserting XInclude elements first, and then resolving them with the .xinclude() method (see the docs). That seems to preserve the namespace declarations: lxml keeps them when they originate from the parser, but not when you create elements yourself or move elements from one document to another. Note that in your case you would still need to change the tag names of the elements: they will be included as they are in the original documents, without any namespace, whereas your desired output uses namespaced element names.
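A minimal sketch of the XInclude approach, assuming lxml is installed and that the sub-documents can be written to files next to the wrapper. The file names dc.xml, mods.xml and wrap.xml and the add_namespace helper are hypothetical; the namespace URIs are the placeholders from the question's desired output. The sketch also does the tag renaming mentioned above: after .xinclude(), each included root should still carry its own xmlns:xsi, and renaming the tags reuses the dc and mods prefixes already declared on the wrapper.

```python
import os
import tempfile

from lxml import etree

# Placeholder namespace URIs taken from the question's desired output.
DC_NS = "http://www.foo.org"
MODS_NS = "http://www.bar.org"

DC_XML = (
    '<dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">'
    "<title>La difesa della razza</title></dc>"
)
MODS_XML = (
    '<mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.loc.gov/mods/v3 '
    'http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">'
    "<titleInfo><title>difesa della razza</title></titleInfo></mods>"
)
# Hypothetical wrapper: xi:include elements stand in for the two documents.
WRAP_XML = (
    '<wrap xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xmlns:xi="http://www.w3.org/2001/XInclude" '
    'xmlns:dc="http://www.foo.org" xmlns:mods="http://www.bar.org" '
    'xsi:schemaLocation="http://www.example.org">'
    '<xi:include href="dc.xml"/><xi:include href="mods.xml"/></wrap>'
)


def add_namespace(element, uri):
    """Move an included subtree into `uri`; lxml reuses the wrapper's prefix."""
    for el in element.iter():
        el.tag = "{%s}%s" % (uri, etree.QName(el).localname)


def build_by_xinclude(workdir):
    files = {"dc.xml": DC_XML, "mods.xml": MODS_XML, "wrap.xml": WRAP_XML}
    for name, text in files.items():
        with open(os.path.join(workdir, name), "w", encoding="utf-8") as fh:
            fh.write(text)
    # Parse from a file so the relative hrefs resolve against its location.
    tree = etree.parse(os.path.join(workdir, "wrap.xml"))
    # Each xi:include is replaced by the included document's root, which keeps
    # the xmlns:xsi declaration the parser attached to it.
    tree.xinclude()
    root = tree.getroot()
    add_namespace(root.find("dc"), DC_NS)
    add_namespace(root.find("mods"), MODS_NS)
    return etree.tostring(root)


with tempfile.TemporaryDirectory() as tmp:
    out = build_by_xinclude(tmp)
print(out.decode())
```

libxml2 may also add an xml:base attribute to each included root, and the wrapper keeps its xmlns:xi declaration; check whether your application tolerates both.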
You might have to use a custom resolver. Contrary to what the docs might seem to say, .xinclude() does support this: it uses the resolvers of the parser that parsed the containing document; it just doesn't let you pass a specific resolver or parser to the XInclude processing. The other option would probably be an XSLT-based solution.
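A sketch of the custom-resolver route, assuming the sub-documents are held in memory rather than on disk. The SOURCES dictionary, the InMemoryResolver class and the file names are hypothetical; etree.Resolver, resolve_string() and parser.resolvers.add() are lxml's resolver API. The key point is that the wrapper is parsed with the parser that carries the resolver, which is how the XInclude processing gets to use it.

```python
from lxml import etree

DC_XML = (
    '<dc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/">'
    "<title>La difesa della razza</title></dc>"
)
MODS_XML = (
    '<mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.loc.gov/mods/v3 '
    'http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">'
    "<titleInfo><title>difesa della razza</title></titleInfo></mods>"
)

# Hypothetical in-memory store standing in for wherever the documents live.
SOURCES = {"dc.xml": DC_XML, "mods.xml": MODS_XML}

WRAP_XML = (
    '<wrap xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xmlns:xi="http://www.w3.org/2001/XInclude" '
    'xsi:schemaLocation="http://www.example.org">'
    '<xi:include href="dc.xml"/><xi:include href="mods.xml"/></wrap>'
)


class InMemoryResolver(etree.Resolver):
    """Serve xi:include targets from SOURCES instead of the file system."""

    def resolve(self, url, pubid, context):
        name = (url or "").rsplit("/", 1)[-1]
        if name in SOURCES:
            return self.resolve_string(SOURCES[name], context)
        return None  # anything else falls back to lxml's default loading


parser = etree.XMLParser()
parser.resolvers.add(InMemoryResolver())
# Parse the wrapper with this parser: XInclude processing uses the resolvers
# of the parser that built the containing document.
tree = etree.fromstring(WRAP_XML, parser).getroottree()
tree.xinclude()
out = etree.tostring(tree.getroot())
print(out.decode())
```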