In-document schema declarations and lxml

Posted on 2024-09-03 22:55:55

According to the official lxml documentation, if you want to validate an XML document against an XML schema document, you have to

  1. construct the XMLSchema object (basically, parse the schema document)
  2. construct the XMLParser, passing the XMLSchema object as its schema argument
  3. parse the actual xml document (instance document) using the constructed parser
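The three steps above can be sketched with lxml as shown below. This is a minimal sketch. The inline schema (a single no-namespace element "a" holding an integer) and the helper name is_valid are hypothetical; they exist only to keep the example self-contained. A parser built with schema= raises etree.XMLSyntaxError when the instance does not validate. Nothing in the instance document is consulted to choose the schema: it is supplied from outside.

```python
from lxml import etree

# Step 1: build the XMLSchema object by parsing a schema document.
# (Hypothetical inline schema: one no-namespace element "a" holding an integer.)
SCHEMA_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="a" type="xs:integer"/>
</xs:schema>"""
schema = etree.XMLSchema(etree.fromstring(SCHEMA_XSD))

# Step 2: build a parser that validates against that schema while parsing.
parser = etree.XMLParser(schema=schema)


# Step 3: parse the instance document with the validating parser.
def is_valid(instance_xml: bytes) -> bool:
    """Hypothetical helper: True if the instance parses and validates."""
    try:
        etree.fromstring(instance_xml, parser)
    except etree.XMLSyntaxError:
        # Raised for schema-validation failures as well as for malformed XML.
        return False
    return True


if __name__ == "__main__":
    print(is_valid(b"<a>5</a>"))
    print(is_valid(b"<a>not a number</a>"))
```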

There can be variations, but the essence is pretty much the same no matter how you do it: the schema is specified "externally", as opposed to inside the actual XML document.

If you follow this procedure, validation does occur. However, if I understand it correctly, it completely ignores the whole idea behind the xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes.

This introduces a whole bunch of limitations. First, you have to manage the instance<->schema relation yourself: either store it externally or write some hack to retrieve the schema location from the root element of the instance document. Second, you cannot validate the document against multiple schemata (say, when each schema governs its own namespace). And so on.
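The "hack" of reading schema locations out of the instance document is short to write by hand. Here is a minimal sketch using only the standard library's xml.etree.ElementTree. lxml elements accept the same "{namespace}name" keys in .get(), so it carries over. The sketch walks every element, not only the root, and splits each xsi:schemaLocation value into namespace/location pairs. The function name schema_hints is hypothetical. Locations are returned verbatim, so resolving relative paths against the document's own URL is left to the caller.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_hints(xml_text):
    """Hypothetical helper: collect xsi schema-location hints from every element.

    Returns {namespace URI: location}, with None as the key for a
    noNamespaceSchemaLocation. The first declaration seen (document order) wins.
    """
    hints = {}
    for elem in ET.fromstring(xml_text).iter():
        loc = elem.get(f"{{{XSI}}}schemaLocation")
        if loc:
            tokens = loc.split()
            if len(tokens) % 2:
                raise ValueError(f"odd number of tokens in schemaLocation: {loc!r}")
            # Tokens alternate: namespace URI, location, namespace URI, location, ...
            for ns, href in zip(tokens[0::2], tokens[1::2]):
                hints.setdefault(ns, href)
        no_ns = elem.get(f"{{{XSI}}}noNamespaceSchemaLocation")
        if no_ns:
            hints.setdefault(None, no_ns.strip())
    return hints


if __name__ == "__main__":
    doc = (
        '<a:root xmlns:a="urn:schema1" '
        'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
        'xsi:schemaLocation="urn:schema1 schema1.xsd urn:schema2 schema2.xsd"/>'
    )
    print(schema_hints(doc))
```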

So the question is: maybe I am missing something completely trivial or doing it wrong? Or are my statements about lxml's limitations regarding schema validation true?

To recap, I'd like to be able to:

  • have the parser use the schema location declarations in the instance document at parse/validation time
  • use multiple schemata to validate an XML document
  • declare schema locations on non-root elements (not of extreme importance)
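One commonly used workaround for validating against several schemata at once is to generate a small wrapper schema that does nothing but xs:import each namespace's schema, then compile that single wrapper with lxml. The sketch below assumes the namespace-to-file mapping is already known, for example from the instance's xsi:schemaLocation pairs or from application configuration. The names build_combined_schema and write_demo_schemas are hypothetical, as are the urn:a / urn:b example schemas. Paths are converted to absolute file: URIs so the imports do not depend on the current working directory.

```python
import tempfile
from pathlib import Path
from xml.sax.saxutils import quoteattr

from lxml import etree

XS_NS = "http://www.w3.org/2001/XMLSchema"


def build_combined_schema(ns_to_xsd):
    """Compile one XMLSchema whose only content is an xs:import per namespace.

    ns_to_xsd: hypothetical mapping {target namespace URI: path to that namespace's .xsd}.
    """
    imports = "".join(
        "<xs:import namespace=%s schemaLocation=%s/>"
        % (quoteattr(ns), quoteattr(Path(path).resolve().as_uri()))
        for ns, path in ns_to_xsd.items()
    )
    wrapper = '<xs:schema xmlns:xs="%s">%s</xs:schema>' % (XS_NS, imports)
    return etree.XMLSchema(etree.fromstring(wrapper))


# Two tiny example schemas (hypothetical), each governing its own namespace.
# urn:a's root accepts children from any *other* namespace and validates them
# strictly against whatever declarations the combined schema provides.
SCHEMA_A = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:a" elementFormDefault="qualified">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:any namespace="##other" processContents="strict"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

SCHEMA_B = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:b" elementFormDefault="qualified">
  <xs:element name="count" type="xs:integer"/>
</xs:schema>"""


def write_demo_schemas(directory):
    """Write the example schemas into directory and return the namespace mapping."""
    mapping = {}
    for ns, name, text in (("urn:a", "a.xsd", SCHEMA_A), ("urn:b", "b.xsd", SCHEMA_B)):
        path = Path(directory) / name
        path.write_text(text, encoding="utf-8")
        mapping[ns] = path
    return mapping


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        schema = build_combined_schema(write_demo_schemas(tmp))
    doc = etree.fromstring(
        '<a:root xmlns:a="urn:a" xmlns:b="urn:b"><b:count>5</b:count></a:root>'
    )
    print(schema.validate(doc))
    print(schema.error_log)
```

The mapping is keyed by namespace, so each namespace is imported exactly once. The xs:any wildcard in the example is only one way for a schema to admit elements governed by another schema.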

Maybe I should look for a different library? That would be a real shame, though: lxml is the de facto XML processing library for Python and is regarded by everyone as the best one in terms of performance, features and convenience (rightfully so, to a certain extent).


Comments (1)

甜妞爱困 2024-09-10 22:55:55

Caution: this is not a full answer, because I don't know all that much about lxml in particular.

I can just tell you that:

  • Ignoring schema locations in documents and instead managing a namespace -> schema file mapping in the application is almost always better, unless you can guarantee that the schema will be at a very specific location relative to the document. If you want to move the mapping out of code, use a catalogue or come up with a configuration file.
  • If you do want to use schemaLocation and validate against multiple schemas, just include them all in one schemaLocation attribute as space-separated namespace URI/location pairs: xsi:schemaLocation="urn:schema1 schema1.xsd urn:schema2 schema2.xsd".
  • Finally, I don't think any processor will find schemaLocation attributes declared on non-root elements. Not that it matters: just put them all on the root.
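The first point could look like the sketch below. The idea is to keep a namespace -> schema file mapping in the application instead of trusting schemaLocation. The sketch collects the namespaces actually used by element names in the document and looks each one up in the mapping. It fails loudly for any namespace the application does not know. SCHEMA_MAP, its paths and schemas_for are hypothetical. In practice the mapping might come from a configuration file or catalogue, as suggested above. Unqualified (no-namespace) elements are skipped, and xsi:schemaLocation hints are deliberately ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical application-side mapping; could equally be loaded from a config file.
SCHEMA_MAP = {
    "urn:schema1": "schemas/schema1.xsd",
    "urn:schema2": "schemas/schema2.xsd",
}


def schemas_for(xml_text, schema_map=SCHEMA_MAP):
    """Return {namespace: schema path} for every namespace used by an element name.

    Any xsi:schemaLocation hints in the document are deliberately ignored.
    Raises KeyError if the document uses a namespace with no configured schema.
    """
    needed = {}
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree spells namespaced tags as "{namespace}localname";
        # tags without the brace prefix are unqualified and are skipped here.
        if not elem.tag.startswith("{"):
            continue
        ns = elem.tag[1:].split("}", 1)[0]
        if ns not in schema_map:
            raise KeyError(f"no schema configured for namespace {ns!r}")
        needed[ns] = schema_map[ns]
    return needed


if __name__ == "__main__":
    doc = '<a xmlns="urn:schema1" xmlns:x="urn:schema2"><x:b/><c/></a>'
    print(schemas_for(doc))
```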