In-document schema declarations and lxml
As per the official documentation of lxml, if one wants to validate an XML document against an XML schema document, one has to
- construct the XMLSchema object (basically, parse the schema document)
- construct the XMLParser, passing the XMLSchema object as its schema argument
- parse the actual XML document (instance document) using the constructed parser
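For reference, the three steps above can be sketched roughly as follows. This is a minimal illustration, not taken from the lxml documentation verbatim: the schema text, the `note` element, and the helper names `make_validating_parser` and `is_valid` are made up for the example.

```python
from io import BytesIO

from lxml import etree

# Hypothetical schema: a single <note> element holding a string.
SCHEMA_SOURCE = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note" type="xs:string"/>
</xs:schema>
"""


def make_validating_parser(schema_bytes):
    """Steps 1 and 2: parse the schema document, then build a parser around it."""
    schema_doc = etree.parse(BytesIO(schema_bytes))
    schema = etree.XMLSchema(schema_doc)
    return etree.XMLParser(schema=schema)


def is_valid(instance_bytes, schema_bytes=SCHEMA_SOURCE):
    """Step 3: parse the instance document with the validating parser."""
    parser = make_validating_parser(schema_bytes)
    try:
        etree.fromstring(instance_bytes, parser)
        return True
    except etree.XMLSyntaxError:
        # A validating parser reports schema violations as XMLSyntaxError.
        return False


print(is_valid(b"<note>hello</note>"))
print(is_valid(b"<memo>hello</memo>"))
```

Note that the schema is handed to the parser from outside; nothing in the instance document is consulted to find it.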
There can be variations, but the essence is pretty much the same no matter how you do it: the schema is specified 'externally' (as opposed to specifying it inside the actual XML document).
If you follow this procedure, the validation occurs, sure enough, but if I understand it correctly, that completely ignores the whole idea of the schemaLocation and noNamespaceSchemaLocation attributes from xsi.
This introduces a whole bunch of limitations, starting with the fact that you have to deal with the instance<->schema relation all by yourself (either store it externally or write some hack to retrieve the schema location from the root element of the instance document), you cannot validate the document using multiple schemata (say, when each schema governs its own namespace), and so on.
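To make the "hack" concrete, here is a rough sketch of pulling the schema hints off the root element using only the standard library. The function name `schema_locations` and the sample document are invented for illustration; the sketch only looks at the root element and does not resolve or load the referenced files.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance document with two namespace/location pairs.
SAMPLE = (
    '<a:root xmlns:a="urn:schema1" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="urn:schema1 schema1.xsd urn:schema2 schema2.xsd"/>'
)


def schema_locations(instance_text):
    """Return {namespace: location} read from the root element's xsi hints.

    The key None is used for xsi:noNamespaceSchemaLocation, if present.
    """
    root = ET.fromstring(instance_text)
    locations = {}

    # xsi:schemaLocation is a whitespace-separated list of
    # "namespace location" pairs.
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    for namespace, location in zip(tokens[0::2], tokens[1::2]):
        locations[namespace] = location

    no_namespace = root.get(f"{{{XSI_NS}}}noNamespaceSchemaLocation")
    if no_namespace is not None:
        locations[None] = no_namespace
    return locations


print(schema_locations(SAMPLE))
```

The resulting mapping would still have to be turned into an XMLSchema object by hand, which is exactly the extra bookkeeping I would like to avoid.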
So the question is: maybe I am missing something completely trivial or doing it wrong? Or are my statements about lxml's limitations regarding schema validation true?
To recap, I'd like to be able to:
- have the parser use the schema location declarations in the instance document at parse/validation time
- use multiple schemata to validate an XML document
- declare schema locations on non-root elements (not of extreme importance)
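One workaround I am aware of for the multiple-schemata case is to generate a small wrapper schema that only contains `xs:import` elements, one per namespace, and hand that single document to the validator. The sketch below only builds the wrapper text with the standard library; `build_wrapper_schema` is a made-up name, and whether the relative locations resolve correctly depends on how and where the wrapper is later loaded, so treat this as an assumption to verify rather than a confirmed recipe.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"


def build_wrapper_schema(locations):
    """Build a schema document that imports one schema per namespace.

    `locations` maps a namespace URI to a schema location, for example
    the pairs found in an xsi:schemaLocation attribute.
    """
    # Serialize with the conventional "xs" prefix instead of "ns0".
    ET.register_namespace("xs", XS_NS)
    root = ET.Element(f"{{{XS_NS}}}schema")
    for namespace, location in locations.items():
        ET.SubElement(
            root,
            f"{{{XS_NS}}}import",
            {"namespace": namespace, "schemaLocation": location},
        )
    return ET.tostring(root, encoding="unicode")


wrapper = build_wrapper_schema(
    {"urn:schema1": "schema1.xsd", "urn:schema2": "schema2.xsd"}
)
print(wrapper)
```

The generated text could then be parsed and passed to the XMLSchema constructor in the same way as any other schema document.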
Maybe I should look for a different library? Although that'd be a real shame: lxml is the de facto XML processing library for Python and is regarded by everyone as the best one in terms of performance/features/convenience (and rightfully so, to a certain extent).
Comments (1)
Caution: this is not the full answer to this, because I don't know all that much about lxml in particular.
I can just tell you that:
xsi:schemaLocation="urn:schema1 schema1.xsd urn:schema2 schema2.xsd
.