How to validate an XML document against a compact RELAX NG schema in Python?

Posted on 2024-08-01 14:15:59

How do I validate an XML document against a compact RELAX NG schema in Python?

Comments (3)

女皇必胜 2024-08-08 14:15:59

If you want to validate a document against a compact RELAX NG schema from the command line, you can use pyjing, from the jingtrang package.

It supports .rnc files and displays more details than just True or False. For example:

C:\>pyjing -c root.rnc invalid.xml
C:\invalid.xml:9:9: error: element "name" not allowed here; expected the element end-tag or element "bounds"

NOTE: it is a Python wrapper around the Java jingtrang tools, so Java must be installed.
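If you want those detailed messages inside a Python program, one option is to run pyjing as a subprocess and parse its output. The following is only a minimal sketch: the names JingMessage, parse_jing_line, and run_pyjing are hypothetical helpers, and the regular expression assumes every diagnostic follows the path:line:column: severity: message layout shown in the example above.

```python
import re
import subprocess
from typing import List, NamedTuple, Optional


class JingMessage(NamedTuple):
    """One diagnostic line reported by jing (hypothetical helper type)."""
    path: str
    line: int
    column: int
    severity: str
    message: str


# Assumed layout, taken from the sample output: path:line:column: severity: message
# The non-greedy path group lets Windows drive letters such as "C:" pass through.
_JING_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<severity>\w+): (?P<message>.*)$"
)


def parse_jing_line(text: str) -> Optional[JingMessage]:
    """Parse one output line; return None if it is not a diagnostic."""
    match = _JING_LINE.match(text.strip())
    if match is None:
        return None
    return JingMessage(
        path=match.group("path"),
        line=int(match.group("line")),
        column=int(match.group("column")),
        severity=match.group("severity"),
        message=match.group("message"),
    )


def run_pyjing(schema_path: str, xml_path: str) -> List[JingMessage]:
    """Run `pyjing -c schema xml` and collect the diagnostics it prints."""
    result = subprocess.run(
        ["pyjing", "-c", schema_path, xml_path],
        capture_output=True,
        text=True,
    )
    # Look at both streams, since this sketch does not assume which one is used.
    lines = (result.stdout + "\n" + result.stderr).splitlines()
    return [m for m in map(parse_jing_line, lines) if m is not None]
```

With this sketch, run_pyjing("root.rnc", "invalid.xml") would return a list of JingMessage tuples, and an empty list would suggest the document is valid, provided pyjing and Java are installed.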

If you want to validate from within Python, you can:

  1. Use pytrang (from the jingtrang wrapper) to convert the compact RELAX NG schema (.rnc) to XML RELAX NG (.rng):
    pytrang root.rnc root.rng

  2. Use lxml to parse the converted .rng file, as described at https://lxml.de/validation.html#relaxng

That would look something like this:

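>>> from io import StringIO
>>> from subprocess import call
>>> from lxml import etree

>>> # Pass the command as a list so it also works without a shell;
>>> # call() returns the exit status, which is 0 on success.
>>> status = call(["pytrang", "root.rnc", "root.rng"])

>>> # Let lxml open the converted schema by file name.
>>> relaxng_doc = etree.parse("root.rng")
>>> relaxng = etree.RelaxNG(relaxng_doc)

>>> valid = StringIO('<a><b></b></a>')
>>> doc = etree.parse(valid)
>>> relaxng.validate(doc)
True

>>> invalid = StringIO('<a><c></c></a>')
>>> doc2 = etree.parse(invalid)
>>> relaxng.validate(doc2)
False
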
北恋 2024-08-08 14:15:59

How about using lxml?

From the docs:

>>> from io import StringIO
>>> from lxml import etree

>>> f = StringIO('''\
... <element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
...  <zeroOrMore>
...     <element name="b">
...       <text />
...     </element>
...  </zeroOrMore>
... </element>
... ''')
>>> relaxng_doc = etree.parse(f)
>>> relaxng = etree.RelaxNG(relaxng_doc)

>>> valid = StringIO('<a><b></b></a>')
>>> doc = etree.parse(valid)
>>> relaxng.validate(doc)
True

>>> invalid = StringIO('<a><c></c></a>')
>>> doc2 = etree.parse(invalid)
>>> relaxng.validate(doc2)
False
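
validate() only returns True or False. If you also want to know why a document was rejected, lxml validators keep an error_log that can be read after a failed validation. The sketch below reuses the schema from the docs example; the helper name validation_errors is hypothetical and is not part of lxml.

```python
from io import StringIO

from lxml import etree

# Same XML-syntax RELAX NG schema as in the docs example above.
RNG_SCHEMA = """\
<element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="b">
      <text/>
    </element>
  </zeroOrMore>
</element>
"""


def validation_errors(xml_text):
    """Return a list of (line, message) pairs; empty if the document is valid."""
    relaxng = etree.RelaxNG(etree.parse(StringIO(RNG_SCHEMA)))
    doc = etree.parse(StringIO(xml_text))
    if relaxng.validate(doc):
        return []
    # error_log holds the entries recorded during the last validation run.
    return [(entry.line, entry.message) for entry in relaxng.error_log]


for line, message in validation_errors("<a><c></c></a>"):
    print(f"line {line}: {message}")
```

The exact message text comes from libxml2 and may differ between versions, so it is safer to show it to a user than to match on it.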
时光与爱终年不遇 2024-08-08 14:15:59

An alternative to using jingtrang or other external tools is the Python library rnc2rng. It can load a schema in RNC format and convert it to RELAX NG XML format as a string, which can then be loaded into lxml's RelaxNG class.

See this issue for details.
