在 Python 中使用 XML 模式进行验证

发布于 2024-07-09 05:56:48 字数 114 浏览 6 评论 0原文

我有一个 XML 文件和另一个文件中的 XML 架构,我想验证我的 XML 文件是否遵循该架构。 我如何在 Python 中做到这一点?

我更喜欢使用标准库,但如果需要,我可以安装第三方包。

I have an XML file and an XML schema in another file and I'd like to validate that my XML file adheres to the schema. How do I do this in Python?

I'd prefer something using the standard library, but I can install a third-party package if necessary.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

猥琐帝 2024-07-16 05:56:48

我假设您的意思是使用 XSD 文件。 令人惊讶的是,支持此功能的 Python XML 库并不多。 然而 lxml 确实如此。 检查使用 lxml 进行验证。 该页面还列出了如何使用 lxml 来验证其他架构类型。

I am assuming you mean using XSD files. Surprisingly there aren't many python XML libraries that support this. lxml does however. Check Validation with lxml. The page also lists how to use lxml to validate with other schema types.

野生奥特曼 2024-07-16 05:56:48

Python3 中使用流行库的简单验证器的示例 lxml

安装 lxml

pip install lxml

如果您得到出现类似“在 libxml2 库中找不到函数 xmlCheckVersion。安装了 libxml2 吗?”之类的错误,请首先尝试执行此操作:

# Debian/Ubuntu
apt-get install python-dev python3-dev libxml2-dev libxslt-dev

# Fedora 23+
dnf install python-devel python3-devel libxml2-devel libxslt-devel

最简单的验证器

让我们创建最简单的 validator.py

from lxml import etree

def validate(xml_path: str, xsd_path: str) -> bool:

    xmlschema_doc = etree.parse(xsd_path)
    xmlschema = etree.XMLSchema(xmlschema_doc)

    xml_doc = etree.parse(xml_path)
    result = xmlschema.validate(xml_doc)

    return result

然后编写并运行 ma​​in.py

from validator import validate

if validate("path/to/file.xml", "path/to/scheme.xsd"):
    print('Valid! :)')
else:
    print('Not valid! :(')

一点点 OOP

为了验证多个文件,无需创建 XMLSchema 每次都会对象,因此:

validator.py

from lxml import etree

class Validator:

    def __init__(self, xsd_path: str):
        xmlschema_doc = etree.parse(xsd_path)
        self.xmlschema = etree.XMLSchema(xmlschema_doc)

    def validate(self, xml_path: str) -> bool:
        xml_doc = etree.parse(xml_path)
        result = self.xmlschema.validate(xml_doc)

        return result

现在我们可以验证目录中的所有文件,如下所示:

ma​​in.py

import os
from validator import Validator

validator = Validator("path/to/scheme.xsd")

# The directory with XML files
XML_DIR = "path/to/directory"

for file_name in os.listdir(XML_DIR):
    print('{}: '.format(file_name), end='')

    file_path = '{}/{}'.format(XML_DIR, file_name)

    if validator.validate(file_path):
        print('Valid! :)')
    else:
        print('Not valid! :(')

有关更多选项,请阅读此处:使用 lxml 进行验证

An example of a simple validator in Python3 using the popular library lxml

Installation lxml

pip install lxml

If you get an error like "Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?", try to do this first:

# Debian/Ubuntu
apt-get install python-dev python3-dev libxml2-dev libxslt-dev

# Fedora 23+
dnf install python-devel python3-devel libxml2-devel libxslt-devel

The simplest validator

Let's create simplest validator.py

from lxml import etree

def validate(xml_path: str, xsd_path: str) -> bool:

    xmlschema_doc = etree.parse(xsd_path)
    xmlschema = etree.XMLSchema(xmlschema_doc)

    xml_doc = etree.parse(xml_path)
    result = xmlschema.validate(xml_doc)

    return result

then write and run main.py

from validator import validate

if validate("path/to/file.xml", "path/to/scheme.xsd"):
    print('Valid! :)')
else:
    print('Not valid! :(')

A little bit of OOP

In order to validate more than one file, there is no need to create an XMLSchema object every time, therefore:

validator.py

from lxml import etree

class Validator:

    def __init__(self, xsd_path: str):
        xmlschema_doc = etree.parse(xsd_path)
        self.xmlschema = etree.XMLSchema(xmlschema_doc)

    def validate(self, xml_path: str) -> bool:
        xml_doc = etree.parse(xml_path)
        result = self.xmlschema.validate(xml_doc)

        return result

Now we can validate all files in the directory as follows:

main.py

import os
from validator import Validator

validator = Validator("path/to/scheme.xsd")

# The directory with XML files
XML_DIR = "path/to/directory"

for file_name in os.listdir(XML_DIR):
    print('{}: '.format(file_name), end='')

    file_path = '{}/{}'.format(XML_DIR, file_name)

    if validator.validate(file_path):
        print('Valid! :)')
    else:
        print('Not valid! :(')

For more options read here: Validation with lxml

黑凤梨 2024-07-16 05:56:48

您可以使用 xmlschema Python 包轻松根据 XML 架构 (XSD) 验证 XML 文件或树。 它是纯 Python,可在 PyPi 上使用,并且没有太多依赖项。

示例 - 验证文件:

import xmlschema
xmlschema.validate('doc.xml', 'some.xsd')

如果文件未针对 XSD 进行验证,则该方法会引发异常。 该异常包含一些违规详细信息。

如果你想验证许多文件,你只需要加载 XSD 一次:

xsd = xmlschema.XMLSchema('some.xsd')
for filename in filenames:
    xsd.validate(filename)

如果你不需要异常,你可以像这样验证:

if xsd.is_valid('doc.xml'):
    print('do something useful')

或者,xmlschema 直接作用于文件对象和内存中的 XML 树(使用 xml.etree 创建) .ElementTree 或 lxml)。 例子:

import xml.etree.ElementTree as ET
t = ET.parse('doc.xml')
result = xsd.is_valid(t)
print('Document is valid? {}'.format(result))

You can easily validate an XML file or tree against an XML Schema (XSD) with the xmlschema Python package. It's pure Python, available on PyPi and doesn't have many dependencies.

Example - validate a file:

import xmlschema
xmlschema.validate('doc.xml', 'some.xsd')

The method raises an exception if the file doesn't validate against the XSD. That exception then contains some violation details.

If you want to validate many files you only have to load the XSD once:

xsd = xmlschema.XMLSchema('some.xsd')
for filename in filenames:
    xsd.validate(filename)

If you don't need the exception you can validate like this:

if xsd.is_valid('doc.xml'):
    print('do something useful')

Alternatively, xmlschema directly works on file objects and in memory XML trees (either created with xml.etree.ElementTree or lxml). Example:

import xml.etree.ElementTree as ET
t = ET.parse('doc.xml')
result = xsd.is_valid(t)
print('Document is valid? {}'.format(result))
听风念你 2024-07-16 05:56:48

至于“纯python”解决方案:包索引列出:

  • pyxsd,描述说它使用xml.etree.cElementTree,这不是“纯 python”(但包含在 stdlib 中),但源代码表明它回退到 xml.etree.ElementTree,因此这将被视为纯 python。 没有使用过它,但根据文档,它确实进行了模式验证。
  • minixsv:“用“纯”Python 编写的轻量级 XML 模式验证器”。 然而,描述说“当前支持 XML 模式标准的子集”,因此这可能还不够。
  • XSV,我认为它用于 W3C 的在线 xsd 验证器(它似乎仍然使用旧的 pyxml 包,我认为它是认为不再维护)

As for "pure python" solutions: the package index lists:

  • pyxsd, the description says it uses xml.etree.cElementTree, which is not "pure python" (but included in stdlib), but source code indicates that it falls back to xml.etree.ElementTree, so this would count as pure python. Haven't used it, but according to the docs, it does do schema validation.
  • minixsv: 'a lightweight XML schema validator written in "pure" Python'. However, the description says "currently a subset of the XML schema standard is supported", so this may not be enough.
  • XSV, which I think is used for the W3C's online xsd validator (it still seems to use the old pyxml package, which I think is no longer maintained)
痕至 2024-07-16 05:56:48

有两种方法(实际上还有更多)可以做到这一点。
1.使用lxml
pip install lxml

from lxml import etree, objectify
from lxml.etree import XMLSyntaxError

def xml_validator(some_xml_string, xsd_file='/path/to/my_schema_file.xsd'):
    try:
        schema = etree.XMLSchema(file=xsd_file)
        parser = objectify.makeparser(schema=schema)
        objectify.fromstring(some_xml_string, parser)
        print "YEAH!, my xml file has validated"
    except XMLSyntaxError:
        #handle exception here
        print "Oh NO!, my xml file does not validate"
        pass

xml_file = open('my_xml_file.xml', 'r')
xml_string = xml_file.read()
xml_file.close()

xml_validator(xml_string, '/path/to/my_schema_file.xsd')
  1. 从命令行使用 xmllint。 xmllint 已安装在许多 Linux 发行版中。

<代码>>> xmllint --format --pretty 1 --load-trace --debug --schema /path/to/my_schema_file.xsd /path/to/my_xml_file.xml

There are two ways(actually there are more) that you could do this.
1. using lxml
pip install lxml

from lxml import etree, objectify
from lxml.etree import XMLSyntaxError

def xml_validator(some_xml_string, xsd_file='/path/to/my_schema_file.xsd'):
    try:
        schema = etree.XMLSchema(file=xsd_file)
        parser = objectify.makeparser(schema=schema)
        objectify.fromstring(some_xml_string, parser)
        print "YEAH!, my xml file has validated"
    except XMLSyntaxError:
        #handle exception here
        print "Oh NO!, my xml file does not validate"
        pass

xml_file = open('my_xml_file.xml', 'r')
xml_string = xml_file.read()
xml_file.close()

xml_validator(xml_string, '/path/to/my_schema_file.xsd')
  1. Use xmllint from the commandline. xmllint comes installed in many linux distributions.

>> xmllint --format --pretty 1 --load-trace --debug --schema /path/to/my_schema_file.xsd /path/to/my_xml_file.xml

送君千里 2024-07-16 05:56:48

http://pyxb.sourceforge.net/ 处的 PyXB 包从 XML 架构文档生成 Python 的验证绑定。 它处理几乎所有模式构造并支持多个命名空间。

The PyXB package at http://pyxb.sourceforge.net/ generates validating bindings for Python from XML schema documents. It handles almost every schema construct and supports multiple namespaces.

花伊自在美 2024-07-16 05:56:48

lxml

通过 http://lxml.de/api 上的测试 提供 etree.DTD /lxml.tests.test_dtd-pysrc.html

...
root = etree.XML(_bytes("<b/>")) 
dtd = etree.DTD(BytesIO("<!ELEMENT b EMPTY>")) 
self.assert_(dtd.validate(root)) 

lxml provides etree.DTD

from the tests on http://lxml.de/api/lxml.tests.test_dtd-pysrc.html

...
root = etree.XML(_bytes("<b/>")) 
dtd = etree.DTD(BytesIO("<!ELEMENT b EMPTY>")) 
self.assert_(dtd.validate(root)) 
小伙你站住 2024-07-16 05:56:48
import xmlschema


def get_validation_errors(xml_file, xsd_file):
    schema = xmlschema.XMLSchema(xsd_file)
    validation_error_iterator = schema.iter_errors(xml_file)
    errors = list()
    for idx, validation_error in enumerate(validation_error_iterator, start=1):
        err = validation_error.__str__()
        errors.append(err)
        print(err)
    return errors

errors = get_validation_errors('sample3.xml', 'sample_schema.xsd')
import xmlschema


def get_validation_errors(xml_file, xsd_file):
    schema = xmlschema.XMLSchema(xsd_file)
    validation_error_iterator = schema.iter_errors(xml_file)
    errors = list()
    for idx, validation_error in enumerate(validation_error_iterator, start=1):
        err = validation_error.__str__()
        errors.append(err)
        print(err)
    return errors

errors = get_validation_errors('sample3.xml', 'sample_schema.xsd')
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文