How to validate an xml file against an XSD schema using the Amara library in Python?

Posted 2024-09-12 01:39:14 · 1584 characters · 7 views · 0 comments

High bounty for the following Q:

Hello,
Here is what I tried on Ubuntu 9.10 using Python 2.6, Amara2
(by the way, test.xsd was created using xml2xsd tool):

g@spot:~$ cat test.xml; echo =====o=====; cat test.xsd; echo =====o=====; cat test.py; echo =====o=====; ./test.py; echo =====o=====
<?xml version="1.0" encoding="utf-8"?>
<test>abcde</test>
=====o=====
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="test" type="xs:NCName"/>
</xs:schema>
=====o=====
#!/usr/bin/python2.6
# I wish to validate an xml file against an external XSD schema.
from amara import bindery, parse
source = 'test.xml'
schema = 'test.xsd'
#help(bindery.parse)
#doc = bindery.parse(source, uri=schema, validate=True) # These 2 seem to fail in the same way.
doc = parse(source, uri=schema, validate=True) # So, what is the difference anyway?
#
=====o=====
Traceback (most recent call last):
  File "./test.py", line 14, in <module>
    doc = parse(source, uri=schema, validate=True)
  File "/usr/local/lib/python2.6/dist-packages/Amara-2.0a4-py2.6-linux-x86_64.egg/amara/tree.py", line 50, in parse
    return _parse(inputsource(obj, uri), flags, entity_factory=entity_factory)
amara.ReaderError: In file:///home/g/test.xml, line 2, column 0: Missing document type declaration
g@spot:~$
=====o=====

So, why am I seeing this error? Is this functionality not supported?
How can I validate an XML file against an XSD while having the
flexibility to point to any XSD file?
Thanks, and let me know if you have questions.

揪着可爱 2024-09-19 01:39:14

If you're open to using another library besides amara, try lxml. It supports what you're trying to do pretty easily:

from lxml import etree

source_file = 'test.xml'
schema_file = 'test.xsd'

# Build a validating parser from the XSD file.
# Passing file names lets lxml handle opening and encoding detection.
schema_doc = etree.parse(schema_file)
schema = etree.XMLSchema(schema_doc)
parser = etree.XMLParser(schema=schema)

try:
    doc = etree.parse(source_file, parser)
except etree.XMLSyntaxError as e:
    # raised on malformed XML as well as on schema validation errors
    print(e)


爱已欠费 2024-09-19 01:39:14

I'd recommend using the noNamespaceSchemaLocation attribute to bind the XML file to the XSD schema. Your XML file test.xml would then be

<?xml version="1.0" encoding="utf-8"?>
<test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="test.xsd">abcde</test>

where the file test.xsd

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
    <xs:element name="test" type="xs:NCName"/>
</xs:schema>

should be placed in the same directory as test.xml. Referencing the XML schema from the XML file is a general technique, and it should work in Python.

The advantage is that you don't need to know the schema file for every XML file. A validating parser that honors this hint can find it automatically when the XML file is parsed (the etree.parse step).
