Python: examining an XSD XML schema

Posted 2024-08-30 09:18:16 · 1880 characters · 3 views · 0 comments

I would like to examine an XSD schema in Python. Currently I'm using lxml, which does its job very well when it only has to validate a document against the schema. However, I also want to know what is inside the schema and access its elements the same way lxml lets me access any other XML tree.

The schema:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:include schemaLocation="worker_remote_base.xsd"/>
    <xsd:include schemaLocation="transactions_worker_responses.xsd"/>
    <xsd:include schemaLocation="transactions_worker_requests.xsd"/>
</xsd:schema>

The lxml code to load the schema is (simplified):

from lxml import etree

# Read the main schema file; the handle is closed automatically
with open(self._xsd_file, 'rb') as xsd_file_handle:
    xsd_text = xsd_file_handle.read()
schema_document = etree.fromstring(xsd_text, base_url=xmlpath)
xmlschema = etree.XMLSchema(schema_document)

I'm then able to use schema_document (which is an etree._Element) to walk through the schema as an XML document. But since etree.fromstring (at least as far as I can tell) treats its input as a plain XML document, the xsd:include elements are not processed.
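To illustrate what walking the schema as plain XML looks like, here is a minimal sketch. The helper name list_top_level and the sample xsd:element named "worker" are made up for illustration; they are not part of the original schema. The sketch falls back to the standard library's xml.etree.ElementTree when lxml is not installed, since both expose the same fromstring, tag and get calls used here.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:  # fallback so the sketch also runs without lxml
    import xml.etree.ElementTree as etree

XSD = "{http://www.w3.org/2001/XMLSchema}"

def list_top_level(schema_root):
    """Return (kind, name or schemaLocation) for each top-level XSD child."""
    result = []
    for child in schema_root:
        tag = child.tag
        # Skip comments, processing instructions and non-XSD elements
        if not isinstance(tag, str) or not tag.startswith(XSD):
            continue
        kind = tag[len(XSD):]
        result.append((kind, child.get("name") or child.get("schemaLocation")))
    return result

# Hypothetical sample: one include plus one element declaration
SAMPLE = b"""<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:include schemaLocation="worker_remote_base.xsd"/>
    <xsd:element name="worker" type="xsd:string"/>
</xsd:schema>"""

components = list_top_level(etree.fromstring(SAMPLE))
print(components)
```

The xsd:include shows up as an ordinary child element with its schemaLocation attribute intact, which is exactly the situation described above: the parser reads it but does not follow it.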

I currently work around this by parsing the main schema document, loading each included schema, and then inserting its child elements one by one into the main document by hand:

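import os
from lxml import etree

BASE_URL            = "/xml/"
schema_document     = etree.fromstring(xsd_text, base_url=BASE_URL)
tree                = schema_document.getroottree()

schemas             = []
# Iterate over a copy of the children, because elements are removed inside the loop
for schemaChild in list(schema_document):
    # Comments and processing instructions have a non-string tag, so skip them
    if isinstance(schemaChild.tag, str) and schemaChild.tag.endswith("include"):
        path = os.path.join(BASE_URL, schemaChild.get("schemaLocation"))
        try:
            # The with block closes the file even if parsing fails
            with open(path, "rb") as h:
                schemas.append(etree.fromstring(h.read(), base_url=BASE_URL))
        except Exception as ex:
            print("failed to load schema: %s" % ex)
        # remove the <xsd:include ...> element
        schema_document.remove(schemaChild)

# Move the children of each included <schema> into the main <schema>
for s in schemas:
    for sChild in list(s):
        schema_document.append(sChild)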
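The same manual approach can be wrapped in a small recursive helper, so that includes nested inside included files are followed as well. This is only a sketch under assumptions: the function name load_with_includes, the file names main.xsd and base.xsd, and the "worker" element are hypothetical, and it does not handle xsd:import, xsd:redefine, or includes whose target namespace differs from the including schema. It also falls back to xml.etree.ElementTree when lxml is unavailable.

```python
import os
import tempfile

try:
    from lxml import etree  # the library used in the question
except ImportError:  # fallback so the sketch also runs without lxml
    import xml.etree.ElementTree as etree

XSD_INCLUDE = "{http://www.w3.org/2001/XMLSchema}include"

def load_with_includes(path, seen=None):
    """Parse the schema at `path` and inline every xsd:include, recursively."""
    seen = set() if seen is None else seen
    path = os.path.abspath(path)
    seen.add(path)
    root = etree.parse(path).getroot()
    base_dir = os.path.dirname(path)
    for child in list(root):  # copy, because the tree is modified in the loop
        if child.tag != XSD_INCLUDE:
            continue
        index = list(root).index(child)
        root.remove(child)
        target = os.path.abspath(
            os.path.join(base_dir, child.get("schemaLocation")))
        if target in seen:  # already merged, or a circular include
            continue
        included = load_with_includes(target, seen)
        # Put the included declarations where the xsd:include used to be
        for offset, grandchild in enumerate(list(included)):
            root.insert(index + offset, grandchild)
    return root

# Demo with two throwaway schema files (hypothetical names and content)
with tempfile.TemporaryDirectory() as tmp:
    files = {
        "main.xsd": '<xsd:include schemaLocation="base.xsd"/>',
        "base.xsd": '<xsd:element name="worker" type="xsd:string"/>',
    }
    for name, body in files.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write('<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
                    + body + '</xsd:schema>')
    merged = load_with_includes(os.path.join(tmp, "main.xsd"))
    names = [child.get("name") for child in merged]
print(names)
```

Resolving each schemaLocation relative to the directory of the file that contains it, rather than against one fixed base path, keeps the helper working when included files live in subdirectories.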

What I'm asking for is an idea of how to solve this in a more standard way. I have already looked for other schema parsers in Python, but so far I have found nothing that fits this case.

Greetings,
