No DTD validation or XInclude resolution when using Saxon C HE with Python

Posted 2025-01-11 07:26:23

I have a question about Saxon C HE for Python.
After installing it successfully, I tried some examples that run XSLT transformations.
They all worked.

However, when I parse an XML file, no DTD validation is performed during parsing and the XIncludes are not resolved.
I have tried many things, but I have not been able to solve this problem. I hope someone can point out and explain my error.

Attached is an example that should deliberately fail DTD validation, because the DTD declares no element named FOU.
When I run the script, it creates a Result.xml file that still contains both the erroneous FOU element and the unresolved XInclude.

I am aware that this is easy to do with lxml, but I would like to know how it works with the Saxon parser.

XML Master:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
    <FOU Id="A-1">
        <BAR Name="Test-Bar-1"/>
        <BAR Name="Test-Bar-2"/>
        <BAR Name="Test-Bar-3"/>
    </FOU>
    <TUTU Id="TU-1">
        <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Include.xml" xpointer="xpointer(/node()/node()/*)"/>
    </TUTU>
</TEST>

XML Include:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
    <TUTU Id="TU-1">
        <TITI Name="Titi-1"/>
        <TITI Name="Titi-2"/>
        <TITI Name="Titi-3"/>
    </TUTU>
</TEST>

DTD:

<!ELEMENT TEST  (FOO+ , TUTU+)>
<!ELEMENT FOO   (BAR+)>
<!ELEMENT BAR   ANY>
<!ELEMENT TUTU  (TITI+)>
<!ELEMENT TITI  ANY>
<!-- Attribute -->
<!ATTLIST TEST
>
<!ATTLIST FOO
    Id      ID    #REQUIRED
>
<!ATTLIST BAR
    Name        CDATA #IMPLIED
>
<!ATTLIST TUTU
    Id      ID    #REQUIRED
>
<!ATTLIST TITI 
    Name        CDATA #IMPLIED
>

Python Script:

import saxonc

with saxonc.PySaxonProcessor(license=False) as proc:
    print(proc.version)
    xdmAtomicval = proc.make_boolean_value(False)
    xsltproc = proc.new_xslt_processor()
    document = proc.parse_xml(xml_file_name='Master.xml')
    print(document)
    
    xsltproc.set_source(xdm_node=document)
    xsltproc.set_output_file("Result.xml")
    xsltproc.compile_stylesheet(stylesheet_file="styl.xslt")
    xsltproc.transform_to_file(stylesheet_file="styl.xslt")
    
    documentRes = proc.parse_xml(xml_file_name='Result.xml')
    print(documentRes)

Comments (2)

权谋诡计 2025-01-18 07:26:23

You should be able to set the xi and dtd configuration properties to "on".

proc.set_configuration_property("xi", "on")
proc.set_configuration_property("dtd", "on")

However, the only way I could get the XInclude to work was to remove the xpointer attribute from the xi:include element. I didn't have time to research why the xpointer version fails.

parse_xml() also doesn't appear to do any validation or XInclude resolution, but both did happen during the transform. With DTD validation "on", the invalid document stops the transform, so set the dtd property to "off" or "recover" if you still want Result.xml to be written.

Here's the modified version of your Python that I used to test...

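import os
import saxonc

with saxonc.PySaxonProcessor(license=False) as proc:
    print(proc.version)
    # Resolve relative file names against the current working directory.
    proc.set_cwd(os.getcwd())
    # Turn on XInclude processing and DTD validation for this processor.
    proc.set_configuration_property("xi", "on")
    proc.set_configuration_property("dtd", "on")

    # In my tests parse_xml() neither validated nor resolved the XInclude.
    document = proc.parse_xml(xml_file_name='Master.xml')
    print(document)

    # The transform is where validation and XInclude resolution took effect.
    xsltproc = proc.new_xslt30_processor()
    xsltproc.transform_to_file(source_file="Master.xml", stylesheet_file="styl.xslt", output_file="Result.xml")

    documentRes = proc.parse_xml(xml_file_name='Result.xml')
    print(documentRes)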
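
One consequence of dropping the xpointer is worth spelling out: an xi:include without an xpointer pulls in the whole included document, root element included. The sketch below illustrates that rule with Python's standard-library ElementInclude, not with Saxon, and the in-memory loader `load_from_memory` is a hypothetical helper added only for the demonstration. The documents are trimmed copies of the question's files with the DOCTYPE omitted and FOU corrected to FOO. Why the original xpointer() expression fails is not established here; one possibility (an assumption, not verified) is that the underlying parser does not support that pointer scheme.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Master document with the xpointer attribute removed.
MASTER = """<TEST>
    <FOO Id="A-1"><BAR Name="Test-Bar-1"/></FOO>
    <TUTU Id="TU-1">
        <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Include.xml"/>
    </TUTU>
</TEST>"""

# Included document, unchanged apart from the missing DOCTYPE.
INCLUDE = """<TEST>
    <TUTU Id="TU-1">
        <TITI Name="Titi-1"/>
        <TITI Name="Titi-2"/>
        <TITI Name="Titi-3"/>
    </TUTU>
</TEST>"""


def load_from_memory(href, parse, encoding=None):
    """Hypothetical loader: serve Include.xml from the string above."""
    if href != "Include.xml" or parse != "xml":
        raise OSError(f"unexpected include: {href!r}")
    return ET.fromstring(INCLUDE)


resolved = ET.fromstring(MASTER)
# Replace every xi:include element in place with the loaded content.
ElementInclude.include(resolved, loader=load_from_memory)

tutu = resolved.find("TUTU")
# The outer TUTU now holds the included root element, not TITI directly.
print([child.tag for child in tutu])
names = [titi.get("Name") for titi in resolved.iter("TITI")]
print(names)
```

Because the outer TUTU ends up containing a nested TEST element, the resolved document would still not satisfy a content model of (TITI+). If the xpointer has to go, Include.xml probably needs restructuring as well so that what gets included fits the DTD.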

情绪失控 2025-01-18 07:26:23

The PyDocumentBuilder class, which is new in SaxonC 11, should let you do DTD validation. See: https://www.saxonica.com/saxon-c/doc11/html/saxonc.html#PyDocumentBuilder
You should be able to use its dtd_validation method to turn validation on.

You can create a PyDocumentBuilder as follows:

proc.new_document_builder()
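
Putting those pieces together, a minimal sketch of the suggestion might look like the following. It assumes SaxonC 11 or later. The answer only names dtd_validation, so the setter name `set_dtd_validation` and the fallback to assigning the attribute are assumptions to check against the linked documentation. The helper `parse_with_dtd_validation` and its return convention are hypothetical, and error reporting may differ between versions (an exception in some, a None result in others), so the sketch handles both.

```python
import os

try:
    import saxonc  # SaxonC Python extension (11 or later assumed)
except ImportError:  # keeps the sketch importable where SaxonC is absent
    saxonc = None


def parse_with_dtd_validation(xml_path):
    """Return (serialized_document, error_message); one of them is None."""
    if saxonc is None:
        return None, "saxonc is not installed"
    with saxonc.PySaxonProcessor(license=False) as proc:
        try:
            builder = proc.new_document_builder()
            # Assumption: the setter is set_dtd_validation(); if it is
            # missing, fall back to assigning the dtd_validation attribute.
            if hasattr(builder, "set_dtd_validation"):
                builder.set_dtd_validation(True)
            else:
                builder.dtd_validation = True
            node = builder.parse_xml(xml_file_name=os.path.abspath(xml_path))
        except Exception as exc:  # validation or I/O failure
            return None, str(exc)
        if node is None:
            return None, "parse_xml returned no document"
        # Serialize while the processor is still alive.
        return str(node), None


if __name__ == "__main__":
    document, error = parse_with_dtd_validation("Master.xml")
    print(error if error else document)
```

With the question's Master.xml, the hope is that the FOU element is reported as an error here instead of passing through silently. Whether XInclude resolution also happens on this path is not covered by the answer.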