Python xml etree DTD from a StringIO source?

Posted 2024-09-25 04:10:24

I'm adapting the following code (created via advice in this question), which takes an XML file and its DTD and converts them to a different format. For this problem only the loading section is important:

xmldoc = open(filename)

parser = etree.XMLParser(dtd_validation=True, load_dtd=True)    
tree = etree.parse(xmldoc, parser)

This worked fine when reading from the file system, but I'm converting it to run under a web framework, where the two files are uploaded via a form.

Loading the XML file on its own works fine:

tree = etree.parse(StringIO(data['xml_file']))

But because the XML file references the DTD at the top of the file, the following statement fails:

parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
tree = etree.parse(StringIO(data['xml_file']), parser)

Following this question, I tried:

etree.DTD(StringIO(data['dtd_file']))
tree = etree.parse(StringIO(data['xml_file']))

The first line doesn't raise an error, but the second fails on the character entities that the DTD is supposed to define (and does define in the file system version):

XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46

How do I go about correctly loading this DTD?

Comments (2)

冰雪之触 2024-10-02 04:10:24

Here's a short but complete example, using the custom resolver technique @Steven mentioned.

from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
from lxml import etree

data = dict(
    xml_file = '''<?xml version="1.0"?>
<!DOCTYPE x SYSTEM "a.dtd">
<x><y>&eacute;zz</y></x>
''',
    dtd_file = '''<!ENTITY eacute "&#233;">
<!ELEMENT x (y)>
<!ELEMENT y (#PCDATA)>
''')

class DTDResolver(etree.Resolver):
    # Called for external references; every lookup is answered with the uploaded DTD text.
    def resolve(self, url, id, context):
        return self.resolve_string(data['dtd_file'], context)

xmldoc = StringIO(data['xml_file'])
parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
parser.resolvers.add(DTDResolver())
try:
    tree = etree.parse(xmldoc, parser)
except etree.XMLSyntaxError as e:
    # handle XML syntax and validation errors
    print(e)

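
To make the error-handling branch concrete, here is a rough sketch that wraps the same resolver technique in a helper and runs it on one valid and one invalid document. The helper name `parse_with_dtd`, the class name `InMemoryDTDResolver`, and the sample strings are made up for illustration, and it assumes Python 3 with lxml installed.

```python
from io import StringIO
from lxml import etree


def parse_with_dtd(xml_text, dtd_text):
    """Parse xml_text, answering DTD lookups with dtd_text (hypothetical helper)."""

    class InMemoryDTDResolver(etree.Resolver):
        def resolve(self, url, public_id, context):
            # Ignore the requested URL and always hand back the in-memory DTD.
            return self.resolve_string(dtd_text, context)

    parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
    parser.resolvers.add(InMemoryDTDResolver())
    return etree.parse(StringIO(xml_text), parser)


# Illustrative inputs, not taken from the question's real files.
dtd = '<!ENTITY eacute "&#233;">\n<!ELEMENT x (y)>\n<!ELEMENT y (#PCDATA)>\n'
valid = '<!DOCTYPE x SYSTEM "a.dtd">\n<x><y>&eacute;zz</y></x>'
invalid = '<!DOCTYPE x SYSTEM "a.dtd">\n<x><z/></x>'

tree = parse_with_dtd(valid, dtd)
print(tree.getroot()[0].text)  # text of <y> with the entity expanded

try:
    parse_with_dtd(invalid, dtd)
except etree.XMLSyntaxError as e:
    # With dtd_validation=True, a document that violates the DTD ends up here.
    print("rejected:", e)
```

In a web view, the `except` branch is probably where you would turn the message into a form error instead of printing it.
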
安人多梦 2024-10-02 04:10:24

You could probably use a custom resolver. The docs actually give an example of doing this to provide a DTD.
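
As a rough idea of what such a resolver might look like, the sketch below only answers for system URLs it knows about and returns `None` otherwise, so lxml falls back to its normal lookup. The class name `UploadedDTDResolver`, the `uploaded` mapping, and the sample strings are assumptions for illustration; it assumes Python 3 with lxml installed.

```python
from io import StringIO
from lxml import etree


class UploadedDTDResolver(etree.Resolver):
    """Serve DTD text held in memory, keyed by the system URL in the DOCTYPE."""

    def __init__(self, dtds):
        super().__init__()
        self.dtds = dtds  # maps system URL -> DTD text

    def resolve(self, system_url, public_id, context):
        dtd_text = self.dtds.get(system_url)
        if dtd_text is None:
            return None  # not ours: let lxml try its default resolution
        return self.resolve_string(dtd_text, context)


# Illustrative data; in the question these would come from the uploaded form fields.
uploaded = {
    "a.dtd": '<!ENTITY eacute "&#233;">\n<!ELEMENT x (y)>\n<!ELEMENT y (#PCDATA)>\n',
}
xml_text = '<!DOCTYPE x SYSTEM "a.dtd">\n<x><y>&eacute;zz</y></x>'

parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
parser.resolvers.add(UploadedDTDResolver(uploaded))
tree = etree.parse(StringIO(xml_text), parser)
print(tree.getroot()[0].text)
```

Keying on the system URL keeps the resolver from answering unrelated lookups with the wrong DTD, at the cost of requiring the uploaded DTD to be stored under the name the DOCTYPE uses.
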
