lxml unicode entity parse problems
I'm using lxml as follows to parse an exported XML file from another system:
from lxml import etree
xmldoc = open(filename)
etree.parse(xmldoc)
But I'm getting:
lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46
Obviously it's having problems with Unicode entity names, but how would I get around this? Via open() or parse()?
Edit: I had forgotten to include my DTD in the same folder - it's there now and has the following declaration:
<!ENTITY eacute "é">
and is referred to (and always was) in xmldoc as so:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE DScribeDatabase SYSTEM "foo.dtd">
Yet I still get the same problem ... does the DTD need to be declared in Python too?
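On that last question, one thing worth trying: as far as I know, lxml's default parser does not read the external DTD named in the DOCTYPE, so the declaration in foo.dtd is never seen. A minimal sketch, assuming foo.dtd sits next to the XML file. The helper name parse_with_dtd, the temporary directory, the file name export.xml and the sample element content are all made up for illustration.

```python
import os
import tempfile

from lxml import etree


def parse_with_dtd(xml_path):
    """Parse xml_path with a parser that loads the DTD named in its DOCTYPE."""
    # load_dtd=True makes the parser read the external DTD (no validation is done);
    # resolve_entities=True replaces entity references with their text value.
    parser = etree.XMLParser(load_dtd=True, resolve_entities=True)
    return etree.parse(xml_path, parser)


# Hypothetical demo data: a tiny DTD and an XML file that refers to it.
workdir = tempfile.mkdtemp()

with open(os.path.join(workdir, "foo.dtd"), "w", encoding="ascii") as dtd_file:
    dtd_file.write('<!ENTITY eacute "&#233;">\n')

xml_path = os.path.join(workdir, "export.xml")
with open(xml_path, "wb") as xml_file:
    xml_file.write(
        b'<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
        b'<!DOCTYPE DScribeDatabase SYSTEM "foo.dtd">\n'
        b"<DScribeDatabase>caf&eacute;</DScribeDatabase>\n"
    )

# Passing the path (not an open file) lets the relative "foo.dtd" be resolved
# against the location of the XML file.
tree = parse_with_dtd(xml_path)
print(tree.getroot().text)
```

So nothing extra has to be declared in Python, but the parser has to be created with DTD loading switched on and handed to parse().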
eacute is not a predefined entity in XML. To include an &eacute; entity reference in an XML file, the file must have a <!DOCTYPE> declaration pointing to a DTD (such as an XHTML 1.0 DTD) that defines the entity.
If the XML uses &eacute; but doesn't have a <!DOCTYPE>, it is not well-formed and the system that exported it needs to be fixed.
(There isn't a good reason to use an entity reference to represent é in an XML file. The character reference &#233; is understood everywhere without entity definitions, if the file can't simply include a raw UTF-8 é for some reason.)
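The three cases can be checked quickly. This is only a sketch, and it uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages; the element name and sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Numeric character reference: needs no declaration at all.
with_char_ref = ET.fromstring("<name>Caf&#233;</name>")

# Named entity declared in an internal DTD subset: the parser resolves it.
with_declared_entity = ET.fromstring(
    '<!DOCTYPE name [<!ENTITY eacute "&#233;">]><name>Caf&eacute;</name>'
)

# Named entity with no declaration anywhere: the parser rejects the document.
try:
    ET.fromstring("<name>Caf&eacute;</name>")
    undeclared_error = None
except ET.ParseError as exc:
    undeclared_error = str(exc)

print(with_char_ref.text)
print(with_declared_entity.text)
print(undeclared_error)
```

The first two parse to the same text, while the third fails with a parse error about the undefined entity.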