Getting expat to use a .dtd for entity replacement in Python
I'm trying to read in an XML file which looks like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<incollection>
<author>Jos&eacute; A. Blakeley</author>
</incollection>
</dblp>
The part that causes the problem is
Jos&eacute; A. Blakeley
The parser calls its character data handler twice, once with "Jos" and once with " A. Blakeley".
Now, I understand this may be the correct behaviour if it doesn't know the eacute entity. However, the entity is defined in dblp.dtd, which I have. I don't seem to be able to convince expat to use this file, though. All I can do is
import xml.parsers.expat

p = xml.parsers.expat.ParserCreate()
# tried with and without the following line
p.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
p.UseForeignDTD(True)
f = open(dblp_file, "rb")  # ParseFile expects bytes on Python 3
p.ParseFile(f)
but expat still doesn't recognize my entity. Why is there no way to tell expat which DTD to use? I've tried
- putting the file into the same directory as the XML
- putting the file into the program's working directory
- replacing the reference in the XML file with an absolute path
What am I missing? Thx.
As I understand it, if you're using pyexpat directly, then you have to provide your own ExternalEntityRefHandler to fetch the external DTD and feed it to expat. See e.g. xml.sax.expatreader for example code (method external_entity_ref, line 374 in Python 2.6).
It would probably be better to use a higher-level interface such as SAX (via expatreader) if you can.
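The handler approach from the answer could look roughly like the sketch below. It is not the code from xml.sax.expatreader, only a minimal illustration under a few assumptions: the DTD's system identifier is a local path relative to the XML file, and the names parse_text and demo are made up for this example. It also relies on parameter entity parsing being switched on, since expat does not appear to ask for the external subset otherwise.

```python
import os
import tempfile
import xml.parsers.expat as expat


def parse_text(xml_path):
    """Parse xml_path and return all character data joined into one string."""
    base_dir = os.path.dirname(os.path.abspath(xml_path))
    chunks = []

    parser = expat.ParserCreate()
    # Ask expat to request the external DTD subset named in the DOCTYPE.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def external_entity_ref(context, base, system_id, public_id):
        # Assumption: system_id is a local path, relative to the XML file.
        dtd_path = os.path.join(base_dir, system_id)
        # A child parser reads the DTD and shares its declarations
        # with the main parser.
        sub = parser.ExternalEntityParserCreate(context)
        with open(dtd_path, "rb") as dtd:
            sub.ParseFile(dtd)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = external_entity_ref
    # Text may still arrive in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append

    with open(xml_path, "rb") as f:
        parser.ParseFile(f)
    return "".join(chunks)


def demo():
    """Write a tiny stand-in for dblp.dtd plus an XML file, then parse it."""
    xml_bytes = (
        b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
        b'<!DOCTYPE dblp SYSTEM "dblp.dtd">\n'
        b"<dblp><incollection><author>Jos&eacute; A. Blakeley</author>"
        b"</incollection></dblp>\n"
    )
    # Hypothetical one-line DTD; the real dblp.dtd declares much more.
    dtd_bytes = b'<!ENTITY eacute "&#233;">\n'
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "dblp.dtd"), "wb") as f:
            f.write(dtd_bytes)
        xml_path = os.path.join(tmp, "sample.xml")
        with open(xml_path, "wb") as f:
            f.write(xml_bytes)
        return parse_text(xml_path)


if __name__ == "__main__":
    print(demo())
```

Collecting the chunks and joining them afterwards avoids depending on how expat splits the text around the entity reference.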
By the way, I can work around this for now by copying the relevant parts of the .dtd into the XML file itself, but that doesn't really solve the problem in a general way.
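The example that originally accompanied this comment is not preserved here. As a hedged illustration of the workaround (not the poster's actual snippet), the sketch below declares the entity in an internal DTD subset, which expat reads without any handler. The document contents and the name author_text are assumptions made for this example.

```python
import xml.parsers.expat as expat

# Illustrative document: the entity is declared inside the DOCTYPE itself
# (internal subset) instead of in an external dblp.dtd.
XML = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    b"<!DOCTYPE dblp [\n"
    b'  <!ENTITY eacute "&#233;">\n'
    b"]>\n"
    b"<dblp><incollection><author>Jos&eacute; A. Blakeley</author>"
    b"</incollection></dblp>\n"
)


def author_text(document):
    """Return all character data in document joined into one string."""
    chunks = []
    parser = expat.ParserCreate()
    # The text may be delivered in several calls, so collect and join.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


if __name__ == "__main__":
    print(author_text(XML))
```

This only helps for entities you copy by hand, which is why it does not generalise to the full DTD.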