Getting expat to use a .dtd for entity replacement in Python

Posted 2024-09-02 23:10:19

I'm trying to read in an xml file which looks like this

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<incollection>
<author>Jos&eacute; A. Blakeley</author>
</incollection>
</dblp>

The part that causes the problem is the

Jos&eacute; A. Blakeley

part: The parser calls its character handler twice, once with "Jos", once with " A. Blakeley".
Now I understand this may be the correct behaviour if it doesn't know the eacute entity. However, this is defined in the dblp.dtd, which I have. I don't seem to be able to convince expat to use this file, though. All I can say is

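import xml.parsers.expat

p = xml.parsers.expat.ParserCreate()
# tried with and without the following line
p.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
p.UseForeignDTD(True)
# binary mode, so expat applies the encoding declared in the XML header
with open(dblp_file, "rb") as f:
    p.ParseFile(f)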

but expat still doesn't recognize my entity. Why is there no way to tell expat which DTD to use? I've tried

  • putting the file into the same directory as the XML
  • putting the file into the program's working directory
  • replacing the reference in the xml file by an absolute path

What am I missing? Thx.

关于从前 2024-09-09 23:10:19

As I understand it, if you're using pyexpat directly, then you have to provide your own ExternalEntityRefHandler to fetch the external DTD and feed it to expat.

See e.g. xml.sax.expatreader for example code (the external_entity_ref method, line 374 in Python 2.6).

It would probably be better to use a higher-level interface such as SAX (via expatreader) if you can.
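
The handler approach described above could look roughly like the following. This is only a minimal sketch, not the code from xml.sax.expatreader: the function name read_text_with_external_dtd is made up for illustration, and it assumes the system identifier in the DOCTYPE is a local file path, resolved relative to the XML file (URLs and catalogs are not handled). The text chunks are joined at the end because expat may still deliver character data in several pieces.

```python
import os
import xml.parsers.expat


def read_text_with_external_dtd(xml_path):
    """Return all character data of xml_path, with DTD entities expanded.

    Hypothetical helper: assumes the DOCTYPE system id is a local file.
    """
    chunks = []
    base_dir = os.path.dirname(os.path.abspath(xml_path))

    parser = xml.parsers.expat.ParserCreate()
    # Without this, expat does not ask for the external DTD subset at all.
    parser.SetParamEntityParsing(
        xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def external_entity_ref(context, base, system_id, public_id):
        if system_id is None:
            # Nothing to load (can happen when UseForeignDTD is enabled).
            return 1
        # Create a sub-parser for the external entity and feed it the DTD.
        sub_parser = parser.ExternalEntityParserCreate(context)
        with open(os.path.join(base_dir, system_id), "rb") as dtd_file:
            sub_parser.ParseFile(dtd_file)
        return 1  # a non-zero return tells expat to continue parsing

    parser.ExternalEntityRefHandler = external_entity_ref
    parser.CharacterDataHandler = chunks.append

    with open(xml_path, "rb") as xml_file:
        parser.ParseFile(xml_file)
    return "".join(chunks)
```

Called on the sample file from the question, with dblp.dtd in the same directory, the returned text should contain the expanded name instead of dropping the entity. UseForeignDTD is not needed here, since the document already has a DOCTYPE declaration.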

绅刃 2024-09-09 23:10:19

By the way, as a temporary workaround I can copy the relevant parts of the .dtd into the XML file itself, as in

<!DOCTYPE dblp [
    <!ENTITY Agrave  "À" >
]>

but that doesn't really solve the problem in a general way.
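
For reference, a small sketch of that workaround with pyexpat, using the eacute entity from the question rather than Agrave. The sample document and the names in it are made up for illustration. Entities declared in the internal subset are expanded without any external-entity handler, and setting buffer_text asks pyexpat to collect adjacent character data before calling the handler.

```python
import xml.parsers.expat

# Illustrative document: the entity is declared in the internal DTD subset.
DOC = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp [
    <!ENTITY eacute "&#233;">
]>
<dblp><incollection><author>Jos&eacute; A. Blakeley</author></incollection></dblp>
"""

chunks = []
parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = True  # merge adjacent text pieces where possible
parser.CharacterDataHandler = chunks.append
parser.Parse(DOC, True)  # True marks this as the final chunk of input

author = "".join(chunks)
print(author)
```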
