The Javadoc for org.w3c.dom.Entity states:
XML does not mandate that a non-validating XML processor read and process entity declarations made in the external subset or declared in parameter entities. This means that parsed entities declared in the external subset need not be expanded by some classes of applications, and that the replacement text of the entity may not be available. When the replacement text is available, the corresponding Entity node's child list represents the structure of that replacement value. Otherwise, the child list is empty.
Whilst it does not refer to entity declarations made in the internal subset, there must surely be some configuration of parser which will read and process entity declarations in either subset? Indeed, my reading of the documentation would suggest that this is the default.
In any event, I have tested the following approach (using Xerces) against entities which have been declared in the internal subset (as shown) and also in an external subset, but foo.hasChildNodes() returns false (and foo.getChildNodes() returns foo!) in every case:
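import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Entity;

// some trivial example XML
String xml = "<!DOCTYPE example [ <!ENTITY foo 'bar'> ]>\n<example/>";
// encode explicitly instead of relying on the platform default charset
InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
// parse
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
DocumentType docType = builder.parse(is).getDoctype();
// retrieve the entity - works fine
Entity foo = (Entity) docType.getEntities().getNamedItem("foo");
// now how to get the entity's replacement text?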
No doubt I am missing something rather obvious; grateful for your thoughts.
EDIT
It appears from the answers so far that my Xerces implementation is misbehaving. I will try updating all Xerces libraries to the latest versions and, if that solves my problem, I will close off the question. Many thanks.
UPDATE
Updating Xerces has indeed solved the problem, provided that the entity is referenced from within the document; if it is not, then the node still has no children. It is not entirely clear to me why this should be the case. Grateful if someone could explain what's going on and/or point me to how I can force the creation of the child nodes without explicitly referencing every entity from within the document.
I think you may be mistaken about how the replacement text works. Based on some reading (http://www.javacommerce.com/displaypage.jsp?name=entities.sql&id=18238), it looks to me like the replacement text works like a variable. So, in your example above you are never referencing the &foo; entity. If you reference &foo; from within the document and print the result, you will see that &foo; gets replaced with the string bar: what you see printed is [#text: bar], which is the text replacement within the XML.
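A minimal sketch of that experiment, assuming the JDK's built-in JAXP DOM parser with its default setExpandEntityReferences(true). The class and method names (EntityExpansionDemo, parseRoot) are illustrative, not from the answer. The only change from the question's document is that the body now references &foo;.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class EntityExpansionDemo {

    // Parses the XML string and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // Unlike the question's <example/>, the body references &foo;
        String xml = "<!DOCTYPE example [ <!ENTITY foo 'bar'> ]>\n<example>&foo;</example>";
        Element root = parseRoot(xml);
        // The reference has been expanded, so the root's child is the text that replaced &foo;
        System.out.println(root.getFirstChild());
        System.out.println(root.getTextContent()); // bar
    }
}
```

The replacement happens in the document tree, where the reference occurs; the DOCTYPE's Entity node is a separate object.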
因此,检查实体节点的文本值而不是子节点列表。
I may be wrong, but I think Entity nodes store the replacement text as a text value, not as a set of nodes. This is because entities are not actually fully parsed when their definitions are parsed, mostly since the DTD handler is a sort of pre-processor that runs before the actual parsing process.
So check the text value of the entity node instead of its child node list.
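Related to the point that the DTD is processed before the document content: if all you need is the declared replacement text as a string, SAX2's optional DeclHandler extension reports each internal entity declaration while the DTD is read, whether or not the entity is ever referenced. This is a swapped-in technique, not what the answer above does. The sketch assumes a parser that supports the standard http://xml.org/sax/properties/declaration-handler property (the JDK's built-in one does); EntityDecls and internalEntities are hypothetical names.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class EntityDecls {

    // Returns name -> replacement text for each internal general entity
    // declared in the DTD, referenced in the document or not.
    static Map<String, String> internalEntities(String xml) throws Exception {
        Map<String, String> entities = new LinkedHashMap<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void internalEntityDecl(String name, String value) {
                // Parameter entities are reported with a leading '%'; skip them.
                if (!name.startsWith("%")) {
                    entities.put(name, value);
                }
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Standard SAX2 property for registering a DeclHandler.
        reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return entities;
    }

    public static void main(String[] args) throws Exception {
        // Same document as in the question: foo is declared but never referenced.
        String xml = "<!DOCTYPE example [ <!ENTITY foo 'bar'> ]>\n<example/>";
        System.out.println(internalEntities(xml)); // {foo=bar}
    }
}
```

Per the DeclHandler contract, parameter entities inside a value are expanded but general entity references are not, so this yields the replacement text as a string rather than a node structure.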
I don't know why foo.getChildNodes() doesn't work, but I discovered the following. If the entity is used (referenced) in the document,
<!DOCTYPE example [<!ENTITY foo 'bar'>]>\n<example>&foo;</example>
then the replacement text is available via foo.getTextContent().
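A self-contained version of that observation, assuming the JDK's built-in JAXP DOM parser (EntityTextContent and findEntity are illustrative names). Consistent with the question's update, the Entity node is looked up in the DOCTYPE after parsing a document that actually references &foo;.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Entity;
import org.xml.sax.InputSource;

public class EntityTextContent {

    // Parses the XML string and returns the named Entity from its DOCTYPE.
    static Entity findEntity(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return (Entity) doc.getDoctype().getEntities().getNamedItem(name);
    }

    public static void main(String[] args) throws Exception {
        // &foo; appears in the body, so the parser expands it at least once.
        String xml = "<!DOCTYPE example [<!ENTITY foo 'bar'>]>\n<example>&foo;</example>";
        Entity foo = findEntity(xml, "foo");
        System.out.println(foo.hasChildNodes());
        System.out.println(foo.getTextContent());
    }
}
```

Whether an unreferenced entity gets children depends on the parser implementation, which is exactly what the question's update observed, so the sketch only exercises the referenced case.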
I asked on the Xerces-J Users mailing list about the non-existence of child nodes where the entities are not referenced within the document; there Michael Glavassevich helpfully pointed me towards an old post from Andy Clark explaining as follows: