Manually creating a NodeList of all Document nodes
I currently generate a `NodeList` of all the `Document` nodes (in document order) manually. The XPath expression to get this `NodeList` is

```
//. | //@* | //namespace::*
```
My first attempt at walking the DOM manually and collecting the nodes (`NodeSet` is a primitive `NodeList` implementation delegating to a `List`):
```java
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Adds cur, then its attributes, then (recursively) its children.
private static void walkRecursive(Node cur, NodeSet nodes) {
    nodes.add(cur);
    if (cur.hasAttributes()) {
        NamedNodeMap attrs = cur.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node child = attrs.item(i);
            walkRecursive(child, nodes);
        }
    }
    int type = cur.getNodeType();
    // Recurse into the children of elements and of the document node.
    if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
        NodeList children = cur.getChildNodes();
        if (children == null)
            return;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            walkRecursive(child, nodes);
        }
    }
}
```
I would start the recursion by calling `walkRecursive(doc, nodes)`, where `doc` is the `org.w3c.dom.Document` and `nodes` is a (yet empty) `NodeSet`.
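The question does not show `NodeSet`. The class below is a hypothetical minimal version that matches the description (a `NodeList` delegating to a `List`), so that the walker can be compiled and tried. Both the class body and the `add` method are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for the question's NodeSet: a NodeList backed by an ArrayList.
final class NodeSet implements NodeList {
    private final List<Node> delegate = new ArrayList<>();

    // Appends a node; callers add nodes in the order they should be reported.
    void add(Node node) {
        delegate.add(node);
    }

    // As the DOM specifies for NodeList.item, an invalid index yields null instead of an exception.
    @Override
    public Node item(int index) {
        return (index >= 0 && index < delegate.size()) ? delegate.get(index) : null;
    }

    @Override
    public int getLength() {
        return delegate.size();
    }
}
```

Usage then follows the question: `NodeSet nodes = new NodeSet(); walkRecursive(doc, nodes);`.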
I tested this using this primitive XML document:
```xml
<?xml version="1.0"?>
<myns:root xmlns:myns="http://www.my.ns/#">
    <myns:element/>
</myns:root>
```
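To reproduce the manual side of the comparison, the sample document can be parsed from a string and walked in the same way. This sketch inlines the XML as a Java string and collects into a plain `List<Node>` instead of `NodeSet`. The class name `ManualWalk` is hypothetical. The traversal mirrors `walkRecursive`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ManualWalk {
    // The sample document from the question, inlined.
    static final String SAMPLE = "<?xml version=\"1.0\"?>\n"
            + "<myns:root xmlns:myns=\"http://www.my.ns/#\">\n"
            + "<myns:element/>\n"
            + "</myns:root>";

    // Namespace-aware parse, as in the question's edit; checked exceptions are wrapped.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Same visiting order as walkRecursive: the node, its attributes, then its children.
    static void walk(Node cur, List<Node> out) {
        out.add(cur);
        if (cur.hasAttributes()) {
            NamedNodeMap attrs = cur.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                walk(attrs.item(i), out);
            }
        }
        short type = cur.getNodeType();
        if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
            NodeList children = cur.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                walk(children.item(i), out);
            }
        }
    }

    static List<Node> collect(Document doc) {
        List<Node> out = new ArrayList<>();
        walk(doc, out);
        return out;
    }

    public static void main(String[] args) {
        for (Node n : collect(parse(SAMPLE))) {
            System.out.println(n.getNodeName());
        }
    }
}
```

This yields six nodes: `#document`, `myns:root`, `xmlns:myns`, `#text`, `myns:element`, `#text`. That matches the manually generated list shown further down. The DOM holds no `xmlns:xml` attribute, so a DOM walk cannot encounter one.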
If I, for example, canonicalize both my manually created `NodeSet` and the `NodeList` generated by the XPath expression mentioned at the start, and compare the two results byte for byte, they are equal, so this seems to work just fine.
But if I iterate over the two `NodeList`s and print debug info (`typeString` simply generates a string representation of the node type)
```java
for (int i = 0; i < nodes.getLength(); i++) {
    Node child = nodes.item(i);
    System.out.println("Type: " + typeString(child.getNodeType()) +
                       " Name:" + child.getNodeName() +
                       " Local name: " + child.getLocalName() +
                       " NS: " + child.getNamespaceURI());
}
```
then I receive this output for the XPath-generated `NodeList`:

```
Type: DocumentNode Name:#document Local name: null NS: null
Type: Element Name:myns:root Local name: root NS: http://www.my.ns/#
Type: Attribute Name:xmlns:myns Local name: myns NS: http://www.w3.org/2000/xmlns/
Type: Attribute Name:xmlns:xml Local name: xml NS: http://www.w3.org/2000/xmlns/
Type: Text Name:#text Local name: null NS: null
Type: Element Name:myns:element Local name: element NS: http://www.my.ns/#
Type: Text Name:#text Local name: null NS: null
```
and this for the manually generated `NodeList`:

```
Type: DocumentNode Name:#document Local name: null NS: null
Type: Element Name:myns:root Local name: root NS: http://www.my.ns/#
Type: Attribute Name:xmlns:myns Local name: myns NS: http://www.w3.org/2000/xmlns/
Type: Text Name:#text Local name: null NS: null
Type: Element Name:myns:element Local name: element NS: http://www.my.ns/#
Type: Text Name:#text Local name: null NS: null
```
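The question does not show `typeString`. Below is a hypothetical version that would produce the labels used in the two listings (`DocumentNode`, `Element`, `Attribute`, `Text`). The class name `NodeTypes` and the labels for the other node types are assumptions.

```java
import org.w3c.dom.Node;

// Hypothetical typeString: maps DOM node-type codes to the labels seen in the debug output.
final class NodeTypes {
    static String typeString(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE:
                return "DocumentNode";
            case Node.ELEMENT_NODE:
                return "Element";
            case Node.ATTRIBUTE_NODE:
                return "Attribute";
            case Node.TEXT_NODE:
                return "Text";
            case Node.COMMENT_NODE:
                return "Comment";
            case Node.CDATA_SECTION_NODE:
                return "CDATASection";
            case Node.PROCESSING_INSTRUCTION_NODE:
                return "ProcessingInstruction";
            default:
                // Any node type not listed above is reported with its numeric code.
                return "Other(" + type + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(typeString(Node.TEXT_NODE)); // Text
    }
}
```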
So, as you can see, in the first example the `NodeList` additionally contains the node for the XML namespace:

```
Type: Attribute Name:xmlns:xml Local name: xml NS: http://www.w3.org/2000/xmlns/
```
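To compare the two lists while ignoring this extra entry, one option is to filter out any node reported exactly as above: namespace URI `http://www.w3.org/2000/xmlns/` and local name `xml`. This is a sketch. `XmlNsFilter` and its methods are hypothetical names; the constants come from `javax.xml.XMLConstants`.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XmlNsFilter {
    // True for a node reported as xmlns:xml, i.e. a declaration of the built-in xml prefix.
    static boolean isXmlPrefixDeclaration(Node n) {
        return XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(n.getNamespaceURI())
                && XMLConstants.XML_NS_PREFIX.equals(n.getLocalName());
    }

    // Copies the list without xmlns:xml declarations, otherwise preserving order.
    static List<Node> withoutXmlPrefixDeclarations(NodeList nodes) {
        List<Node> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (!isXmlPrefixDeclaration(n)) {
                out.add(n);
            }
        }
        return out;
    }
}
```

The check relies on the rule quoted below: the `xml` prefix may only ever be bound to its fixed namespace, so matching on the name alone is enough.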
Now my questions:
a) If I interpret Namespaces in XML 1.1 (xml-names11) correctly, then I don't need the `xmlns:xml` declaration:
> The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be undeclared or bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.
Am I correct? (Point c below at least hints in that direction.)
b) But then, why does the XPath evaluation add it anyway - shouldn't it just include what was there in the first place instead of automagically adding things?
c) This can cause trouble with XML canonicalization, although it shouldn't: declarations of the `xml` namespace should be omitted during canonicalization. Does anyone know of (Java) implementations that get this wrong?
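Regarding a), the built-in binding of `xml` can be observed directly with the JDK's namespace-aware parser. In this sketch, `xml:lang` is used without any declaration. The attribute still ends up in the `http://www.w3.org/XML/1998/namespace` namespace, and the DOM contains no `xmlns:xml` attribute. The class name `XmlPrefixDemo` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;

final class XmlPrefixDemo {
    // Parses namespace-aware and returns the document element; checked exceptions are wrapped.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The input uses the xml prefix but never declares it.
        Element root = parseRoot("<root xml:lang=\"en\"/>");
        Attr lang = root.getAttributeNodeNS(XMLConstants.XML_NS_URI, "lang");
        System.out.println(lang.getNamespaceURI());           // http://www.w3.org/XML/1998/namespace
        System.out.println(root.getAttributes().getLength()); // 1: only xml:lang, no xmlns:xml
    }
}
```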
Edit:
Here's the code I used to evaluate the XPath expression whose result contained the 'xml' namespace node:
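```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true); // needed so prefixes and namespace URIs are resolved
dbf.setValidating(false);
InputStream in = ...;
try {
    Document doc = dbf.newDocumentBuilder().parse(in);
    XPathFactory fac = XPathFactory.newInstance();
    XPath xp = fac.newXPath();
    XPathExpression exp = xp.compile("//. | //@* | //namespace::*");
    NodeList nodes = (NodeList) exp.evaluate(doc, XPathConstants.NODESET);
} finally {
    in.close();
}
```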
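Below is a self-contained variant of the snippet above. It reads the sample document from a string instead of the elided `InputStream` and returns each result node's name. The class name `XPathAllNodes` is hypothetical. The exact set of namespace nodes returned (for example, the extra `xmlns:xml`) is up to the XPath implementation; the output in the question came from the JDK's built-in one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class XPathAllNodes {
    static final String SAMPLE = "<?xml version=\"1.0\"?>\n"
            + "<myns:root xmlns:myns=\"http://www.my.ns/#\">\n"
            + "<myns:element/>\n"
            + "</myns:root>";

    // Evaluates the question's expression on a namespace-aware parse of xml and
    // returns the node names in the order the XPath implementation reports them.
    static List<String> nodeNames(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setValidating(false);
            Document doc = dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPathExpression exp = XPathFactory.newInstance().newXPath()
                    .compile("//. | //@* | //namespace::*");
            NodeList nodes = (NodeList) exp.evaluate(doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        nodeNames(SAMPLE).forEach(System.out::println);
    }
}
```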
Answer:
Since you can write `xml:`-prefixed attributes (such as `xml:lang`) without declaring the "xml" prefix, it must be there implicitly. It is therefore correct to include the namespace node for this namespace declaration in the `//namespace::*` location step. So,
a) you are wrong, you need it (well, depending on the purpose of your code)
b) see above
c) no, but I've seen other namespace corner cases where things went haywire (e.g. "Problem with conversion of org.dom4j.Document to org.w3c.dom.Document and XML Signature")
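The XPath 1.0 data model states this explicitly: each element has one namespace node for every prefix in scope, including the `xml` prefix, which the Namespaces Recommendation declares implicitly. This sketch asks only for the namespace axis of the document element and returns the local names of those nodes. On the JDK used in the question, the list would be expected to contain `xml` alongside `myns`. The class name `NamespaceAxisDemo` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class NamespaceAxisDemo {
    // Local names (prefixes) of the namespace nodes on the document element.
    static List<String> rootNamespacePrefixes(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList ns = (NodeList) xp.evaluate("/*/namespace::*", doc, XPathConstants.NODESET);
            List<String> prefixes = new ArrayList<>();
            for (int i = 0; i < ns.getLength(); i++) {
                prefixes.add(ns.item(i).getLocalName());
            }
            return prefixes;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "<myns:root xmlns:myns=\"http://www.my.ns/#\">\n<myns:element/>\n</myns:root>";
        System.out.println(rootNamespacePrefixes(sample));
    }
}
```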