Manually creating a NodeList of all Document nodes

Posted on 2024-11-28 20:17:50

I currently build a NodeList of all the nodes in a Document (in document order) by hand. The XPath expression that yields the same NodeList is:

//. | //@* | //namespace::*
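Each branch of that union selects a different kind of node: `//.` abbreviates `/descendant-or-self::node()/self::node()` and yields the document node plus every element, text, comment and processing-instruction node; `//@*` adds attribute nodes; `//namespace::*` adds namespace nodes. A minimal sketch for evaluating the branches separately with the JDK's `javax.xml.xpath` API, assuming the test document shown further down; the class and method names are illustrative only:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UnionBranches {
    // The test document from the question.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<myns:root xmlns:myns=\"http://www.my.ns/#\">\n"
        + "  <myns:element/>\n"
        + "</myns:root>";

    // Namespace-aware parse, so that prefixes are resolved to namespace URIs.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes a single XPath expression selects, starting from the document.
    static int count(Document doc, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // "//." covers the document node, both elements and the two whitespace
        // text nodes; how many nodes the other two branches contribute depends
        // on how the XPath engine maps xmlns attributes onto its data model.
        for (String expr : new String[] {"//.", "//@*", "//namespace::*"}) {
            System.out.println(expr + " -> " + count(doc, expr));
        }
    }
}
```

Only the `//.` count is independent of the engine; the other two are printed rather than assumed.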

My first attempt at walking the DOM manually and collecting the nodes (NodeSet is a primitive NodeList implementation that delegates to a List):

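import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Pre-order walk: the node itself, then its attributes (with a namespace-aware
// DOM these include the xmlns:* declarations), then its children.
private static void walkRecursive(Node cur, NodeSet nodes) {
    nodes.add(cur);

    if (cur.hasAttributes()) {
        NamedNodeMap attrs = cur.getAttributes();
        for (int i=0; i < attrs.getLength(); i++) {
            Node child = attrs.item(i);
            walkRecursive(child, nodes);
        }
    }

    // Only the document node and elements have children to descend into.
    int type = cur.getNodeType();
    if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
        NodeList children = cur.getChildNodes();
        if (children == null)
            return;

        for (int i=0; i < children.getLength(); i++) {
            Node child = children.item(i);
            walkRecursive(child, nodes);
        }
    }
}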

I start the recursion by calling walkRecursive(doc, nodes), where doc is the org.w3c.dom.Document and nodes is a (still empty) NodeSet.
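For experimenting, a self-contained variant of the walk might look like the sketch below. It assumes a plain `java.util.List<Node>` in place of the custom `NodeSet` (whose implementation isn't shown), parses the sample document used below from a string, and prints the same debug fields as the loop further down. `ManualWalk` and `collect` are illustrative names, and this `typeString` is only a guess at what the original helper does:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ManualWalk {
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<myns:root xmlns:myns=\"http://www.my.ns/#\">\n"
        + "  <myns:element/>\n"
        + "</myns:root>";

    // Same traversal order as walkRecursive: node, attributes, children.
    static void walk(Node cur, List<Node> nodes) {
        nodes.add(cur);
        if (cur.hasAttributes()) {
            NamedNodeMap attrs = cur.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                walk(attrs.item(i), nodes); // an Attr only adds itself
            }
        }
        short type = cur.getNodeType();
        if (type == Node.ELEMENT_NODE || type == Node.DOCUMENT_NODE) {
            NodeList children = cur.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                walk(children.item(i), nodes);
            }
        }
    }

    // Hypothetical stand-in for the question's typeString helper.
    static String typeString(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE: return "DocumentNode";
            case Node.ELEMENT_NODE: return "Element";
            case Node.ATTRIBUTE_NODE: return "Attribute";
            case Node.TEXT_NODE: return "Text";
            default: return "Other(" + type + ")";
        }
    }

    // Parses the XML namespace-aware and returns all nodes in walk order.
    static List<Node> collect(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            List<Node> nodes = new ArrayList<>();
            walk(doc, nodes);
            return nodes;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Node n : collect(SAMPLE)) {
            System.out.println("Type: " + typeString(n.getNodeType())
                + " Name:" + n.getNodeName()
                + " Local name: " + n.getLocalName()
                + " NS: " + n.getNamespaceURI());
        }
    }
}
```

Collecting into an ordinary list keeps the sketch independent of any custom NodeList class; wrapping it as a NodeList is a separate concern.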

I tested this with the following minimal XML document:

<?xml version="1.0"?>
<myns:root xmlns:myns="http://www.my.ns/#">
  <myns:element/>
</myns:root>

If I, for example, canonicalize both my manually created NodeSet and the NodeList generated by the XPath expression above and compare the two results byte for byte, they are equal, so everything seems to work just fine.

But if I iterate over the two NodeLists and print debug info (typeString simply returns a string representation of the node type)

for (int i=0; i < nodes.getLength(); i++) {
    Node child = nodes.item(i);
    System.out.println("Type: " + typeString(child.getNodeType()) +
                       " Name:" + child.getNodeName() + 
                       " Local name: " + child.getLocalName() +
                       " NS: " + child.getNamespaceURI());
}

then I receive this output for the XPath-generated NodeList:

Type: DocumentNode Name:#document Local name: null NS: null
Type: Element Name:myns:root Local name: root NS: http://www.my.ns/#
Type: Attribute Name:xmlns:myns Local name: myns NS: http://www.w3.org/2000/xmlns/
Type: Attribute Name:xmlns:xml Local name: xml NS: http://www.w3.org/2000/xmlns/
Type: Text Name:#text Local name: null NS: null
Type: Element Name:myns:element Local name: element NS: http://www.my.ns/#
Type: Text Name:#text Local name: null NS: null

and this for the manually generated NodeList:

Type: DocumentNode Name:#document Local name: null NS: null
Type: Element Name:myns:root Local name: root NS: http://www.my.ns/#
Type: Attribute Name:xmlns:myns Local name: myns NS: http://www.w3.org/2000/xmlns/
Type: Text Name:#text Local name: null NS: null
Type: Element Name:myns:element Local name: element NS: http://www.my.ns/#
Type: Text Name:#text Local name: null NS: null

So, as you can see, the XPath-generated NodeList additionally contains a node for the xml namespace:

Type: Attribute Name:xmlns:xml Local name: xml NS: http://www.w3.org/2000/xmlns/

Now my questions:

a) If I interpret Namespaces in XML 1.1 (xml-names11) correctly, then I don't need the xmlns:xml declaration:

The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be undeclared or bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.

Am I correct? The canonicalization rule mentioned in c) at least hints in that direction.

b) If so, why does the XPath evaluation add it anyway? Shouldn't it include only what was there in the first place instead of automagically adding things?

c) This could cause trouble with XML canonicalization, although it shouldn't: declarations of the xml namespace should be omitted during canonicalization. Does anyone know of (Java) implementations that get this wrong?
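If the goal is simply to make the XPath-derived list match the manual walk, one option is to drop every node that binds the `xml` prefix. The sketch below assumes the XPath implementation exposes namespace nodes through the DOM the way the debug output above suggests (local name `xml`, namespace URI `http://www.w3.org/2000/xmlns/`), which is implementation-specific. `XmlNsFilter`, `isXmlPrefixBinding` and the `namespaceDecl` test helper are illustrative names, not a library API:

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlNsFilter {
    // True for a node representing a binding of the "xml" prefix. The prefix
    // may only ever be bound to XMLConstants.XML_NS_URI, so the name and the
    // xmlns namespace URI are enough to recognise it.
    static boolean isXmlPrefixBinding(Node n) {
        return XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(n.getNamespaceURI())
            && XMLConstants.XML_NS_PREFIX.equals(n.getLocalName());
    }

    // Copies a NodeList, skipping xml-prefix bindings.
    static List<Node> withoutXmlPrefixBinding(NodeList in) {
        List<Node> out = new ArrayList<>();
        for (int i = 0; i < in.getLength(); i++) {
            Node n = in.item(i);
            if (!isXmlPrefixBinding(n)) {
                out.add(n);
            }
        }
        return out;
    }

    // Hypothetical helper: builds a free-standing xmlns:* attribute node.
    static Attr namespaceDecl(String qualifiedName, String uri) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Attr a = doc.createAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, qualifiedName);
            a.setValue(uri);
            return a;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isXmlPrefixBinding(namespaceDecl("xmlns:xml", XMLConstants.XML_NS_URI)));  // true
        System.out.println(isXmlPrefixBinding(namespaceDecl("xmlns:myns", "http://www.my.ns/#")));   // false
    }
}
```

The filter would also drop an explicit `xmlns:xml` declaration from the source, which matches the idea that such declarations carry no information.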


Edit:

Here's the code I used to evaluate the XPath expression, which produced the 'xml' namespace node:

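import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);   // resolve prefixes to namespace URIs
dbf.setValidating(false);
InputStream in = ...;
try {
    Document doc = dbf.newDocumentBuilder().parse(in);
    XPathFactory fac = XPathFactory.newInstance();
    XPath xp = fac.newXPath();
    XPathExpression exp = xp.compile("//. | //@* | //namespace::*");
    NodeList nodes = (NodeList)exp.evaluate(doc, XPathConstants.NODESET);
} finally {
    in.close();
}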


Comments (1)

救星 2024-12-05 20:17:50

Since you can write

<myns:root xml:space="preserve" xmlns:myns="http://www.my.ns/#">
  <myns:element/>
</myns:root>

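without declaring the "xml" prefix, that prefix must be bound implicitly. It is therefore correct for the //namespace::* location step to include a namespace node for this binding. The XPath 1.0 data model says so explicitly: each element has one namespace node per prefix in scope, including the xml prefix, which is implicitly declared by the Namespaces in XML Recommendation.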
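This can be mimicked on top of DOM: collect the nearest declaration of every prefix from the element and its ancestors, then add the implicit `xml` binding. A minimal sketch, assuming a namespace-aware DOM that keeps `xmlns` attributes (the question's manual output shows the JDK parser doing so); `InScopeNamespaces` and `inScope` are illustrative names, not a library API:

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class InScopeNamespaces {
    // The answer's example: xml:space is used without declaring the xml prefix.
    static final String SAMPLE =
        "<myns:root xml:space=\"preserve\" xmlns:myns=\"http://www.my.ns/#\">\n"
        + "  <myns:element/>\n"
        + "</myns:root>";

    // Prefix -> namespace URI for every namespace in scope on e ("" stands for
    // the default namespace), i.e. what the XPath 1.0 namespace axis would hold.
    static Map<String, String> inScope(Element e) {
        Map<String, String> result = new TreeMap<>();
        // Walk outwards; the nearest declaration of a prefix wins.
        for (Node n = e; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) {
                    String prefix = a.getPrefix() == null ? "" : a.getLocalName();
                    result.putIfAbsent(prefix, a.getNodeValue());
                }
            }
        }
        // An empty value (e.g. xmlns="") undeclares the binding.
        result.values().removeIf(String::isEmpty);
        // The xml prefix is always in scope, declared or not.
        result.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
        return result;
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Element root = parse(SAMPLE).getDocumentElement();
        // The parser already resolves xml:space to the predefined XML namespace.
        System.out.println(root.getAttributeNodeNS(XMLConstants.XML_NS_URI, "space") != null); // true
        Element child = (Element) root.getElementsByTagNameNS("http://www.my.ns/#", "element").item(0);
        // prints {myns=http://www.my.ns/#, xml=http://www.w3.org/XML/1998/namespace}
        System.out.println(inScope(child));
    }
}
```

Note that under this model the child element also carries both bindings, since namespace nodes are per element rather than per declaration.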

So,

a) You are wrong: you do need it (well, depending on the purpose of your code).

b) See above.

c) No, but I've seen other namespace corner cases where things went haywire (e.g. "Problem with conversion of org.dom4j.Document to org.w3c.dom.Document and XML Signature").
