Parsing an XML file while keeping line number information

Posted on 2024-10-06 04:34:15

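I am creating a tool that analyzes XML files (XHTML files, to be precise). The tool should not only validate the XML structure but also check the values of certain attributes.

So I wrote my own org.xml.sax.helpers.DefaultHandler to handle the events raised during XML parsing. One of my requirements is to know the current line number, so I added an org.xml.sax.helpers.LocatorImpl to my DefaultHandler. This solves almost all of my problems, except one concerning XML attributes.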
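The question does not show how the locator is wired into the handler. As a minimal sketch, assuming the common approach of keeping the Locator that the parser passes to setDocumentLocator, it could look like the following. The class name LocatorDemo, the nested LineTrackingHandler, and the "name@line" record format are hypothetical and only serve the illustration. The syntax is kept compatible with Java 6.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LocatorDemo {

    /** Hypothetical handler: records "qName@line" for every start tag. */
    static class LineTrackingHandler extends DefaultHandler {
        private Locator locator; // supplied by the parser, if it supports locators
        final List<String> seen = new ArrayList<String>();

        @Override
        public void setDocumentLocator(Locator locator) {
            // Called by the parser before startDocument; keep the reference.
            this.locator = locator;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // Fall back to -1 when the parser did not provide a locator.
            int line = (locator != null) ? locator.getLineNumber() : -1;
            seen.add(qName + "@" + line);
        }
    }

    /** Parses the given XML text and returns the recorded "qName@line" entries. */
    static List<String> parse(String xml) throws Exception {
        LineTrackingHandler handler = new LineTrackingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rootNode>\n    <foo att1=\"val1\"/>\n</rootNode>";
        System.out.println(parse(xml));
    }
}
```

For start tags that fit on a single line, the line reported this way matches the line of the tag, which is why the approach works for most checks.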

Consider this example:

<rootNode>
    <foo att1="val1"/>
    <bar att2="val2"
         answerToEverything="43"
         att3="val3"/>
</rootNode>

One of my rules states that if the attribute answerToEverything is defined on the node bar, its value must be 42.
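To make the rule concrete, here is a small sketch of how it could be expressed as a check on the Attributes object that SAX hands to startElement. The class name AnswerRule and the method violates are hypothetical; the tool's real rule code is not shown in the question.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

public class AnswerRule {

    /**
     * Returns true when the element breaks the rule: a bar element that
     * defines answerToEverything with a value other than "42".
     */
    static boolean violates(String qName, Attributes attributes) {
        if (!"bar".equals(qName)) {
            return false; // the rule only applies to bar
        }
        // getValue returns null when the attribute is absent
        String value = attributes.getValue("answerToEverything");
        return value != null && !"42".equals(value);
    }

    public static void main(String[] args) {
        // Build the attribute list by hand so the rule can be tried without a parser.
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "answerToEverything", "answerToEverything", "CDATA", "43");
        System.out.println(violates("bar", atts)); // prints "true"
    }
}
```

A bar element without the attribute, or any element other than bar, passes the check.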

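When it encounters XML like this, my tool should report an error. I want to give the user a precise error message, such as:

Error in file "foo.xhtml", line 4: answerToEverything only allows "42" as a value.

so my parser must be able to track line numbers during parsing, even for individual attributes. Consider the following implementation of startElement in my DefaultHandler subclass: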

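// "locator" is the locator field held by the handler, as described above.
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
    // Print the element name with the position the locator reports for this event
    System.out.println("Start element <" + qName + "> at " + locator.getLineNumber() + ":" + locator.getColumnNumber());
    for (int i = 0; i < attributes.getLength(); i++) {
        // The locator is queried again for each attribute, but it has not moved since the event fired
        System.out.println("Att '" + attributes.getQName(i) + "' = '" + attributes.getValue(i) + "' at " + locator.getLineNumber() + ":" + locator.getColumnNumber());
    }
}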

then for the <bar> node it prints the following output:

Start element <bar> at 5:23
Att 'att2' = 'val2' at 5:23
Att 'answerToEverything' = '43' at 5:23
Att 'att3' = 'val3' at 5:23

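As you can see, the line number is wrong for att2 and answerToEverything (they are on lines 3 and 4), because the parser treats the whole start tag, including its attributes, as a single block and reports only the position where the tag ends.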
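The behavior can be reproduced with a short, self-contained program. This is only a sketch, assuming the JDK's built-in SAX parser; the class name AttributeLineDemo and the method barAttributeLines are hypothetical. It parses the example document and collects the line the locator reports while each attribute of bar is read.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeLineDemo {

    // The example document from the question, one source line per row.
    static final String XML =
            "<rootNode>\n"
          + "    <foo att1=\"val1\"/>\n"
          + "    <bar att2=\"val2\"\n"
          + "         answerToEverything=\"43\"\n"
          + "         att3=\"val3\"/>\n"
          + "</rootNode>\n";

    /** Returns the locator line observed for each attribute of the bar element. */
    static List<Integer> barAttributeLines() throws Exception {
        final List<Integer> lines = new ArrayList<Integer>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (!"bar".equals(qName)) {
                    return;
                }
                for (int i = 0; i < attributes.getLength(); i++) {
                    // The locator does not move during the callback, so every
                    // attribute gets the same line. -1 means no locator was supplied.
                    lines.add(locator != null ? locator.getLineNumber() : -1);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(XML)), handler);
        return lines;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(barAttributeLines());
    }
}
```

All three entries in the returned list are identical, even though the attributes sit on three different source lines. SAX delivers the attributes only once the whole start tag has been read, so the handler has no per-attribute position to query.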

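Ideally, if the ContentHandler interface defined methods such as startAttribute and startElementBeforeReadingAttributes, I would not have any problem here :o)

So my question is: how can I solve this?

For information, I am using Java 6.

PS: Maybe a better title for this question would be "Java SAX parsing with attribute parsing events", or something like that...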

