Rare Java 6 StAX parser bug in eventReader when parsing CDATA as Characters

Posted 2024-12-28 00:05:37

I have code that fetches character data from a StAX parser through an XMLEventReader. The code looks like this:

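// Imports needed by this method:
// import javax.xml.stream.XMLEventReader;
// import javax.xml.stream.XMLStreamException;
// import javax.xml.stream.events.XMLEvent;

private String getNextCharacters(XMLEventReader eventReader) throws XMLStreamException {
    StringBuilder characters = new StringBuilder();

    // The caller guarantees that the next event is a Characters event;
    // asCharacters() throws ClassCastException otherwise.
    XMLEvent event = eventReader.nextEvent();
    String data = event.asCharacters().getData();
    characters.append(data);

    // Adjacent character events are not always coalesced, so keep
    // appending while the next event is still character data.
    while (eventReader.peek() != null && eventReader.peek().isCharacters()) {
        event = eventReader.nextEvent();
        data = event.asCharacters().getData();
        characters.append(data);
    }

    return characters.toString();
}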

The while loop is there because adjacent character events are occasionally not coalesced, so one run of text arrives as several Characters events. This happens whether or not the IS_COALESCING property is set. The loop seemed like a reasonable workaround, but it appears to have exposed a second bug: occasionally I see ]]> appended to my character string. It is infrequent, about once in 5000 lines of XML, but it reproduces consistently. Debugging shows that it happens in the second character event when the first event is CDATA. The parser seems to lose track of the CDATA section by the second event.

So, has anyone else seen this? Does anyone have a better workaround than simply stripping ]]> off the end of my string? I didn't find anything significant online or here.
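
For reference, here is a minimal sketch of how coalescing is normally requested from the standard API, assuming the JDK's built-in StAX implementation. The class name CoalescingSketch, the method readAllText, and the sample XML are made up for illustration and are not from the original post. With XMLInputFactory.IS_COALESCING set to true, the parser is expected to merge adjacent text and CDATA sections into single Characters events, so no stray ]]> should reach the caller. It does not reproduce the Java 6 behaviour described above.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, for illustration only.
class CoalescingSketch {

    // Parses the given XML and returns all character data concatenated.
    static String readAllText(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to merge adjacent character data, CDATA included.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters()) {
                    text.append(event.asCharacters().getData());
                }
            }
            reader.close();
            return text.toString();
        } catch (XMLStreamException e) {
            // Wrapped so that callers of this sketch need no checked exception.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample input mixing plain text and a CDATA section.
        System.out.println(readAllText("<row>plain <![CDATA[a < b]]> tail</row>"));
    }
}
```

If the property is honoured, the CDATA markers never appear in the returned text. If ]]> still shows up with this setup, that points at the parser implementation rather than at the calling code.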

Comments (1)

绝情姑娘 2025-01-04 00:05:37

Instead of

data = event.asCharacters().getData();

you could write

// Requires: import javax.xml.stream.events.Characters;
Characters characters = event.asCharacters();
data = characters.getData();

if (characters.isCData()) {
    /* handle CDATA */
} else if (characters.isIgnorableWhiteSpace()) {
    /* handle ignorable whitespace (checked first, since it is also whitespace) */
} else if (characters.isWhiteSpace()) {
    /* handle whitespace */
}

HTH,
Max
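
A minimal sketch of how that suggestion could be folded into the question's loop, assuming the JDK's built-in StAX implementation. The names CharactersSketch, collectText, and firstTextRun are hypothetical and not part of the original answer. It guards asCharacters() with a peek-and-check and counts how many segments were CDATA, which may help pinpoint the segment that carries a stray ]]>. It is not a confirmed fix for the parser behaviour described in the question.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, for illustration only.
class CharactersSketch {

    // Concatenates the run of adjacent character events at the reader's position.
    static String collectText(XMLEventReader reader) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        int cdataSegments = 0;
        while (true) {
            XMLEvent next = reader.peek();
            if (next == null || !next.isCharacters()) {
                break; // end of input or a non-text event: the run is over
            }
            Characters chars = reader.nextEvent().asCharacters();
            if (chars.isCData()) {
                cdataSegments++; // a place to log or inspect CDATA segments
            }
            text.append(chars.getData());
        }
        // cdataSegments is only diagnostic here; it is not returned.
        return text.toString();
    }

    // Convenience wrapper: returns the first run of text in the document.
    static String firstTextRun(String xml) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            // Skip StartDocument, StartElement, etc. until text is next.
            while (reader.hasNext() && !reader.peek().isCharacters()) {
                reader.nextEvent();
            }
            String result = collectText(reader);
            reader.close();
            return result;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTextRun("<a>x<![CDATA[y]]>z</a>"));
    }
}
```

Peeking before consuming keeps the method safe to call even when the next event is not character data, in which case it returns an empty string instead of throwing ClassCastException.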
