Rare Java 6 StAX parser bug in eventReader when parsing CDATA as characters
I have code to fetch characters from a StAX parser using eventReader. The code looks like this:
    private String getNextCharacters(XMLEventReader eventReader) throws XMLStreamException {
        StringBuilder characters = new StringBuilder();
        XMLEvent event = eventReader.nextEvent();
        String data = event.asCharacters().getData();
        characters.append(data);
        while (eventReader.peek() != null && eventReader.peek().isCharacters()) {
            event = eventReader.nextEvent();
            data = event.asCharacters().getData();
            characters.append(data);
        }
        return characters.toString();
    }
The while loop is there because adjacent character events are occasionally not coalesced into a single event, so one call to asCharacters() does not return all of the text. This seems to happen whether or not the IS_COALESCING flag is set. The loop seemed like a reasonable workaround, but it appears to have exposed a second bug: occasionally I see ]]> appended to my character string. This is rare, roughly once in 5000 lines of XML, but it is reproducible. While debugging I found that it happens in the second isCharacters event when the first event is CDATA. The parser seems to lose track of the fact that it is inside a CDATA section by the time it reaches the second event.
So, has anyone else seen this? Does anyone have a better workaround than simply stripping ]]> off the end of my string? I didn't find anything significant online or here.
Comments (1)
Instead of
you could go
HTH,
Max
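For what it's worth, one way to avoid stitching character events together by hand is to let the reader collect the text of an element itself with XMLEventReader.getElementText(), with IS_COALESCING requested on the factory. The sketch below is only an illustration under assumptions: the class name CoalescedTextDemo and the helper firstElementText are made up for this example, it only handles text-only elements, and it is not confirmed to avoid the ]]> problem on the Java 6 parser described in the question.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class, not part of the original question or answer.
class CoalescedTextDemo {

    // Returns the text content of the first element in the document.
    // Assumes that element contains only text and CDATA (no child elements).
    static String firstElementText(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character data, including CDATA.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        try {
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    // Reads everything up to the matching end element as one String,
                    // so no manual loop over character events is needed.
                    return reader.getElementText();
                }
            }
            return null; // no element found
        } catch (XMLStreamException e) {
            // Wrapped so callers in this sketch do not need a throws clause.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstElementText("<a>one<![CDATA[ two ]]>three</a>"));
    }
}
```

Note that getElementText() requires the reader to be positioned on a START_ELEMENT and throws if it meets a nested element, so it is only a drop-in replacement when the element being read is text-only.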