XMLStreamReader and a real stream
Update: I found no ready-made XML parser in the Java community that combines NIO with XML parsing. The closest one is Aalto, and it is incomplete: http://wiki.fasterxml.com/AaltoHome
I have the following code:
InputStream input = ...;
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLStreamReader streamReader = xmlInputFactory.createXMLStreamReader(input, "UTF-8");
The question is: why does the method #createXMLStreamReader() expect the input stream to contain an entire XML document? Why is it called a "stream reader" if it can't seem to process a portion of the XML data? For example, if I feed:
<root>
<child>
to it, it tells me I'm missing the closing tags, even before I begin iterating over the stream reader itself. I suspect that I just don't know how to use an XMLStreamReader properly. I should be able to supply it with data in pieces, right? I need this because I'm processing an XML stream coming in from a network socket, and I don't want to load the whole source text into memory.
Thank you for your help,
Yuri.
Comments (6)
You can get what you want, a partial parse, but you must not close the stream when you reach the end of the currently available data. Keep the stream open, and the parser will simply block when it runs out of data. When more data arrives, write it to the stream, and the parser will continue.
This arrangement requires two threads: one running the parser and another fetching data. To bridge the two threads, use a pipe, that is, a PipedInputStream and PipedOutputStream pair. The fetching thread writes incoming data to the PipedOutputStream, and the parser reads it from the connected PipedInputStream.
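The two-thread arrangement described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name PipedStaxDemo, the method parseChunks, and the hard-coded chunks standing in for socket reads are all hypothetical, and the short sleep merely simulates network delay.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; in real code the chunks would come from a socket.
public class PipedStaxDemo {

    // Feeds the chunks through a pipe from a second thread and returns
    // the local names of the start elements the parser reported.
    public static List<String> parseChunks(String... chunks) throws Exception {
        PipedOutputStream sink = new PipedOutputStream();
        PipedInputStream source = new PipedInputStream(sink);

        // Feeder thread: pushes data as it "arrives" and closes the pipe
        // only after the last chunk, which signals end of document.
        Thread feeder = new Thread(() -> {
            try (PipedOutputStream out = sink) {
                for (String chunk : chunks) {
                    out.write(chunk.getBytes(StandardCharsets.UTF_8));
                    out.flush();
                    Thread.sleep(50); // simulated network delay (assumption)
                }
            } catch (IOException e) {
                // The parser side closed the pipe early; nothing more to send.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        feeder.start();

        // Parser thread (the caller): blocks inside next() whenever the pipe is empty.
        List<String> names = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(source, "UTF-8");
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();  // does not close the underlying stream
            source.close();
        }
        feeder.join();
        return names;
    }

    public static void main(String[] args) throws Exception {
        // The element "child" is deliberately split across two chunks.
        System.out.println(parseChunks("<root><chi", "ld>text</child>", "</root>"));
    }
}
```

The parser never sees the chunk boundaries, so a tag split across two writes is handled transparently. The chunks taken together still have to form one well-formed document.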
If you absolutely need NIO with content "push", there are developers interested in completing the API for Aalto. The parser itself is a complete StAX implementation and also offers an alternative "push input" mode, where you feed input to the parser instead of having it read from an InputStream. So if you are interested, you might want to check out the mailing lists. Not everyone reads StackOverflow questions. :-)
The stream must contain the content for an entire XML document, just not all in memory at the same time (this is what streams do). You might be able to keep the stream and the reader open to continue feeding in content; however, it would have to be part of a well-formed XML document.
Suggestion: You might want to read a bit more about how sockets and streams work before going much further.
Hope this helps.
Which Java version are you using? With JDK 1.6.0_19, I get the behaviour you seem to be expecting. Iterating over your example XML fragment gives me three events:
The fourth invocation of next() throws an XMLStreamException: ParseError at [row,col]:[2,12]
Message: XML document structures must start and end within the same entity.
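A small sketch for reproducing this behaviour: it iterates over a truncated fragment and records each event until the parser gives up. The class name TruncatedFragmentDemo and the method eventsUntilError are hypothetical, and the exact events and error position may differ between StAX implementations and JDK versions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class for observing where a truncated document fails.
public class TruncatedFragmentDemo {

    // Returns a description of every event seen, followed by "ERROR"
    // if the parser threw an XMLStreamException.
    public static List<String> eventsUntilError(String xml) {
        List<String> events = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), "UTF-8");
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    events.add("START_ELEMENT " + reader.getLocalName());
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    events.add("CHARACTERS");
                } else {
                    events.add("EVENT " + event);
                }
            }
        } catch (XMLStreamException e) {
            // Raised once the parser hits end of input inside an open element.
            events.add("ERROR");
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(eventsUntilError("<root>\n<child>"));
    }
}
```

The point is that the events before the truncation are delivered normally; the exception only appears when the parser needs data that the closed stream can no longer provide.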
Using an XMLEventReader with the StAX parser works for me without any issues.
Here, file is your input.
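A minimal sketch of the XMLEventReader approach, assuming file is an InputStream over the incoming XML. The class name EventReaderDemo and the method elementNames are hypothetical and only illustrate the standard javax.xml.stream calls.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class showing event-by-event iteration.
public class EventReaderDemo {

    // Pulls events one at a time and collects the start element names.
    public static List<String> elementNames(InputStream file) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(file);
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        InputStream file = new ByteArrayInputStream(
                "<root><child/></root>".getBytes(StandardCharsets.UTF_8));
        System.out.println(elementNames(file));
    }
}
```

Like XMLStreamReader, the event reader pulls data from the stream on demand, so only the current event needs to be held in memory.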
Look at this link to understand more about how streaming parsers work and how they keep your memory footprint smaller. For incoming XML, you would first need to serialize the incoming data into well-formed XML and then give it to the streaming parser.
http://www.devx.com/xml/Article/34037/1954