We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Use a SAX parser. SAX parsers are designed to handle huge XML files. Instead of loading the whole XML file into memory in one go, a SAX parser walks over the document element by element and notifies you as it goes.
Additionally, if the XML file is really big, you may also want to look at how the file is loaded. Don't open the file and feed its entire contents into the SAX parser in one go. Instead, read it chunk by chunk (e.g. 4 KB blocks at a time) and feed each chunk into the SAX parser.
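To make the chunk-by-chunk idea concrete, here is a minimal sketch in Python, assuming the standard library's `xml.sax` module and its incremental `feed()`/`close()` interface. The tag name `item`, the `ItemCounter` handler, and the `count_items` helper are made up for illustration; other SAX implementations expose chunked input differently.

```python
import io
import xml.sax


class ItemCounter(xml.sax.ContentHandler):
    """Counts <item> elements without keeping the document in memory."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # "item" is a hypothetical tag name used only for this sketch.
        if name == "item":
            self.count += 1


def count_items(stream, chunk_size=4096):
    """Feed a binary stream to a SAX parser in fixed-size chunks."""
    parser = xml.sax.make_parser()
    handler = ItemCounter()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)  # the parser fires callbacks as data arrives
    parser.close()  # signals end of input; raises if the document is incomplete
    return handler.count


# An in-memory stream stands in for a large file opened with open(path, "rb").
sample = b"<items>" + b"<item/>" * 1000 + b"</items>"
print(count_items(io.BytesIO(sample)))
```

Only one chunk is held in memory at a time, so memory use stays roughly constant regardless of file size.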
Edit: A SAX parser works very differently from a DOM parser. Basically, it just goes through the document one element at a time. Whenever it finds an opening or closing tag, it calls one of your functions (as a callback) and tells it what the tag is and what the data is (if any). It starts at the beginning, goes through to the end, and never goes back. It's serial. This means two things:
More code. Your callbacks need to determine what to do when certain tags are encountered, which tags should be skipped, and so on. A SAX parser doesn't go back, so if you need to remember anything for later, you have to do that yourself. So yeah, it will be more work to deal with many APIs containing many different tags.
It can parse partial XML. It doesn't care if you feed it just the first 4 KB of an XML file. It will not generate an error but simply ask for another chunk of data when it's done. Only when it encounters a mismatched closing tag (or you stop feeding it data too soon) will it generate an error.
So yeah, it's more work. But the payoff is much greater speed and no problem parsing huge files that would not fit into memory.
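The two points above (callbacks that keep their own state, and tolerance for partial input until the stream ends) can be sketched roughly as follows, again assuming Python's standard `xml.sax`. The `title` tag, the `TitleCollector` handler, and the `parse_chunks` helper are hypothetical names chosen for the example.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element (a hypothetical tag)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not currently inside a <title>"

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it ourselves.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def parse_chunks(chunks):
    """Feed an iterable of text chunks to the parser and return the titles."""
    parser = xml.sax.make_parser()
    handler = TitleCollector()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # a chunk may end in the middle of a tag
    parser.close()  # only here is an unfinished document reported
    return handler.titles


# A chunk boundary in the middle of a tag is not a problem.
print(parse_chunks(["<lib><book><ti", "tle>Dune</title></book></lib>"]))

# Stopping too soon is reported as an error once the input is closed.
try:
    parse_chunks(["<lib><book><title>Dune</title>"])
except xml.sax.SAXParseException as exc:
    print("incomplete document:", exc.getMessage())
```

The handler has to remember whether it is inside a `title` element, because the parser never revisits earlier parts of the document.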
Besides using SAX as suggested by Sander, you can try using XmlPullParser. You can find more info here.
vtd-xml is the best XML parser for this use case. It is both memory efficient and super fast... Here is a paper to prove this:
http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf