Reading compressed XML with .NET
The case: there is a large zipped XML file that needs to be parsed by a .NET program. The main issue is that the file is too big to be loaded fully into memory and unzipped there.
The file needs to be read part by part in such a way that, after unzipping, the parts are "consistent". If a part contains only half of a node, it cannot be parsed into any XML structure.
Any help will be appreciated. :)
Edit: The current solution extracts the whole zip file part by part and writes it to disk as an XML file. It then reads and parses the XML. No better ideas so far on my side :).
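For reference, the approach described in the edit (extract to disk, then parse) might look roughly like the sketch below. It uses the built-in System.IO.Compression classes (.NET Framework 4.5+ and .NET Core) rather than a third-party zip library. The class name ExtractThenParse, the entry name "data.xml" and the element name "item" are hypothetical. ExtractToFile copies the entry through a stream, so the uncompressed XML never sits in memory as a whole. The costs are the temporary file and a second pass over the data.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml;

// Hypothetical helper illustrating "extract to disk, then parse".
public static class ExtractThenParse
{
    // Extracts entryName to a temporary file, then counts <elementName> elements in it.
    public static int CountElements(string zipPath, string entryName, string elementName)
    {
        string tempXml = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".xml");
        try
        {
            using (ZipArchive archive = ZipFile.OpenRead(zipPath))
            {
                ZipArchiveEntry entry = archive.GetEntry(entryName)
                    ?? throw new FileNotFoundException("Entry not found in archive", entryName);
                // Decompresses stream-to-stream; the whole entry is never buffered.
                entry.ExtractToFile(tempXml, overwrite: true);
            }

            int count = 0;
            using (XmlReader reader = XmlReader.Create(tempXml))
            {
                // ReadToFollowing advances to the next start tag with this name.
                while (reader.ReadToFollowing(elementName))
                    count++;
            }
            return count;
        }
        finally
        {
            // Clean up the temporary XML file (no error if it was never created).
            File.Delete(tempXml);
        }
    }
}
```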
Using DotNetZip you can do this:
You could give SharpZipLib a try and then use XmlReader to start parsing it.
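The idea of feeding the decompressed stream straight into XmlReader can be sketched as follows. This version swaps SharpZipLib for the built-in System.IO.Compression.ZipArchive. The class name ZipXmlStreaming and the parameters are hypothetical. It also addresses the "half a node" concern: XmlReader pulls bytes from the stream in its own buffers and only reports complete nodes, so chunk boundaries never reach your code. ZipArchive in read mode works best with a seekable stream such as a FileStream. With a non-seekable stream it may buffer the archive in memory.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml;

// Hypothetical helper: parse an XML entry directly from the zip, no temp file.
public static class ZipXmlStreaming
{
    // Counts <elementName> elements in entryName, reading forward-only.
    public static int CountElements(Stream zipStream, string entryName, string elementName)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName)
                ?? throw new FileNotFoundException("Entry not found in archive", entryName);

            // entry.Open() returns a decompressing, forward-only stream;
            // XmlReader consumes it incrementally.
            using (Stream xmlStream = entry.Open())
            using (XmlReader reader = XmlReader.Create(xmlStream))
            {
                int count = 0;
                while (reader.ReadToFollowing(elementName))
                    count++;
                return count;
            }
        }
    }
}
```

In practice you would pass File.OpenRead("big.zip") (path hypothetical) as zipStream. Memory use then stays bounded by the reader's buffers rather than by the size of the XML.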
Haven't you tried the DotNetZip library?
In reply to your recent edit: what you are doing is the standard flow. As far as I know, there are no alternatives to it.
Regarding your edit: unless you actually want to have that XML file on disk (which could of course be the case in some scenarios), I would extract it to a MemoryStream instead.
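A minimal sketch of extracting to a MemoryStream instead of a file, using System.IO.Compression. The class name ExtractToMemory is hypothetical. The whole uncompressed entry ends up in memory with this approach, so it only helps when the XML actually fits in RAM.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: decompress one entry into memory instead of onto disk.
public static class ExtractToMemory
{
    // Returns a MemoryStream holding the full uncompressed entry, rewound to the start.
    public static MemoryStream Extract(Stream zipStream, string entryName)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName)
                ?? throw new FileNotFoundException("Entry not found in archive", entryName);

            var buffer = new MemoryStream();
            using (Stream source = entry.Open())
                source.CopyTo(buffer);  // the entire entry is copied into memory here
            buffer.Position = 0;        // rewind so a reader starts at the first byte
            return buffer;
        }
    }
}
```

The returned stream can be passed to XmlReader.Create, or ToArray() can be called on it if a byte[] is needed.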
Hmm, you have two problems here: unzipping the file in a way that gives you chunks of data, and reading the XML when you can only see one chunk at a time. This is different from how most of us are used to dealing with XML, where we just read it into memory in one go, but you say that's not an option.
This means you are going to have to use Streams, which are built for exactly this case. This solution will work, but it might be limited depending on what you hope to do with the XML data. You say it needs to be parsed, but the only way you will be able to do that (as you can't keep it in memory) is to read it in a "fire hose" manner, stepping through each node as it is parsed. Hopefully that's enough to pull out the data you need or to process it however you need to (poke it into a DB, extract only the sections you are interested in and save them into a smaller in-memory XML doc, etc.).
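The "extract only the sections you are interested in" idea can be sketched with XNode.ReadFrom. This method reads one element from an XmlReader into a small XElement and then leaves the reader on the following node. The sketch uses System.IO.Compression instead of SharpZipLib, and the names SectionExtractor, ReadSections and the element name are hypothetical. Only one matching section is held in memory at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: stream through a zipped XML entry and materialise
// only the <elementName> sections, one small XElement at a time.
public static class SectionExtractor
{
    public static IEnumerable<XElement> ReadSections(Stream zipStream, string entryName, string elementName)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName)
                ?? throw new FileNotFoundException("Entry not found in archive", entryName);

            using (Stream xmlStream = entry.Open())
            using (XmlReader reader = XmlReader.Create(xmlStream))
            {
                reader.MoveToContent();
                while (!reader.EOF)
                {
                    // Compares the qualified name; namespaces are not handled here.
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                    {
                        // ReadFrom consumes the whole element and moves past it,
                        // so Read() must not be called in this branch.
                        yield return (XElement)XNode.ReadFrom(reader);
                    }
                    else
                    {
                        reader.Read();
                    }
                }
            }
        }
    }
}
```

Each yielded XElement can then be inserted into a database or added to a smaller in-memory document before the next one is read.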
So, first job: get a stream from your zip file, which is quite easy to do with SharpZipLib (+1 to Rubens). Add a reference to the SharpZipLib DLL in your project. Here's some code that creates a stream from a zip and then copies it into a memory stream (you might not want to do that bit, but it shows how I use it to get back a byte[] of data; you just want the stream):
Then, if you follow this article from MS: http://support.microsoft.com/kb/301228, you should be able to merge the two lots of code and start reading your XML from a zip stream :)