Reading large Open XML spreadsheets
I need to read (and parse) large spreadsheet files (20-50 MB) using the Open XML libraries, and there doesn't seem to be a way to stream the rows one at a time for parsing.
I'm consistently getting out-of-memory exceptions: as soon as I try to access a row (or iterate), the entire row contents (100K+ rows) appear to be loaded.
Each of the calls, whether Elements.Where(...) with a query or Descendants(), seems to load the entire row set.
Is there a way to stream the rows, or to read just one row at a time?
Thanks
Comments (3)
I found an answer. If you use the OpenXmlReader on the worksheet part, you can iterate through it and effectively lazy-load the elements you come across.
Look for
and load the row (lazy)
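To illustrate what lazy row-by-row reading looks like, here is a minimal sketch of the same idea using only Python's standard library. It is not the OpenXmlReader API and not the code from this answer: the sample workbook, the part name `xl/worksheets/sheet1.xml`, and the helper names `build_sample_xlsx_part` and `stream_rows` are assumptions made for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, in ElementTree's "{uri}tag" form.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def build_sample_xlsx_part(row_count):
    """Build an in-memory zip holding only one worksheet part (sample data)."""
    rows = "".join(
        f'<row r="{i}"><c r="A{i}"><v>{i * 10}</v></c></row>'
        for i in range(1, row_count + 1)
    )
    sheet = (
        '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        f"<sheetData>{rows}</sheetData></worksheet>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/worksheets/sheet1.xml", sheet)
    buf.seek(0)
    return buf


def stream_rows(xlsx_file, part_name="xl/worksheets/sheet1.xml"):
    """Yield (row_number, [raw cell values]) one row at a time."""
    with zipfile.ZipFile(xlsx_file) as zf, zf.open(part_name) as part:
        # iterparse reads the part incrementally instead of building the
        # whole tree first; "end" fires once an element is complete.
        for _event, elem in ET.iterparse(part, events=("end",)):
            if elem.tag == NS + "row":
                values = [c.findtext(NS + "v") for c in elem.iter(NS + "c")]
                yield int(elem.get("r")), values
                # Drop the row's cells once the caller is done with them;
                # only an empty row element stays attached to its parent.
                elem.clear()


rows = list(stream_rows(build_sample_xlsx_part(3)))
print(rows)
```

Collecting the rows into a list is only for the demonstration; with a 100K-row sheet you would consume the generator in a `for` loop so that only one row's cells are held at a time.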
Do the Open XML libraries use a DOM or a SAX model? With DOM you usually have to hold the entire document in memory at once, but with SAX you can stream the events as they come.
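The difference between the two models can be shown on a tiny piece of worksheet XML. The sketch below uses Python's standard `xml.sax` and `xml.etree.ElementTree` modules purely as an illustration; the sample XML and the `RowSummer` handler are made up for the example and say nothing about how the Open XML libraries are implemented.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

NS_URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
SHEET_XML = (
    f'<worksheet xmlns="{NS_URI}"><sheetData>'
    '<row r="1"><c r="A1"><v>10</v></c></row>'
    '<row r="2"><c r="A2"><v>20</v></c></row>'
    '<row r="3"><c r="A3"><v>30</v></c></row>'
    "</sheetData></worksheet>"
).encode("utf-8")


class RowSummer(xml.sax.ContentHandler):
    """SAX style: react to events and keep only running totals."""

    def __init__(self):
        super().__init__()
        self.rows = 0
        self.total = 0
        self._text = None  # list of text chunks while inside <v>, else None

    def startElement(self, name, attrs):
        if name == "row":
            self.rows += 1
        elif name == "v":
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "v":
            self.total += int("".join(self._text))
            self._text = None


# SAX: the parser pushes events to the handler; no tree is ever built.
handler = RowSummer()
xml.sax.parse(io.BytesIO(SHEET_XML), handler)

# DOM style: the whole document becomes an in-memory tree before any query.
root = ET.fromstring(SHEET_XML)
dom_rows = len(root.findall(f".//{{{NS_URI}}}row"))

print(handler.rows, handler.total, dom_rows)
```

Both approaches see the same three rows. The difference is that the SAX handler's memory use does not grow with the number of rows, while the tree built by the DOM-style call does.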
Here is the code to read a large Excel file with multiple sheets using the SAX approach:
The code:
1. Reads the sheets in ascending order, starting from the first.
2. Still reads the file with OpenXML if it is in use by another process.
3. Reads blank cells.
4. Skips empty rows after reading is complete.
5. Reads 5000 rows within 4 seconds.
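Two of the points above, reading blank cells and skipping empty rows, come up in any streaming reader, because a worksheet part usually stores only the cells that have content. The following is a rough sketch of one way to handle both, written with Python's standard library. It is not this answer's code, and it assumes every `<c>` element carries an `r` reference such as `C1`. The names `column_index` and `dense_rows` are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET

NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def column_index(cell_ref):
    """Zero-based column from a cell reference: 'A1' -> 0, 'C7' -> 2, 'AA3' -> 26."""
    letters = re.match(r"[A-Z]+", cell_ref).group()
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def dense_rows(stream):
    """Yield each non-empty row as a list, with '' for cells missing from the XML."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != NS + "row":
            continue
        cells = {}
        for c in elem.iter(NS + "c"):
            value = c.findtext(NS + "v")  # raw <v> text; shared strings not resolved
            if value not in (None, ""):
                cells[column_index(c.get("r"))] = value
        elem.clear()  # values are copied out, so the row's children can go
        if not cells:
            continue  # skip rows that hold no values at all
        width = max(cells) + 1
        yield [cells.get(i, "") for i in range(width)]


SAMPLE = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<sheetData>"
    '<row r="1"><c r="A1"><v>1</v></c><c r="C1"><v>3</v></c></row>'
    '<row r="2"/>'
    '<row r="3"><c r="B3"><v>7</v></c></row>'
    "</sheetData></worksheet>"
).encode("utf-8")

result = list(dense_rows(io.BytesIO(SAMPLE)))
print(result)
```

In the sample, B1 is absent from the XML and comes back as an empty string, while row 2 has no cells and is dropped. Rows are padded only up to their own last populated column, so a caller that needs a fixed width would have to pad further.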