Using a buffered reader to read large .csv files, Python
I'm trying to open large .csv files (16k lines+, ~15 columns) in a python script, and am having some issues.
I use the built-in open() function to open the file, then declare a csv.DictReader using the input file. The loop is structured like this:
for (i, row) in enumerate(reader):
    # do stuff (send serial packet, read response)
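For reference, a fuller sketch of that setup (the filename data.csv is a placeholder for my actual path, and the print call stands in for the serial send/read logic):

import csv

with open("data.csv") as infile:  # placeholder path
    reader = csv.DictReader(infile)
    for (i, row) in enumerate(reader):
        # do stuff (send serial packet, read response)
        print(i, row)  # stand-in for the serial send/read step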
However, if I use a file longer than about 20 lines, the file will open, but within a few iterations I get a ValueError: I/O operation on a closed file.
My thought is that I might be running out of memory (though the 16k line file is only 8MB, and I have 3GB of ram), in which case I expect I'll need to use some sort of buffer to load only sections of the file into memory at a time.
Am I on the right track? Or could there be other causes for the file closing unexpectedly?
edit: about half the time I run this with a csv of 11 lines, it gives me the ValueError. The error does not always happen on the same line.
Comments (2)
16k lines is nothing for 3GB of RAM; most probably your problem is something else, e.g. you are spending too much time in some other process which interferes with the opened file. Just to be sure, and anyway for speed when you have 3GB of RAM, load the whole file into memory and then parse it, e.g. as in the sketch below.
With that, at least you shouldn't get the closed-file error.
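A minimal sketch of that idea (the filename and the splitlines-based in-memory parsing are assumptions, not from the original answer): read the whole file once, close it, and let csv.DictReader iterate over the in-memory copy.

import csv

# Read the entire file into memory up front; the file handle is closed
# before the parsing loop starts, so nothing can close it mid-iteration.
with open("data.csv") as f:  # "data.csv" is a placeholder path
    lines = f.read().splitlines()

reader = csv.DictReader(lines)  # DictReader accepts any iterable of lines
for i, row in enumerate(reader):
    # do stuff (send serial packet, read response)
    print(i, row)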
csv.reader is faster. Read the whole file in blocks, and to avoid memory leaks it is better to use a subprocess:
from multiprocessing import Process
For more information please go through this link: http://articlesdictionary.wordpress.com/2013/09/29/read-csv-file-in-python/
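A rough sketch of what this answer seems to suggest (the block size, function names, and file path are assumptions; the original answer only gives the import line): read the file in blocks of rows and hand each block to a short-lived worker process, so the block's memory is released when that process exits.

import csv
from multiprocessing import Process

BLOCK_SIZE = 1000  # rows per block; arbitrary value for this sketch

def process_block(rows):
    # Runs in a separate process; its memory is freed when the process exits.
    for row in rows:
        pass  # do stuff with each row

def run_block(rows):
    p = Process(target=process_block, args=(rows,))
    p.start()
    p.join()

if __name__ == "__main__":
    with open("data.csv") as f:  # "data.csv" is a placeholder path
        reader = csv.reader(f)
        block = []
        for row in reader:
            block.append(row)
            if len(block) == BLOCK_SIZE:
                run_block(block)
                block = []
        if block:  # leftover rows smaller than one block
            run_block(block)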