Is there a Python library for line-based file reading?
Possible Duplicate:
Python: How to read huge text file into memory
To process a large text file (1 GB+) line by line, I need random access by arbitrary line number and, most importantly, I must not load the whole file content into RAM. Is there a Python library that does this?
This is useful when analyzing large log files; read-only access is enough.
If the standard library has nothing like this, I will have to find an alternative: a set of functions or classes that can return the N-th line as a substring of a large string-like object, so that I can mmap the file (yes, I mean a memory-mapped file object) onto that object and then do line-based processing.
Thank you.
PS: The log file almost certainly has variable line lengths.
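The mmap idea described above could look roughly like the sketch below. It is only an illustration: `nth_line` is a hypothetical helper name, the demo file is a throwaway temporary file, and lines are assumed to end in `\n`. The file is mapped read-only, so the operating system pages data in on demand instead of the whole file being read into RAM.

```python
import mmap
import os
import tempfile


def nth_line(mm, n):
    """Return line n (0-based) from a memory-mapped file by scanning for newlines."""
    start = 0
    for _ in range(n):
        newline = mm.find(b"\n", start)
        if newline == -1:
            raise IndexError("line number out of range")
        start = newline + 1
    if start >= len(mm):
        raise IndexError("line number out of range")
    end = mm.find(b"\n", start)
    if end == -1:  # last line without a trailing newline
        end = len(mm)
    return mm[start:end]


# Demonstration on a small temporary file with variable-length lines.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"alpha\nbeta gamma\nz")
    mmap_demo_path = tmp.name

with open(mmap_demo_path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        mmap_demo_lines = [nth_line(mm, i) for i in range(3)]
os.remove(mmap_demo_path)
```

Each lookup here scans from the start of the mapping, so it costs time proportional to the position of the requested line. For repeated random access it would probably be worth remembering the line start offsets after the first pass.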
Comments (1)
I think something like the below might work, since the file object's method
readline()
reads one line at a time. If the lines are of arbitrary length, you need to index the line start positions as follows. If the lines were all the same length, you could just do
f.seek(linenumber*linelength)
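A minimal sketch of the position-indexing idea for variable-length lines, assuming one sequential pass over the file is acceptable. The names `build_line_index` and `read_line` are hypothetical, not part of any library, and the demo file is a throwaway temporary file. The file is opened in binary mode so that the recorded offsets are exact byte positions that `seek()` can use.

```python
import os
import tempfile


def build_line_index(path):
    """Scan the file once and return the byte offset at which each line starts."""
    offsets = []
    position = 0
    with open(path, "rb") as f:  # binary mode: offsets are plain byte counts
        for line in f:           # iterates line by line, not the whole file at once
            offsets.append(position)
            position += len(line)
    return offsets


def read_line(path, offsets, line_number):
    """Return line `line_number` (0-based) as bytes, without its line ending."""
    with open(path, "rb") as f:
        f.seek(offsets[line_number])
        return f.readline().rstrip(b"\r\n")


# Demonstration on a small temporary file with variable-length lines.
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b"first\nsecond line\n\nfourth\n")
    demo_path = tmp.name

demo_offsets = build_line_index(demo_path)
demo_second = read_line(demo_path, demo_offsets, 1)
demo_fourth = read_line(demo_path, demo_offsets, 3)
os.remove(demo_path)
```

Only the offsets are kept in memory, not the file content. For a file with many millions of lines the Python list itself can grow large; storing the offsets in an `array.array("Q")` would likely be more compact. In the fixed-length case mentioned above, the length passed to `seek()` has to count the line terminator as well.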