
Is there a Python library for line-based file reading?

Posted 2025-01-03 07:43:10 · 466 characters · 1 view · 0 comments

Possible duplicate:
Python: how to read a huge text file into memory

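I need to process a large text file (1 GB+) line by line, with random access to any line number and, most importantly, without loading the entire file into RAM. Is there a Python library that can do this?

This is useful for analyzing large log files; read-only access is enough.

If no such standard library exists, I will have to look for an alternative: a set of functions/classes that can return the N-th line as a substring of a large string-like object, so that I can mmap the file (yes, I mean a memory-mapped file object) onto that object and then do line-based processing.

Thanks.

PS: the log files almost certainly have variable line lengths.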


疑心病 2025-01-10 07:43:11

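I think something like the following might work, since the file object's readline() method reads one line at a time. If the lines have arbitrary lengths, you need to index their starting positions as shown below.

# Build an index of the byte offset at which each line starts.
lines = [0]
with open("testmat.txt", "rb") as f:  # binary mode keeps tell()/seek() offsets exact
    while f.readline():
        lines.append(f.tell())
    lines.pop()  # the last offset is end-of-file, not the start of a line
    # now you can read an arbitrary line (0-based index):
    f.seek(lines[1235])
    line = f.readline()  # bytes; call .decode() to get text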
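
To reuse such an offset index across many lookups, it can be wrapped in a small helper. The following is only a sketch and not part of the original answer: the class name `LineIndex` and its methods are hypothetical, and it uses the standard-library `mmap` module (the approach raised in the question) so that the operating system pages data in on demand instead of the whole file being read into RAM. It assumes a non-empty file with `\n` line endings; with `\r\n` endings the returned lines keep a trailing `\r`.

```python
import mmap
from array import array


class LineIndex:
    """Hypothetical helper: random access to lines of a large file by number."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Map the file read-only; pages are loaded lazily by the OS.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        # Compact array of unsigned 64-bit start offsets, one per line.
        self._offsets = array("Q", [0])
        pos = self._map.find(b"\n")
        while pos != -1:
            self._offsets.append(pos + 1)
            pos = self._map.find(b"\n", pos + 1)
        # A trailing newline leaves an offset at end-of-file that starts no line.
        if self._offsets[-1] == len(self._map):
            self._offsets.pop()

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, n):
        """Return line n (0-based) as bytes, without its trailing newline."""
        start = self._offsets[n]
        end = self._map.find(b"\n", start)
        if end == -1:  # last line without a trailing newline
            end = len(self._map)
        return self._map[start:end]

    def close(self):
        self._map.close()
        self._file.close()
```

With this sketch, `idx = LineIndex("testmat.txt")` scans the file once, after which `idx[1235]` returns that line directly. The index costs 8 bytes per line, which is usually far smaller than the file itself.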

If all lines had the same length, you could simply call f.seek(linenumber * linelength), where linelength includes the line terminator.
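
For the fixed-length case, the arithmetic can be sketched as follows. This is an illustrative example rather than code from the answer: the function name `read_fixed_line` and its parameters are made up here, and `line_length` is assumed to be the size of each record in bytes, newline included.

```python
def read_fixed_line(path, line_number, line_length):
    """Return line `line_number` (0-based) from a file of fixed-size lines.

    `line_length` is the record size in bytes, newline included.
    """
    with open(path, "rb") as f:
        # Every line starts at a multiple of the record size.
        f.seek(line_number * line_length)
        return f.read(line_length).rstrip(b"\r\n")
```

No index is needed in this case, so a lookup costs one seek and one read regardless of file size. It does not apply to the variable-length log files described in the question.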
