How to read a large log/txt file (several GB) by taking N lines into memory at a time, then the next N lines


I have tried this program, which reads my file in fixed-size character chunks; this is the behaviour I want:

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        print(piece)

But when I try to apply the same method using readlines(), it doesn't work for me. Here is the code I am trying:

def read_in_chunks(file_object, chunk_size=5):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 5 lines."""
    while True:
        data = file_object.readlines()[0:chunk_size]
        if not data:
            break
        yield data


with open('Traefik.log') as f:
    for piece in read_in_chunks(f):
        print(piece)

Can somebody help me achieve the same chunked behaviour for N lines at a time?


哆兒滾 2025-02-01 06:53:20


By default .readlines() reads the whole content of the stream into a list. But you can give it a size hint to produce the lines in chunks:

Read and return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.

So, you could adjust your function to something like:

def read_in_chunks(file_object, chunk_size_hint=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default size hint: 1k."""
    while True:
        # readlines(hint) returns whole lines, stopping once the total
        # size read so far exceeds the hint.
        data = file_object.readlines(chunk_size_hint)
        if not data:
            break
        yield data
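
A minimal usage sketch, reusing the Traefik.log file name from the question:

with open('Traefik.log') as f:
    for chunk in read_in_chunks(f, chunk_size_hint=1024):
        # Each chunk is a list of whole lines whose combined size is
        # roughly the 1024-character hint.
        print(len(chunk), 'lines in this chunk')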

But that doesn't guarantee a fixed number of lines per chunk. If you look a bit further in the docs you'll find the following advice:

Note that it’s already possible to iterate on file objects using for line in file: ... without calling file.readlines().

That's a hint that something like this

def read_in_chunks(file_object, chunk_size=10):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 10 lines"""
    data = []
    # Iterating over the file object yields one line at a time, so at
    # most chunk_size lines are held in memory at once.
    for n, line in enumerate(file_object, start=1):
        data.append(line)
        if not n % chunk_size:
            yield data
            data = []
    # Yield the final partial chunk if the line count isn't an exact
    # multiple of chunk_size.
    if data:
        yield data

might be better suited.
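
If you'd rather avoid the manual counter, an equivalent sketch using itertools.islice from the standard library (an alternative, not part of the original answer) pulls at most chunk_size lines per call:

from itertools import islice

def read_in_chunks_islice(file_object, chunk_size=10):
    """Lazy generator: yield lists of up to chunk_size lines."""
    while True:
        # islice consumes at most chunk_size lines from the file
        # iterator; an empty list means EOF has been reached.
        chunk = list(islice(file_object, chunk_size))
        if not chunk:
            break
        yield chunk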
