How to read a large log/txt file (several GB) by taking N lines into memory at a time, then the next N lines


I have tried this program, which reads my file in fixed-size character chunks; this is the behaviour I want:

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        print(piece)

But when I try to apply the same method using readlines(), it doesn't work for me. Here is the code I am trying:

def read_in_chunks(file_object, chunk_size=5):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 5 lines."""
    while True:
        data = file_object.readlines()[0:chunk_size]
        if not data:
            break
        yield data


with open('Traefik.log') as f:
    for piece in read_in_chunks(f):
        print(piece)

Can somebody help me achieve the same chunked behaviour for N lines at a time?


哆兒滾 2025-02-01 06:53:20


By default .readlines() reads the whole content of the stream into a list. But you can give it a size hint to produce the lines in chunks:

Read and return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.

So, you could adjust your function to something like:

def read_in_chunks(file_object, chunk_size_hint=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default size hint: 1k."""
    while True:
        # readlines(hint) returns whole lines, stopping once the total
        # size read so far exceeds the hint.
        data = file_object.readlines(chunk_size_hint)
        if not data:
            break
        yield data
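
A minimal usage sketch, reusing the Traefik.log file name from the question:

with open('Traefik.log') as f:
    for chunk in read_in_chunks(f, chunk_size_hint=1024):
        # Each chunk is a list of whole lines whose combined size is
        # roughly the 1024-character hint.
        print(len(chunk), 'lines in this chunk')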

But that doesn't guarantee a fixed number of lines per chunk. If you look a bit further in the docs you'll find the following advice:

Note that it’s already possible to iterate on file objects using for line in file: ... without calling file.readlines().

That's a hint that something like this

def read_in_chunks(file_object, chunk_size=10):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 10 lines"""
    data = []
    # Iterating over the file object yields one line at a time, so at
    # most chunk_size lines are held in memory at once.
    for n, line in enumerate(file_object, start=1):
        data.append(line)
        if not n % chunk_size:
            yield data
            data = []
    # Yield the final partial chunk if the line count isn't an exact
    # multiple of chunk_size.
    if data:
        yield data

might be better suited.
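
If you'd rather avoid the manual counter, an equivalent sketch using itertools.islice from the standard library (an alternative, not part of the original answer) pulls at most chunk_size lines per call:

from itertools import islice

def read_in_chunks_islice(file_object, chunk_size=10):
    """Lazy generator: yield lists of up to chunk_size lines."""
    while True:
        # islice consumes at most chunk_size lines from the file
        # iterator; an empty list means EOF has been reached.
        chunk = list(islice(file_object, chunk_size))
        if not chunk:
            break
        yield chunk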
