Selecting and printing specific lines of a text file

Posted on 2024-09-16 10:17:23

I have a very large (~8 GB) text file with very long lines. I would like to pull out lines in selected ranges of this file and put them in another text file. In fact, my question is very similar to this and this, but I keep getting stuck when I try to select a range of lines instead of a single line.

So far this is the only approach I have gotten to work:

lines = readin.readlines()
out1.write(str(lines[5:67]))
out2.write(str(lines[89:111]))

However, this gives me a list, and I would like to output a file with a format identical to the input file (one line per row).

Comments (5)

乞讨 2024-09-23 10:17:23

You can call join on the ranges.

lines = readin.readlines()
out1.write(''.join(lines[5:67]))
out2.write(''.join(lines[89:111]))
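
To see why str() produced list-style output while join reproduces the input format, here is a small self-contained sketch. It uses io.StringIO objects as stand-ins for the real files, and the sample content is made up for illustration.

```python
import io

# Stand-in for the real input file (hypothetical sample data)
readin = io.StringIO("".join(f"line {i}\n" for i in range(10)))
lines = readin.readlines()

# str() on a list gives its repr: brackets, quotes and escaped newlines
as_repr = str(lines[2:4])

# join concatenates the lines, each of which already ends in '\n'
as_text = "".join(lines[2:4])

out1 = io.StringIO()
out1.write(as_text)

print(as_repr)
print(out1.getvalue(), end="")
```

Because readlines() keeps the trailing newline on every line, joining with the empty string is enough to get one line per row.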
十秒萌定你 2024-09-23 10:17:23

Might I suggest not storing the entire file (since it is large), as per one of your links?

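f = open('file')
n = open('newfile', 'w')
for i, text in enumerate(f):
    # Same 0-based ranges as lines[5:67] and lines[89:111] in the question
    if 5 <= i < 67 or 89 <= i < 111:
        n.write(text)
    elif i >= 111:
        break  # past the last range, so skip the rest of the file
f.close()
n.close()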

I'd also recommend using 'with' instead of opening and closing the file, but unfortunately I am not allowed to upgrade to a new enough version of Python for that here :(.
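
For readers whose Python version does support with, the same single-pass idea can be sketched as below. The function name copy_line_ranges and the (start, stop, out_file) tuple layout are assumptions made for this illustration, not part of the answer above. Ranges are 0-based and half-open, like the slices in the question, and io.StringIO objects stand in for real files.

```python
import io

def copy_line_ranges(src, targets):
    """Copy 0-based, half-open line ranges from src to output files.

    targets is a list of (start, stop, out_file) tuples.
    """
    last_stop = max(stop for _, stop, _ in targets)
    for i, text in enumerate(src):
        if i >= last_stop:
            break  # nothing left to copy, so skip the rest of the input
        for start, stop, out in targets:
            if start <= i < stop:
                out.write(text)

# Demo with in-memory stand-ins for the input and the two output files
src = io.StringIO("".join(f"row {i}\n" for i in range(200)))
out1, out2 = io.StringIO(), io.StringIO()
copy_line_ranges(src, [(5, 67, out1), (89, 111, out2)])
```

With real files, the three objects would come from a single statement such as with open('file') as f, open('out1.txt', 'w') as out1, open('out2.txt', 'w') as out2: so that all of them are closed automatically.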

网名女生简单气质 2024-09-23 10:17:23

The first thing you should think of when facing a problem like this is to avoid reading the entire file into memory at once. readlines() will do that, so that specific method should be avoided.

Luckily, Python has an excellent standard library, which includes itertools. itertools has a lot of useful functions, and one of them is islice. islice iterates over an iterable (such as lists, generators, file-like objects etc.) and returns an iterator over the specified range:

itertools.islice(iterable, start, stop[, step])

Make an iterator that returns selected elements from the iterable. If start is non-zero,
then elements from the iterable are skipped until start is reached.
Afterward, elements are returned consecutively unless step is set
higher than one which results in items being skipped. If stop is None,
then iteration continues until the iterator is exhausted, if at all;
otherwise, it stops at the specified position. Unlike regular slicing,
islice() does not support negative values for start, stop, or step.
Can be used to extract related fields from data where the internal
structure has been flattened (for example, a multi-line report may
list a name field on every third line)

Using this information, together with the str.join method, you can e.g. extract lines 10-19 (counting from 0) by using this simple code:

from itertools import islice

# Open for reading; a write mode such as 'wb' would truncate the file
with open('huge_data_file.txt', 'r') as data_file:
    txt = ''.join(islice(data_file, 10, 20))

Note that when looping over the file object, each line keeps its trailing newline, so you should join with the empty string rather than \n.
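
Since the question asks for more than one range, here is a minimal sketch of pulling several ranges in a single pass with islice. The helper name extract_ranges is hypothetical, and it assumes the (start, stop) pairs are 0-based, sorted and non-overlapping. The key detail is that islice counts from the iterator's current position, so each later range has to be offset by the number of lines already consumed.

```python
import io
from itertools import islice

def extract_ranges(lines, ranges):
    """Yield lines for sorted, non-overlapping 0-based (start, stop) ranges."""
    it = iter(lines)
    consumed = 0  # number of lines pulled from the iterator so far
    for start, stop in ranges:
        # islice positions are relative to where the iterator currently is
        for line in islice(it, start - consumed, stop - consumed):
            yield line
        consumed = stop

# Demo with an in-memory stand-in for the large file (made-up content)
data_file = io.StringIO("".join(f"row {i}\n" for i in range(30)))
# Lines from a file iterator keep their '\n', so join with ''
txt = "".join(extract_ranges(data_file, [(10, 12), (20, 23)]))
```

Only the lines up to the last stop are read, which matters for a file of several gigabytes.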

好久不见√ 2024-09-23 10:17:23

(Partial Answer) In order to make your current approach work you'll have to write line by line. For instance:

lines = readin.readlines()

for each in lines[5:67]:
    out1.write(each)

for each in lines[89:111]:
    out2.write(each)
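
As a variation on the loops above, file objects also provide writelines, which writes every string from an iterable without adding separators. The sketch below uses io.StringIO stand-ins and made-up sample data; like the original approach, it still loads the whole file into memory.

```python
import io

# In-memory stand-ins for the input and output files (hypothetical data)
readin = io.StringIO("".join(f"line {i}\n" for i in range(120)))
out1, out2 = io.StringIO(), io.StringIO()

lines = readin.readlines()

# writelines writes each string as-is; it does not add newlines itself
out1.writelines(lines[5:67])
out2.writelines(lines[89:111])
```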
计㈡愣 2024-09-23 10:17:23
path = "c:\\someplace\\"

Open two text files: one for reading and one for writing

f_in = open(path + "temp.txt", 'r')
f_out = open(path + output_name, 'w')

Go through each line of the input file

for line in f_in:
    # i_want_to_write_this_line is a placeholder for your own condition
    if i_want_to_write_this_line:
        f_out.write(line)

Close the files when done

f_in.close()
f_out.close()
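
One way to fill in the placeholder condition is to pass it in as a function. The sketch below is an illustration under assumptions: filter_lines and its keep(index, line) callback are hypothetical names, the file names are made up, and the demo runs in a temporary directory rather than on c:\someplace.

```python
import os
import tempfile

def filter_lines(in_path, out_path, keep):
    """Copy lines for which keep(index, line) is true; index is 0-based."""
    # 'with' closes both files even if an exception is raised midway
    with open(in_path, "r") as f_in, open(out_path, "w") as f_out:
        for index, line in enumerate(f_in):
            if keep(index, line):
                f_out.write(line)

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "temp.txt")
    out_path = os.path.join(tmp, "selected.txt")
    with open(in_path, "w") as f:
        f.writelines(f"row {i}\n" for i in range(10))
    filter_lines(in_path, out_path, lambda index, line: 3 <= index < 6)
    with open(out_path) as f:
        result = f.read()
```

Keeping the condition in a callback means the same loop can select by line number, by content, or by both.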