Select and print specific lines of a text file
I have a very large (~8 GB) text file with very long lines. I would like to pull out lines in selected ranges of this file and put them in another text file. My question is very similar to this and this, but I keep getting stuck when I try to select a range of lines instead of a single line.
So far this is the only approach I have gotten to work:
lines = readin.readlines()
out1.write(str(lines[5:67]))
out2.write(str(lines[89:111]))
However, str() on a slice writes the Python list representation, and I would like to output a file with a format identical to the input file (one line per row).
You can call join on the ranges.
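A rough sketch of this suggestion follows. Lines returned by readlines() keep their trailing newline, so joining a slice with an empty string reproduces the one-line-per-row layout. The helper name join_range and the sample data are illustrative assumptions, not part of the original post.

```python
def join_range(lines, start, stop):
    """Return lines[start:stop] as one string, ready to pass to write().

    Lines returned by readlines() keep their trailing newline,
    so an empty separator preserves one line per row.
    """
    return "".join(lines[start:stop])


# Hypothetical stand-in for readin.readlines()
lines = ["line %d\n" % i for i in range(200)]
chunk = join_range(lines, 5, 67)  # same slice as in the question
```

With the question's file objects this would be used as out1.write(join_range(lines, 5, 67)). It still loads the whole file through readlines(), so it only fixes the output format, not the memory use.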
Might I suggest not storing the entire file in memory (since it is large), as shown in one of your links?
I'd also recommend using 'with' instead of opening and closing the file by hand, but unfortunately I am not allowed to upgrade to a new enough version of Python for that here :(.
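One way this advice might look in practice, as a sketch only: read the file one line at a time, copy the lines whose index falls in the wanted range, and stop as soon as the range is passed. The function name copy_line_range and the 0-based, end-exclusive indexing are assumptions chosen to match Python slicing. The single 'with' statement managing two files needs Python 2.7 or 3.1+.

```python
def copy_line_range(src_path, dst_path, start, stop):
    """Copy lines with 0-based index in [start, stop) from src to dst,
    reading one line at a time instead of loading the whole file."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for index, line in enumerate(src):
            if index >= stop:
                break  # past the range: stop reading early
            if index >= start:
                dst.write(line)  # line still carries its newline
```

Because the loop breaks after the last wanted line, the rest of an 8 GB file is never read.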
The first thing you should think of when facing a problem like this is to avoid reading the entire file into memory at once. readlines() does exactly that, so that method should be avoided.
Luckily, Python's standard library includes an excellent module, itertools. itertools has a lot of useful functions, and one of them is islice. islice iterates over an iterable (such as a list, a generator, or a file-like object) and returns an iterator over the specified range.
Using this, together with the str.join method, you can, for example, extract lines 10-19 by joining the result of islice over the open file.
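A minimal sketch of the islice approach, assuming 0-based line numbers and an end-exclusive stop (so lines 10-19 correspond to start=10, stop=20). The function name extract_lines and the file name in the usage note are placeholders.

```python
from itertools import islice


def extract_lines(path, start, stop):
    """Return lines with 0-based index in [start, stop) as one string."""
    with open(path) as fp:
        # islice consumes the file lazily and stops after `stop` lines;
        # each line keeps its trailing newline, so join with "".
        return "".join(islice(fp, start, stop))
```

For example, extract_lines("testfile.txt", 10, 20) would return lines 10-19 of a hypothetical testfile.txt, and the result can be passed directly to write() on the output file.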
Note that when looping over the file object, each line keeps its trailing newline character, so the joining string should be empty rather than \n.
(Partial Answer) In order to make your current approach work you'll have to write line by line. For instance:
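A small sketch of writing line by line, assuming lines comes from readin.readlines() as in the question. The helper name write_lines is illustrative.

```python
def write_lines(out, lines, start, stop):
    """Write lines[start:stop] to an open file object, one line at a time."""
    for line in lines[start:stop]:
        out.write(line)  # each line from readlines() already ends with "\n"
```

Applied to the question's code, this would be write_lines(out1, lines, 5, 67) followed by write_lines(out2, lines, 89, 111).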
Open two text files: one for reading and one for writing.
Go through each line of the input file, writing the lines you want to the output file.
Close the files when done.
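The three steps above could be sketched as follows, using explicit open and close calls so it also works where 'with' is unavailable. The function name copy_selected and the list of (start, stop) pairs are assumptions for illustration. Indices are 0-based and end-exclusive.

```python
def copy_selected(src_path, dst_path, ranges):
    """Copy lines whose 0-based index falls in any (start, stop) pair."""
    src = open(src_path)       # step 1: one file for reading...
    dst = open(dst_path, "w")  # ...and one for writing
    try:
        for index, line in enumerate(src):  # step 2: visit each line
            if any(start <= index < stop for start, stop in ranges):
                dst.write(line)
    finally:
        dst.close()  # step 3: close both files when done
        src.close()
```

Calling copy_selected(in_path, out_path, [(5, 67), (89, 111)]) would gather both of the question's ranges into one output file. Writing each range to its own file would need one call per range.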