How to read a file line-by-line into a list?
How do I read every line of a file in Python and store each line as an element in a list?
I want to read the file line by line and append each line to the end of the list.
This code will read the entire file into memory and remove all whitespace characters (newlines and spaces) from the end of each line:
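A minimal sketch of that approach. The file name example.txt and its contents are assumptions, used only so the snippet runs on its own.

```python
from pathlib import Path

# Assumed sample file, created only for this example
Path("example.txt").write_text("first line  \nsecond line\nthird line\n")

filename = "example.txt"
with open(filename) as file:
    # rstrip() removes the newline and any other trailing whitespace
    lines = [line.rstrip() for line in file]

print(lines)  # ['first line', 'second line', 'third line']
```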
If you're working with a large file, then you should instead read and process it line-by-line:
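A sketch of line-by-line processing, assuming a sample file example.txt. Collecting each line's length is an arbitrary stand-in for whatever processing you actually need.

```python
from pathlib import Path

# Assumed sample file, created only for this example
Path("example.txt").write_text("first line\nsecond line\n")

lengths = []
with open("example.txt") as file:
    for line in file:
        # Only the current line is held in memory
        lengths.append(len(line.rstrip()))

print(lengths)  # [10, 11]
```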
In Python 3.8 and up you can use a while loop with the walrus operator like so:
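A sketch of the walrus-operator loop (Python 3.8+), again with an assumed example.txt. readline() returns an empty string only at end of file; a blank line in the middle comes back as "\n", which is truthy, so it does not end the loop early.

```python
from pathlib import Path

# Assumed sample file, created only for this example
Path("example.txt").write_text("first line\nsecond line\n")

stripped = []
with open("example.txt") as file:
    # The loop stops when readline() returns '' (end of file)
    while line := file.readline():
        stripped.append(line.rstrip())
```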
Depending on what you plan to do with your file and how it was encoded, you may also want to manually set the access mode and character encoding:
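A sketch that passes the access mode and encoding explicitly. UTF-8 is an assumption here; use whatever encoding your file was actually written in.

```python
from pathlib import Path

# Assumed UTF-8 sample file with non-ASCII text
Path("example_utf8.txt").write_text("naïve café\nsecond line\n", encoding="utf-8")

# 'r' is the access mode (read text); the encoding is given explicitly
with open("example_utf8.txt", "r", encoding="utf-8") as file:
    lines = [line.rstrip() for line in file]
```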
See Input and Output:
or with stripping the newline character:
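A minimal sketch of both variants, assuming a small file named example.txt: readlines() keeps the newline on each element, while read().splitlines() strips it.

```python
from pathlib import Path

# Assumed sample file, created only for this example
Path("example.txt").write_text("first line\nsecond line\n")

# Keeps the trailing newline on each element
with open("example.txt") as f:
    content = f.readlines()

# Strips the newline characters
with open("example.txt") as f:
    stripped = f.read().splitlines()
```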
This is more explicit than necessary, but does what you want.
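A sketch of what such an explicit version typically looks like, assuming a file named file.txt: open the file, loop over it, and append each line to a list.

```python
from pathlib import Path

# Assumed sample file, created only for this example
Path("file.txt").write_text("a\nb\nc\n")

lines = []
with open("file.txt") as file_in:
    for line in file_in:
        lines.append(line)  # each element still ends with '\n'
```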
This will yield an "array" of lines from the file.
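A minimal sketch, assuming a file named file.txt. It wraps open in a with block so the file is closed deterministically; the key part is passing the file object to tuple.

```python
from pathlib import Path

# Assumed sample file, created only for this example
Path("file.txt").write_text("a\nb\n")

with open("file.txt", "r") as f:
    lines = tuple(f)  # iterating the file yields its lines

print(lines)  # ('a\n', 'b\n')
```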
open returns a file which can be iterated over. When you iterate over a file, you get the lines from that file. tuple can take an iterator and instantiate a tuple instance for you from the iterator that you give it. lines is a tuple created from the lines of the file.
According to Python's Methods of File Objects, the simplest way to convert a text file into a list is to pass the file object to list(). If you just need to iterate over the text file lines, you can iterate over the file object directly.
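A sketch of both cases, assuming a file named file.txt: building a list with list(f), and plain iteration when no list is needed (counting lines stands in for real work).

```python
from pathlib import Path

# Assumed sample file, created only for this example
Path("file.txt").write_text("one\ntwo\n")

# Convert the whole file to a list of lines
with open("file.txt") as f:
    my_list = list(f)  # ['one\n', 'two\n']

# Or just iterate when no list is needed
count = 0
with open("file.txt") as f:
    for line in f:
        count += 1
```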
Old answer:
One option is using with together with readlines(). If you don't care about closing the file, a one-liner will also work.
The traditional way:
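A sketch of the three older variants described here, assuming a file named file.txt. The one-liner leaves closing to garbage collection, which CPython happens to do immediately but other implementations may not.

```python
from pathlib import Path

# Assumed sample file, created only for this example
Path("file.txt").write_text("one\ntwo\n")

# 1. with + readlines()
with open("file.txt") as f:
    lines_a = f.readlines()

# 2. One-liner: the file is closed only when the object is garbage-collected
lines_b = open("file.txt").readlines()

# 3. Traditional: open, read, close explicitly
f = open("file.txt")
lines_c = f.read().splitlines()  # newlines stripped
f.close()
```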
Depending on whether you want the \n included in each line or not, use one of these two variants:
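A minimal sketch of the two variants, assuming a small sample file named file.txt:

```python
from pathlib import Path

# Assumed sample file, created only for this example
Path("file.txt").write_text("one\ntwo\n")

# \n included
with open("file.txt") as f:
    with_newlines = f.readlines()

# \n not included
with open("file.txt") as f:
    without_newlines = f.read().splitlines()
```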
You could simply do the following, as has been suggested:
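A sketch of that simple approach, assuming a file named my_file.txt. Note that every line ends up in memory at once.

```python
from pathlib import Path

# Assumed sample file, created only for this example
Path("my_file.txt").write_text("alpha\nbeta\n")

with open("my_file.txt") as f:
    my_lines = f.readlines()  # every line is now held in memory
```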
Note that this approach has 2 downsides:
1) You store all the lines in memory. In the general case, this is a very bad idea. The file could be very large, and you could run out of memory. Even if it's not large, it is simply a waste of memory.
2) This does not allow processing of each line as you read them. So if you process your lines after this, it is not efficient (requires two passes rather than one).
A better approach for the general case would be the following:
Where you define your process function any way you want. For example:
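A sketch of the one-pass pattern with a process function. The Superman class here is a hypothetical stub (its real implementation is left as an exercise); it only records which lines triggered it. The file name my_file.txt is also an assumption.

```python
from pathlib import Path

# Assumed sample file, created only for this example
Path("my_file.txt").write_text("a quiet day\nplease save the world\nanother day\n")


class Superman:
    """Hypothetical stub: records the lines it was asked to act on."""

    def __init__(self):
        self.rescues = []

    def save_the_world(self, line):
        self.rescues.append(line)


superman = Superman()


def process(line):
    # Act on each line as soon as it is read
    if "save the world" in line.lower():
        superman.save_the_world(line.rstrip())


with open("my_file.txt") as f:
    for line in f:  # a single pass, one line in memory at a time
        process(line)
```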
(The implementation of the Superman class is left as an exercise for you.) This will work nicely for any file size, and you go through your file in just 1 pass. This is typically how generic parsers will work.
Given a text file with a few lines of content, we can use this Python script in the same directory as the text file.
Using append:
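A sketch of the append version. The file name text.txt and its three lines are assumptions, since the original sample file is not shown.

```python
from pathlib import Path

# Assumed content for text.txt
Path("text.txt").write_text("line 1\nline 2\nline 3\n")

lines = []
with open("text.txt") as file:
    for line in file:
        lines.append(line.strip())

print(lines)  # ['line 1', 'line 2', 'line 3']
```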
Or:
Or:
Or:
output:
Introduced in Python 3.4, pathlib has a really convenient method for reading in text from files: Path.read_text(). A following splitlines call is what turns the string containing the whole contents of the file into a list of the lines in the file.

pathlib has a lot of handy conveniences in it. read_text is nice and concise, and you don't have to worry about opening and closing the file. If all you need to do with the file is read it all in one go, it's a good choice.
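A minimal sketch of the pathlib approach; the file name my_text_file.txt and its contents are assumptions.

```python
from pathlib import Path

p = Path("my_text_file.txt")
p.write_text("first\nsecond\n")  # assumed sample content

lines = p.read_text().splitlines()
print(lines)  # ['first', 'second']
```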
To read a file into a list you need to do three things: open the file, read the file, and store the contents as a list.
Fortunately Python makes it very easy to do these things so the shortest way to read a file into a list is:
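A sketch of the shortest form, assuming a file named afile.txt, next to a with-based variant that closes the file deterministically instead of relying on garbage collection.

```python
from pathlib import Path

filename = "afile.txt"
Path(filename).write_text("x\ny\n")  # assumed sample file

lst = list(open(filename))  # shortest form; closing relies on garbage collection

# Same result, but the file is closed as soon as the block ends
with open(filename) as f:
    lst2 = list(f)
```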
However I'll add some more explanation.
Opening the file
I assume that you want to open a specific file and you don't deal directly with a file-handle (or a file-like handle). The most commonly used function to open a file in Python is open; in Python 2.7 it takes one mandatory argument and two optional ones.

The filename should be a string that represents the path to the file. For example:
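A Python 3 sketch showing that the filename string must name the file exactly, extension included. The temporary folder and the name afile.txt are assumptions made for the example.

```python
import os
import tempfile

folder = tempfile.mkdtemp()  # throwaway folder, only for this example
path = os.path.join(folder, "afile.txt")
with open(path, "w") as f:
    f.write("hello\n")

# The full name, including the .txt extension, opens the file
with open(path) as f:
    first = f.readline()

# Leaving the extension out names a different, non-existent file
missing = False
try:
    open(os.path.join(folder, "afile"))
except FileNotFoundError:
    missing = True
```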
Note that the file extension needs to be specified. This is especially important for Windows users because file extensions like .txt or .doc are hidden by default when viewed in Explorer.

The second argument is the mode; it's r by default, which means "read-only". That's exactly what you need in your case.

But in case you actually want to create a file and/or write to a file, you'll need a different argument here. There is an excellent answer if you want an overview.
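A Python 3 sketch of the mode argument discussed here, assuming a file afile.txt with Windows-style line endings: the default mode and an explicit "r" behave the same, while "rb" returns raw bytes with the \r\n left untouched.

```python
from pathlib import Path

Path("afile.txt").write_bytes(b"abc\r\ndef\r\n")  # assumed Windows-style file

with open("afile.txt") as f:        # mode defaults to 'r'
    text_default = f.read()
with open("afile.txt", "r") as f:   # the same thing, spelled out
    text_explicit = f.read()
with open("afile.txt", "rb") as f:  # binary: bytes, newlines untouched
    raw = f.read()
```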
For reading a file you can omit the mode or pass it in explicitly; both will open the file in read-only mode. In case you want to read in a binary file on Windows you need to use the mode rb. In Python 2 the 'b' (binary mode) is simply ignored on other platforms; in Python 3 it matters on every platform, because a file opened with 'rb' returns bytes instead of str.

Now that I've shown how to open the file, let's talk about the fact that you always need to close it again. Otherwise it will keep an open file-handle to the file until the process exits (or Python garbage-collects the file-handle).

While you could use:
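A sketch of the manual open/close pattern, assuming a file named afile.txt:

```python
from pathlib import Path

Path("afile.txt").write_text("x\n")  # assumed sample file

f = open("afile.txt")
data = f.read()
f.close()  # never reached if f.read() raises
```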
That will fail to close the file when something between open and close throws an exception. You could avoid that by using try and finally. However, Python provides context managers that have a prettier syntax (but for open it's almost identical to try and finally). The with statement is the recommended way to open a file in Python!
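A sketch of the try/finally version and the context-manager version described above, assuming a file named afile.txt:

```python
from pathlib import Path

Path("afile.txt").write_text("x\n")  # assumed sample file

# try/finally: close() runs even if reading raises
f = open("afile.txt")
try:
    data = f.read()
finally:
    f.close()

# Context manager: equivalent for open, and the recommended form
with open("afile.txt") as g:
    data2 = g.read()
```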
Reading the file
Okay, you've opened the file, now how to read it?
The open function returns a file object, and it supports Python's iteration protocol. Each iteration will give you a line, so looping over the file and printing each item prints every line of the file. Note, however, that each line will contain a newline character \n at the end (you might want to check if your Python is built with universal newlines support, which is always on in Python 3 text mode; otherwise you could also have \r\n on Windows or \r on classic Mac OS as newlines). If you don't want that, you could simply remove the last character (or the last two characters on Windows). But the last line doesn't necessarily have a trailing newline, so one shouldn't use that. One could check if it ends with a trailing newline and, if so, remove it.
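A sketch comparing the newline-handling options from this part of the answer, including the .rstrip() alternative discussed next. The sample file is an assumption; its last line deliberately has no trailing newline.

```python
from pathlib import Path

# Assumed file whose last line has no trailing newline
Path("afile.txt").write_text("keep me  \nlast line")

with open("afile.txt") as f:
    raw = [line for line in f]  # ['keep me  \n', 'last line']

sliced = [line[:-1] for line in raw]  # wrong: cuts the final 'e' of 'last line'
checked = [line[:-1] if line.endswith("\n") else line for line in raw]
rstripped = [line.rstrip() for line in raw]  # also drops the two trailing spaces

windows = "line\r\n".rstrip()  # 'line': \r counts as whitespace too
```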
But you could simply remove all whitespace (including the \n character) from the end of the string with .rstrip(); this will also remove all other trailing whitespace, so you have to be careful if that is important. However, if the lines end with \r\n (Windows "newlines"), .rstrip() will also take care of the \r!

Store the contents as list
Now that you know how to open the file and read it, it's time to store the contents in a list. The simplest option would be to use the list function. In case you want to strip the trailing newlines, you could use a list comprehension instead.

Or even simpler: the .readlines() method of the file object by default returns a list of the lines. This will also include the trailing newline characters; if you don't want them, I would recommend the [line.rstrip() for line in f] approach, because it avoids keeping two lists containing all the lines in memory.

There's an additional option to get the desired output, however it's rather "suboptimal".
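A sketch of the list-building options above plus the "suboptimal" read-then-split variant described next, assuming a file named afile.txt. Note that split("\n") leaves an empty last item when the file ends with a newline; splitlines() does not.

```python
from pathlib import Path

Path("afile.txt").write_text("a\nb\n")  # assumed sample file

with open("afile.txt") as f:
    as_list = list(f)                         # ['a\n', 'b\n']
with open("afile.txt") as f:
    stripped = [line.rstrip() for line in f]  # ['a', 'b']
with open("afile.txt") as f:
    via_readlines = f.readlines()             # ['a\n', 'b\n']

# "Suboptimal": the whole file as one string, then split
with open("afile.txt") as f:
    split_n = f.read().split("\n")            # ['a', 'b', ''], note the empty last item
with open("afile.txt") as f:
    split_lines = f.read().splitlines()       # ['a', 'b']
```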
Read the complete file into a string and then split it on newlines. Splitting this way takes care of the trailing newlines automatically, because the split character isn't included. However, it is not ideal, because you keep the file both as a string and as a list of lines in memory!

Summary
- Use with open(...) as f when opening files, because you don't need to take care of closing the file yourself and it closes the file even if some exception happens.
- file objects support the iteration protocol, so reading a file line-by-line is as simple as for line in the_file_object:.
- To store the lines in a list you can use .readlines(), but if you want to process the lines before storing them in the list, I would recommend a simple list comprehension.
Clean and Pythonic Way of Reading the Lines of a File Into a List
First and foremost, you should focus on opening your file and reading its contents in an efficient and pythonic way. Here is an example of the way I personally DO NOT prefer:
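A sketch of the open/read/close style the author does not prefer, assuming a file named my_file.txt:

```python
from pathlib import Path

Path("my_file.txt").write_text("first\nsecond\n")  # assumed sample file

infile = open("my_file.txt", "r")  # open the file for reading
data = infile.read()               # read the whole contents
infile.close()                     # remember to close it yourself
```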
Instead, I prefer the below method of opening files for both reading and writing, as it is very clean and does not require an extra step of closing the file once you are done using it. In the statement below, we're opening the file for reading and assigning it to the variable 'infile'. Once the code within this statement has finished running, the file will be automatically closed.
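A sketch of the preferred with-statement style, assuming the same my_file.txt:

```python
from pathlib import Path

Path("my_file.txt").write_text("first\nsecond\n")  # assumed sample file

with open("my_file.txt", "r") as infile:
    data = infile.read()  # the file is closed automatically after this block
```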
Now we need to focus on bringing this data into a Python list, because lists are iterable, efficient, and flexible. In your case, the desired goal is to bring each line of the text file into a separate element. To accomplish this, we will use the splitlines() method as follows:
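A sketch of the splitlines() step; the string literal stands in for the data read with infile.read().

```python
data = "first\nsecond\nthird\n"  # stand-in for the contents read from the file
my_list = data.splitlines()
print(my_list)  # ['first', 'second', 'third']
```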
The Final Product:
Testing Our Code:
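A sketch of a final product and a quick test of it. The helper name read_lines and the file my_file.txt are hypothetical, chosen for this example only.

```python
from pathlib import Path

Path("my_file.txt").write_text("first\nsecond\nthird\n")  # assumed sample file


def read_lines(path):
    """Hypothetical helper combining the steps above."""
    with open(path, "r") as infile:
        data = infile.read()
    return data.splitlines()


# Testing the code
my_list = read_lines("my_file.txt")
for line in my_list:
    print(line)
```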
Here's one more option, using a list comprehension on the file object.
This should be a more efficient way, as most of the work is done inside the Python interpreter.
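A sketch of the list comprehension over a file object, assuming a file named file.txt:

```python
from pathlib import Path

Path("file.txt").write_text("a\nb\n")  # assumed sample file

with open("file.txt") as f:
    lines = [line.rstrip("\n") for line in f]
```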
Now variable out is a list (array) of what you want. You could either do:
Or:
You'll get the same results.
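A sketch assuming out comes from readlines() on a file named filename.txt. One caveat: readlines() consumes the file object, so iterating the same object again yields nothing; the second loop therefore opens the file afresh.

```python
from pathlib import Path

Path("filename.txt").write_text("a\nb\n")  # assumed sample file

with open("filename.txt") as f:
    out = f.readlines()  # out is a list of the lines

seen_from_list = [line for line in out]

# Iterating the file instead needs a fresh file object,
# because readlines() already consumed the first one
with open("filename.txt") as f:
    seen_from_file = [line for line in f]
```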
Another option is numpy.genfromtxt. The data it returns is a NumPy array with as many rows as there are lines in your file.
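A sketch assuming numpy is installed and the file holds one number per line (numbers.txt is an assumed name). With default arguments genfromtxt parses floats, so text lines would need other options.

```python
from pathlib import Path

import numpy as np

Path("numbers.txt").write_text("1.5\n2.0\n3.25\n")  # assumed numeric file

data = np.genfromtxt("numbers.txt")
print(data.shape)  # (3,)
```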
Read and write text files with Python 2 and Python 3; it works with Unicode
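A sketch of a write/read round trip with Unicode text. It assumes io.open (which accepts an encoding on both Python 2 and Python 3); the sample strings and file.txt are assumptions. Stripping only "\n" keeps the leading and trailing spaces, so the lines come back unchanged.

```python
# -*- coding: utf-8 -*-
import io

lines = [u"     A first string  ", u"A Unicode sample: €", u"German: äöüß"]

# io.open accepts an encoding argument on both Python 2 and Python 3
with io.open("file.txt", "w", encoding="utf-8") as fp:
    fp.write(u"\n".join(lines))

with io.open("file.txt", "r", encoding="utf-8") as fp:
    read_lines = [line.rstrip(u"\n") for line in fp]

print(lines == read_lines)  # True
```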
Things to notice:
- with is a so-called context manager. It makes sure that the opened file is closed again.
- Solutions which simply use .strip() or .rstrip() will fail to reproduce the lines, as they also strip the white space.

Common file endings: .txt
More advanced file writing/reading
For your application, the following might be important:
See also: Comparison of data serialization formats
In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python.
If you'd like to read a file from the command line or from stdin, you can also use the fileinput module and pass files to it as command-line arguments.
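A sketch using fileinput. Called without arguments, fileinput.input() reads the files named in sys.argv[1:], or stdin if there are none; here the file names (a.txt and b.txt, both assumed) are passed explicitly so the snippet runs on its own.

```python
import fileinput
from pathlib import Path

# Assumed sample files, created only for this example
Path("a.txt").write_text("first\n")
Path("b.txt").write_text("second\n")

with fileinput.input(files=("a.txt", "b.txt")) as f:
    lines = [line.rstrip("\n") for line in f]
```

As a script that calls fileinput.input() with no arguments, it would be run as, for example, python script.py a.txt b.txt.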
Read more here: http://docs.python.org/2/library/fileinput.html
The simplest way to do it
A simple way is to read the whole file as a string and then split the string line by line.
In one line, that would give:
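A sketch of the one-liner, assuming a file named file.txt:

```python
from pathlib import Path

Path("file.txt").write_text("a\nb\n")  # assumed sample file

lines = open("file.txt").read().splitlines()
```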
However, this is a quite inefficient way, as it stores 2 versions of the content in memory (probably not a big issue for small files, but still). [Thanks Mark Amery].
There are 2 easier ways:
One of them is to use pathlib to create a path for your file that you could use for other operations in your program.
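A sketch of the pathlib approach, assuming a file named file.txt: read_text().splitlines() reads everything at once, while iterating over path.open() reads lazily.

```python
from pathlib import Path

file_path = Path("file.txt")  # assumed location
file_path.write_text("a\nb\n")

lines = file_path.read_text().splitlines()

# Or iterate over the file lazily, one line at a time
with file_path.open() as f:
    lines2 = [line.rstrip() for line in f]
```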
Just use the splitlines() method. Here is an example.
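A sketch with splitlines(), assuming a file named file.txt:

```python
from pathlib import Path

Path("file.txt").write_text("first\nsecond\nthird\n")  # assumed sample file

with open("file.txt") as f:
    lines = f.read().splitlines()

print(lines)  # ['first', 'second', 'third']
```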
In the output you will have the list of lines.
If you are faced with a very large / huge file and want to read faster (imagine you are in a TopCoder or HackerRank coding competition), you might read a considerably bigger chunk of lines into a memory buffer at one time, rather than just iterate line by line at file level.
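A sketch of chunked reading with readlines(hint), which stops once the total size of the lines read so far exceeds hint and returns an empty list at end of file. The buffer size and the generated big.txt are assumptions. Whether this beats plain iteration is not guaranteed, since file iteration is already buffered, so measure on your own data.

```python
from pathlib import Path

# Assumed large-ish file, generated only for this example
Path("big.txt").write_text("".join(f"line {i}\n" for i in range(1000)))

buffer_size = 2 ** 16  # hint: approximate number of characters per chunk
total = 0
with open("big.txt") as f:
    while True:
        lines_buffer = f.readlines(buffer_size)
        if not lines_buffer:  # empty list means end of file
            break
        for line in lines_buffer:
            total += 1  # stand-in for real per-line processing
```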
The easiest ways to do that, with some additional benefits, are to build a list, a tuple, or a set directly from the file object.
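A sketch of the three constructors, assuming a file named filename.txt that contains a duplicated line, so the set behavior is visible:

```python
from pathlib import Path

Path("filename.txt").write_text("b\na\nb\n")  # assumed sample file

with open("filename.txt") as f:
    as_list = list(f)    # order kept, duplicates kept
with open("filename.txt") as f:
    as_tuple = tuple(f)  # immutable, order kept
with open("filename.txt") as f:
    as_set = set(f)      # order lost, duplicates removed
```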
In the case of set, we must remember that the line order is not preserved and duplicated lines are removed. Below I added an important supplement from @MarkAmery:
Use this:
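A sketch assuming pandas is installed and the file (lines.txt, assumed) contains no commas. read_csv splits on commas by default, so lines containing commas would become several columns.

```python
from pathlib import Path

import pandas as pd

Path("lines.txt").write_text("alpha\nbeta\ngamma\n")  # assumed file without commas

# header=None: treat the first line as data, not as column names
data = pd.read_csv("lines.txt", header=None)
array = data.values   # 2-D ndarray with a single column
lst = array.tolist()  # [['alpha'], ['beta'], ['gamma']]
```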
data is a pandas DataFrame, and its values attribute gives you an ndarray. You can also get a list by using array.tolist().
In case there are also empty lines in the document, I like to read in the content and pass it through filter to prevent empty string elements.
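A sketch with filter, assuming a file doc.txt that contains blank lines. filter(None, ...) keeps only truthy items, so the empty strings produced by blank lines are dropped.

```python
from pathlib import Path

Path("doc.txt").write_text("first\n\nsecond\n\n")  # assumed sample file

with open("doc.txt") as f:
    lines = list(filter(None, f.read().splitlines()))
```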
Outline and Summary
With a filename, handling the file from a Path(filename) object, or directly with open(filename) as f, do one of the following:
- list(fileinput.input(filename))
- using with path.open() as f, call f.readlines()
- list(f)
- path.read_text().splitlines()
- path.read_text().splitlines(keepends=True)
- iterate over fileinput.input or f and list.append each line one at a time
- pass f to a bound list.extend method
- use f in a list comprehension

I explain the use-case for each below.
This is an excellent question. First, let's create some sample data:
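A sketch of creating sample data with pathlib. The file name filename is an assumption.

```python
from pathlib import Path

filename = "filename"  # assumed file name
Path(filename).write_text("foo\nbar\nbaz")
```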
File objects are lazy iterators, so just iterate over it.
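A sketch of lazy iteration over a file object. The snippet recreates the assumed sample file so it runs on its own; note that the last line has no trailing newline.

```python
from pathlib import Path

filename = "filename"
Path(filename).write_text("foo\nbar\nbaz")  # assumed sample data

seen = []
with open(filename) as f:
    for line in f:  # lines are read one at a time, on demand
        seen.append(line)
```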
Alternatively, if you have multiple files, use fileinput.input, another lazy iterator. It works with just one file, or, for multiple files, you can pass it a list of filenames.
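A sketch of fileinput.input with one file and with a list of files. The second file, other, is hypothetical. Each call is used in a with block so it is closed before the next fileinput.input() call starts.

```python
import fileinput
from pathlib import Path

Path("filename").write_text("foo\nbar\nbaz")  # assumed sample data
Path("other").write_text("qux\n")             # hypothetical second file

single = []
with fileinput.input("filename") as f:
    for line in f:
        single.append(line)

multiple = []
with fileinput.input(["filename", "other"]) as f:
    for line in f:
        multiple.append(line)
```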
Again, f and fileinput.input above both are/return lazy iterators. You can only use an iterator one time, so to provide functional code while avoiding verbosity, I'll use the slightly more terse fileinput.input(filename) where apropos from here.

Ah, but you want it in a list for some reason? I'd avoid that if possible. But if you insist... just pass the result of fileinput.input(filename) to list.
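A sketch of materializing the lazy iterator into a list, assuming the same sample file:

```python
import fileinput
from pathlib import Path

filename = "filename"
Path(filename).write_text("foo\nbar\nbaz")  # assumed sample data

lines = list(fileinput.input(filename))  # consumes the iterator completely
```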
Another direct answer is to call f.readlines, which returns the lines of the file as a list (with an optional hint: reading stops once the lines read so far exceed that many characters, so you could break this up into multiple lists that way).

You can get to this file object two ways. One way is to pass the filename to the open builtin; the other is to use the new Path object from the pathlib module (which I have become quite fond of, and will use from here on).

list will also consume the file iterator and return a list, a quite direct method as well.

If you don't mind reading the entire text into memory as a single string before splitting it, you can do this as a one-liner with the Path object and the splitlines() string method. By default, splitlines removes the newlines; if you want to keep the newlines, pass keepends=True.
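A sketch of these direct approaches side by side, assuming the sample file used throughout this answer:

```python
from pathlib import Path

path = Path("filename")
path.write_text("foo\nbar\nbaz")  # assumed sample data

with open("filename") as f:  # the open builtin
    via_open = f.readlines()
with path.open() as f:       # pathlib.Path.open
    via_path = f.readlines()
with path.open() as f:
    via_list = list(f)

no_newlines = path.read_text().splitlines()
with_newlines = path.read_text().splitlines(keepends=True)
```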
Now this is a bit silly to ask for, given that we've demonstrated the end result easily with several methods. But you might need to filter or operate on the lines as you make your list, so let's humor this request.
Using list.append would allow you to filter or operate on each line before you append it. Using list.extend would be a bit more direct, and perhaps useful if you have a preexisting list. Or, more idiomatically, we could instead use a list comprehension, and map and filter inside it if desirable.
Or even more directly, to close the circle, just pass it to list to create a new list directly without operating on the lines:
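A sketch of the four list-building styles just described, assuming the same sample file. The startswith("b") filter and the rstrip() transformation are arbitrary examples.

```python
from pathlib import Path

path = Path("filename")
path.write_text("foo\nbar\nbaz")  # assumed sample data

# list.append: filter or transform each line before appending
appended = []
with path.open() as f:
    for line in f:
        if line.startswith("b"):            # example filter
            appended.append(line.rstrip())  # example transformation

# list.extend: consume the whole iterator into an existing list
extended = ["header"]
with path.open() as f:
    extended.extend(f)

# List comprehension: map and filter inline
with path.open() as f:
    comprehended = [line.rstrip() for line in f if line.startswith("b")]

# Plain list(): no per-line operation at all
with path.open() as f:
    plain = list(f)
```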
Conclusion
You've seen many ways to get lines from a file into a list, but I'd recommend you avoid materializing large quantities of data into a list and instead use Python's lazy iteration to process the data if possible.
That is, prefer fileinput.input or with path.open() as f.
I would try one of the methods mentioned below. The example file that I use has the name dummy.txt. You can find the file here. I presume that the file is in the same directory as the code (you can change fpath to include the proper file name and folder path).

In both examples below, the list that you want is given by lst.

1. First method
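A sketch of the first method. The contents of dummy.txt are assumed here, since the original file is not included.

```python
from pathlib import Path

fpath = "dummy.txt"
Path(fpath).write_text("first line \t\nsecond line\n")  # assumed content

with open(fpath, "r") as f:
    # Strip trailing newlines, spaces and tabs
    lst = [line.rstrip("\n \t") for line in f]

print(lst)  # ['first line', 'second line']
```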
2. In the second method, one can use the csv.reader function from the csv module of the Python standard library:
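A sketch using csv.reader, assuming dummy.txt has no commas, so each line becomes a one-item row. The if row guard skips blank lines, which csv.reader returns as empty lists.

```python
import csv
from pathlib import Path

fpath = "dummy.txt"
Path(fpath).write_text("first line\nsecond line\n")  # assumed content without commas

with open(fpath, newline="") as csv_file:
    csv_reader = csv.reader(csv_file)  # default delimiter is ','
    lst = [row[0] for row in csv_reader if row]
```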
You can use either of the two methods. The time taken for the creation of lst is almost equal for the two methods.
I like to use the following. Reading the lines immediately.
Or using list comprehension:
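A sketch of both variants from this answer, assuming a file named file.txt: reading all lines immediately with readlines(), or building the list with a comprehension.

```python
from pathlib import Path

Path("file.txt").write_text("a\nb\n")  # assumed sample file

# Read all lines immediately
with open("file.txt") as f:
    lines = f.readlines()

# Or with a list comprehension
with open("file.txt") as f:
    stripped = [line.rstrip() for line in f]
```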
You could also use the loadtxt function in NumPy. It checks for fewer conditions than genfromtxt, so it may be faster.
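A sketch assuming numpy is installed and numbers.txt (an assumed name) holds one number per line:

```python
from pathlib import Path

import numpy as np

Path("numbers.txt").write_text("1.5\n2.0\n3.25\n")  # assumed numeric file

data = np.loadtxt("numbers.txt")
```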
Here is a Python 3 helper class that I use to simplify file I/O. You would then use its FileIO.lines function to get the lines of a file.

Remember that the mode ("r" by default) and filter_fn (checks for empty lines by default) parameters are optional.

You could even remove the read, write and delete methods and just leave FileIO.lines, or even turn it into a separate function called read_lines.
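The helper's original code is not preserved, so this is a hypothetical sketch of just a FileIO.lines method, with the mode and filter_fn defaults described above. The file notes.txt is also an assumption.

```python
from pathlib import Path


class FileIO:
    """Hypothetical sketch: only the lines() helper described above."""

    @staticmethod
    def lines(file_path, mode="r", filter_fn=lambda line: len(line) > 0):
        # Read the file, split off the newlines, keep lines accepted by filter_fn
        with open(file_path, mode) as f:
            return [line for line in f.read().splitlines() if filter_fn(line)]


Path("notes.txt").write_text("first\n\nsecond\n")  # assumed sample file
lines = FileIO.lines("notes.txt")  # ['first', 'second']
```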
Command line version
Run with:
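The original script is not preserved. This is a hypothetical sketch of a command-line version that prints the lines of the file named as its first argument, or a usage message if no readable file is given. The script name read_lines.py is an assumption.

```python
import os
import sys


def main(argv):
    """Print the lines of the file named on the command line as a list."""
    if len(argv) < 2 or not os.path.isfile(argv[1]):
        print("usage: python3 read_lines.py FILE")
        return []
    with open(argv[1]) as f:
        lines = f.read().splitlines()
    print(lines)
    return lines


if __name__ == "__main__":
    main(sys.argv)
```

Saved as read_lines.py, it would be run as python3 read_lines.py input.txt.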