Does reading an entire file leave the file handle open?
If you read an entire file with `content = open('Path/to/file', 'r').read()`, is the file handle left open until the script exits? Is there a more concise method to read a whole file?
The answer to that question depends somewhat on the particular Python implementation.
To understand what this is all about, pay particular attention to the actual `file` object. In your code, that object is mentioned only once, in an expression, and becomes inaccessible immediately after the `read()` call returns. This means that the file object is garbage. The only remaining question is "When will the garbage collector collect the file object?"
In CPython, which uses a reference counter, this kind of garbage is noticed immediately, and so it will be collected immediately. This is not generally true of other Python implementations.
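One way to observe this behavior is with a weak reference, which lets you watch an object without keeping it alive. The following is only a sketch: the temporary file is a stand-in for a real path, and the "collected immediately" result is expected on CPython only; other implementations may report `False`.

```python
import os
import sys
import tempfile
import weakref

# Hypothetical sample file, created only for this demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "r")
ref = weakref.ref(f)  # observe the file object without keeping it alive
del f                 # drop the only strong reference

# On CPython the reference count hits zero here, so the object is
# finalized (and the file closed) right away. Elsewhere it may linger.
collected_immediately = ref() is None
print(sys.implementation.name, "collected immediately:", collected_immediately)

if collected_immediately:
    os.remove(path)  # safe: the handle is already closed
```

If this prints `False` on your interpreter, the handle is still open at that point, which is exactly the situation the rest of this answer is about.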
A better solution, to make sure that the file is closed, is to open it in a `with` block, which will always close the file immediately after the block ends, even if an exception occurs.
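A minimal sketch of that pattern follows. The temporary directory and the `example.txt` file name are placeholders so the snippet runs on its own; substitute your real path.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical sample file standing in for 'Path/to/file'.
    path = os.path.join(tmp_dir, "example.txt")
    with open(path, "w") as out_file:
        out_file.write("hello\nworld\n")

    # The file is closed as soon as this block is left,
    # on every Python implementation.
    with open(path, "r") as content_file:
        content = content_file.read()

    print(content_file.closed)
```

The name `content_file` is still bound after the block, but the underlying handle has been released.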
Edit: To put a finer point on it:
Other than `file.__exit__()`, which is "automatically" called in a `with` context manager setting, the only other way that `file.close()` is automatically called (that is, other than explicitly calling it yourself) is via `file.__del__()`. This leads us to the question of when `__del__()` gets called.
-- https://devblogs.microsoft.com/oldnewthing/20100809-00/?p=13203
In particular:
-- https://docs.python.org/3.5/reference/datamodel.html#objects-values-and-types
(Emphasis mine)
But as it suggests, other implementations may have other behavior. As an example, PyPy has 6 different garbage collection implementations!
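The two routes that do not depend on the garbage collector can be sketched as follows. The temporary file and the variable names are illustrative only.

```python
import os
import tempfile

# Hypothetical sample file for the demonstration.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out_file:
    out_file.write("some text\n")

# Route 1: `with` calls __exit__(), which closes the file
# even when the body raises.
try:
    with open(path, "r") as managed_file:
        raise ValueError("simulated failure while reading")
except ValueError:
    pass
closed_after_with = managed_file.closed

# Route 2: call close() yourself; try/finally guarantees it runs.
explicit_file = open(path, "r")
try:
    content = explicit_file.read()
finally:
    explicit_file.close()
closed_after_finally = explicit_file.closed

os.remove(path)
print(closed_after_with, closed_after_finally)
```

Neither route relies on `__del__()`, so both behave the same way regardless of how a given implementation collects garbage.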
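You can use `pathlib`.
For Python 3.5 and above, `Path.read_text()` is available in the standard library. For older versions of Python, install the `pathlib2` backport and use its `Path` class the same way.
The actual `read_text` implementation opens the file inside a `with` block and returns the result of `read()`, so the file is closed before the call returns.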
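A short sketch of the `pathlib` approach is shown below. The temporary directory and `example.txt` are placeholders, and `read_text_sketch` is a hypothetical helper that mirrors the general shape of `Path.read_text()` rather than reproducing the standard library source.

```python
import tempfile
from pathlib import Path


def read_text_sketch(path, encoding=None, errors=None):
    """Simplified stand-in for Path.read_text(): open, read, close."""
    with path.open(mode="r", encoding=encoding, errors=errors) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "example.txt"  # hypothetical sample file
    file_path.write_text("hello\nworld\n")

    contents = file_path.read_text()            # one call, no handle to manage
    same_contents = read_text_sketch(file_path)

print(contents == same_contents)
```

On Python versions without `pathlib`, the same calls should work after replacing the import with `from pathlib2 import Path`, assuming the backport is installed.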
Well, if you have to read the file line by line to work with each line, you can call `readline()` in a loop inside a `with` block.
Or, even better, iterate over the file object directly.
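Both variants can be sketched like this. The sample file and the list names are assumptions for illustration; the `append` calls stand in for whatever per-line work you need.

```python
import os
import tempfile

# Hypothetical sample file for the demonstration.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out_file:
    out_file.write("first\nsecond\nthird\n")

# Variant 1: explicit readline() loop; an empty string signals end of file.
lines_from_readline = []
with open(path, "r") as f:
    s = f.readline()
    while s:
        lines_from_readline.append(s.rstrip("\n"))  # do whatever you want here
        s = f.readline()

# Variant 2: iterate over the file object directly.
lines_from_iteration = []
with open(path, "r") as f:
    for line in f:
        lines_from_iteration.append(line.rstrip("\n"))  # do whatever you want here

os.remove(path)
print(lines_from_readline == lines_from_iteration)
```

The second variant is shorter and avoids the duplicated `readline()` call, while reading the file just as lazily.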
Instead of retrieving the file content as a single string, it can be handy to store the content as a list of all lines the file comprises.
To do so, chain the methods `.strip().split("\n")` onto the `read()` call from the main answer in this thread.
Here, `.strip()` just removes whitespace and newline characters at both ends of the entire file string, and `.split("\n")` produces the actual list by splitting the entire file string at every newline character `\n`.
Moreover, this way the entire file content can be stored in a variable, which might be desired in some cases, instead of looping over the file line by line as pointed out in a previous answer.
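As a sketch, combining this with the `with` pattern might look like the following. The sample file and its contents are made up for the example.

```python
import os
import tempfile

# Hypothetical sample file for the demonstration.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out_file:
    out_file.write("first\nsecond\nthird\n")

with open(path, "r") as content_file:
    # strip() drops the trailing newline so split() yields no empty last item.
    content_list = content_file.read().strip().split("\n")

os.remove(path)
print(content_list)
```

Note that `.strip()` also removes leading and trailing whitespace that belongs to the first and last lines; if that matters, `str.splitlines()` is an alternative worth considering.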