Facing "MemoryError" during multithreaded txt file I/O, looking for a better solution
I'm working with a single txt file of about 4 MB, and the file needs frequent I/O: appending new lines, searching for lines that contain specific phrases, replacing one line with another, and so on.
To let multiple threads process the file "at the same time", a threading.RLock() is used to lock the resource while it is being operated on. As it's not a big file, I simply use readlines() to read all the lines into a list and do the search job, and also use read() to read the whole file into a string FileContent, then FileContent.replace("demo", "test") to replace certain phrases with whatever I want.
The problem is that I'm occasionally facing a "MemoryError": sometimes every 3 or 4 days, sometimes after a week or so. I've checked my code carefully and there are no unclosed file objects when each thread ends. As for file operations, I simply use:
CurrentFile = open("TestFile.txt", "r")
FileContent = CurrentFile.read()
CurrentFile.close()
I think maybe Python is not freeing unused variables as fast as I expected, which finally results in running out of memory, so I'm considering using a with statement, which might be quicker at garbage collecting. I'm not experienced with that statement; does anybody know if it would help? Or is there a better solution for my problem?
Thanks a lot.
Added: my script does lots of replacements in a short period of time, so my guess is that hundreds of threads each holding FileContent = CurrentFile.read() could run out of memory if FileContent is not freed quickly. How do I debug such a problem?
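For context, a minimal sketch of the access pattern described above, assuming a module-level lock shared by all threads and the file name from the question; the helper names (replace_phrase, find_lines, append_line) are hypothetical, not from the original code:

import threading

file_lock = threading.RLock()   # shared by all worker threads
FILE_PATH = "TestFile.txt"      # file name taken from the question

def replace_phrase(old, new):
    # Read the whole file, replace a phrase, and write it back under the lock.
    with file_lock:
        with open(FILE_PATH, "r") as f:
            content = f.read()
        with open(FILE_PATH, "w") as f:
            f.write(content.replace(old, new))

def find_lines(phrase):
    # Return all lines that contain the given phrase.
    with file_lock:
        with open(FILE_PATH, "r") as f:
            return [line for line in f if phrase in line]

def append_line(line):
    # Append a new line to the end of the file.
    with file_lock:
        with open(FILE_PATH, "a") as f:
            f.write(line + "\n")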
2 Answers
Without seeing more of your code, it's impossible to know why you are running out of memory. The with statement is the preferred way to open files and close them when done, though (sorry, UpperCamelCase for variables just doesn't look right to me...):
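Presumably something like this: a sketch of the question's three-line snippet rewritten with a with block (lowercase names, as the aside suggests); the file is closed automatically when the block exits, even if read() raises.

# Same effect as the question's open/read/close, with automatic closing
with open("TestFile.txt", "r") as current_file:
    file_content = current_file.read()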
Frankly, I doubt this will solve your problem if you are really closing files as you show in the question, but it's still good practice.
Sounds like you are leaking memory. Python will use all available system memory before raising MemoryError, and 4 MB does not sound like much. Where you leak memory depends on your code, which you didn't give in your question.
Have you watched the memory usage in the task manager of the OS?
Here is a tool to debug Python memory usage (needs a Python debug compilation):
http://guppy-pe.sourceforge.net/#Heapy
Use it to analyze your code's memory usage and see which objects you are creating that don't get freed.
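For example, a minimal Heapy session might look like this (a sketch assuming the guppy package is installed; hpy(), setrelheap() and heap() are its documented entry points):

from guppy import hpy

h = hpy()
h.setrelheap()   # only count objects allocated after this point
# ... run the file reading/replacing workload here ...
print(h.heap())  # live objects grouped by type, biggest consumers first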