Using mmap with Popen
I need to read in and process a bunch of ~40 MB gzipped text files, and I need it done fast and with minimal I/O overhead (as the volumes are used by others as well). The fastest way I've found thus far for this task looks like this:
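from subprocess import Popen, PIPE

def gziplines(fname):
    # Let an external zcat process decompress; read its stdout line by line.
    f = Popen(['zcat', fname], stdout=PIPE)
    for line in f.stdout:
        yield line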
and then:
for line in gziplines(filename):
    dostuff(line)
But what I would like to do (if this is faster?) is something like this:
import mmap

def gzipmmap(fname):
    f = Popen(['zcat', fname], stdout=PIPE)
    # Attempt to map zcat's stdout directly.
    m = mmap.mmap(f.stdout.fileno(), 0, access=mmap.ACCESS_READ)
    return m
Sadly, when I try this, I get this error:
>>> m = mmap.mmap(f.stdout.fileno(), 0, access=mmap.ACCESS_READ)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
mmap.error: [Errno 19] No such device
even though the file descriptor itself looks valid:
>>> f.stdout.fileno()
4
So, I think I have a basic misunderstanding of what is going on here. :(
The two questions are:
1) Would mmap be a faster way of putting the whole file into memory for processing?
2) How can I achieve this?
Thank you very much... everyone here has been incredibly helpful already!
~Nik
Comments (2)
From the mmap(2) man page: you cannot mmap streams, only real files or anonymous swap space. You will need to read from the stream into memory yourself.
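A minimal sketch of what "read from the stream into memory yourself" could look like, assuming Python 3 and, as in the question, a zcat binary on PATH. The helper names read_command_output, gzip_bytes, and to_anonymous_mmap are made up for illustration. The last helper copies the data into an anonymous map, which is the one kind of non-file mapping the answer mentions; it costs an extra copy, so it is shown for completeness rather than as a speed-up.

```python
import mmap
import subprocess


def read_command_output(argv):
    """Run argv and return everything it writes to stdout, as bytes."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    data, _ = proc.communicate()  # read the pipe to EOF, then wait for exit
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)
    return data


def gzip_bytes(fname):
    """Decompressed contents of fname (assumes `zcat` is on PATH)."""
    return read_command_output(["zcat", fname])


def to_anonymous_mmap(data):
    """Copy non-empty bytes into an anonymous (not file-backed) map."""
    m = mmap.mmap(-1, len(data))  # fd -1 requests an anonymous mapping
    m.write(data)
    m.seek(0)
    return m
```

With this, the loop from the question could become "for line in gzip_bytes(filename).splitlines(): dostuff(line)". Note that splitlines() drops the trailing newline from each line, unlike iterating over f.stdout.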
Pipes aren't mmapable, and a pipe's file operations do not contain an mmap hook.
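The failure can be reproduced without zcat at all. This is a small sketch, assuming Python 3 on a Unix-like system (where mmap.error is an alias of OSError); try_mmap_pipe is a hypothetical helper name. It passes an explicit non-zero length, because a pipe has no file size for mmap to infer when the length is 0.

```python
import mmap
import os


def try_mmap_pipe(length=4096):
    """Try to map the read end of a fresh pipe.

    Returns the OSError that was raised, or None if mapping succeeded.
    """
    read_fd, write_fd = os.pipe()
    try:
        m = mmap.mmap(read_fd, length, access=mmap.ACCESS_READ)
    except OSError as exc:
        return exc
    else:
        m.close()  # not expected to happen for a pipe
        return None
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

Calling try_mmap_pipe() and inspecting the errno attribute of the result shows which error the kernel reported. On Linux I would expect errno.ENODEV (19), matching the traceback in the question, but other platforms may report a different code.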