Using mmap with Popen

Posted 2024-11-17 11:36:40

I need to read in and process a bunch of ~40 MB gzipped text files, and I need it done fast and with minimal I/O overhead (as the volumes are used by others as well). The fastest way I've found thus far for this task looks like this:

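from subprocess import Popen, PIPE

def gziplines(fname):
    # Decompress with an external zcat process and stream its output line by line
    f = Popen(['zcat', fname], stdout=PIPE)
    for line in f.stdout:
        yield line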

and then:

for line in gziplines(filename):
    dostuff(line)

But what I would like to do (if it is actually faster) is something like this:

import mmap
from subprocess import Popen, PIPE

def gzipmmap(fname):
    f = Popen(['zcat', fname], stdout=PIPE)
    m = mmap.mmap(f.stdout.fileno(), 0, access=mmap.ACCESS_READ)
    return m

Sadly, when I try this, I get this error:

>>> m = mmap.mmap(f.stdout.fileno(), 0, access=mmap.ACCESS_READ)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
mmap.error: [Errno 19] No such device

This happens even though the file descriptor itself looks valid:

>>> f.stdout.fileno()
4

So, I think I have a basic misunderstanding of what is going on here. :(

The two questions are:

1) Would mmap be a faster way to get the whole file into memory for processing?

2) How can I achieve this?

Thank you very much... everyone here has been incredibly helpful already!
~Nik

Comments (2)

孤寂小茶 2024-11-24 11:36:40

From the mmap(2) man page:

   ENODEV The underlying file system of the specified file does not support memory mapping.

You cannot mmap streams, only real files or anonymous swap space. You will need to read from the stream into memory yourself.
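As a rough sketch of what "read from the stream into memory yourself" could look like in Python 3: the helper names `slurp_command` and `gzipslurp` below are made up for illustration and are not from the question. The idea is simply to let `subprocess.run` collect the child's entire standard output into one bytes object.

```python
import subprocess

def slurp_command(argv):
    """Run argv and return its entire standard output as one bytes object."""
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(argv, stdout=subprocess.PIPE, check=True)
    return result.stdout

def gzipslurp(fname):
    # Hypothetical helper: decompress with zcat (assumed to be on PATH)
    # and hold the whole decompressed file in memory.
    return slurp_command(['zcat', fname])
```

With this, `for line in gzipslurp(filename).splitlines(): dostuff(line)` processes the file from memory. Whether that beats streaming line by line is something to measure; for ~40 MB inputs it mainly trades memory for simplicity.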

虚拟世界 2024-11-24 11:36:40

Pipes aren't mmapable.

case MAP_PRIVATE:
      ...
if (!file->f_op || !file->f_op->mmap)
        return -ENODEV;

and a pipe's file operations do not include an mmap hook.
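If an mmap object is really wanted, one possible workaround is to send the decompressed output to a regular temporary file and map that instead of the pipe. The sketch below assumes Python 3; `is_pipe` and `mmap_command_output` are hypothetical names, not part of any library. Note that this writes the decompressed data out once more, so it may well be slower than streaming from the pipe.

```python
import mmap
import os
import stat
import subprocess
import tempfile

def is_pipe(fd):
    """Return True if fd refers to a pipe/FIFO, which cannot be memory-mapped."""
    return stat.S_ISFIFO(os.fstat(fd).st_mode)

def mmap_command_output(argv):
    """Run argv with stdout sent to an unnamed temp file, then mmap that file.

    Hypothetical helper for illustration. The temporary file is a regular
    file, so mmap accepts it. The command's output must be non-empty,
    because an empty file cannot be mapped.
    """
    with tempfile.TemporaryFile() as tmp:
        # The child process writes straight to the temp file's descriptor.
        subprocess.run(argv, stdout=tmp, check=True)
        # Length 0 means "map the whole file". mmap keeps its own duplicate
        # of the descriptor, so the mapping outlives the with-block.
        return mmap.mmap(tmp.fileno(), 0, access=mmap.ACCESS_READ)
```

Usage would be along the lines of `m = mmap_command_output(['zcat', filename])`, and `is_pipe(f.stdout.fileno())` can be used to confirm why the original attempt was rejected.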
