Python mmap ctypes - read only
I think I have the opposite problem to the one described here. I have one process writing data to a log, and I want a second process to read it, but I don't want the second process to be able to modify the contents. This is potentially a large file and I need random access, so I'm using Python's mmap module.
If I create the mmap as read/write (for the second process), I have no problem creating a ctypes object as a "view" of the mmap object using from_buffer. From a cursory look at the C code, this looks like a cast, not a copy, which is what I want. However, it breaks if I make the mmap ACCESS_READ: from_buffer throws an exception saying it requires write privileges.
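A minimal sketch of the behaviour described above, assuming CPython 3 and a small scratch file (the file name and the 4-byte `c_uint32` view are arbitrary choices): on a read/write map the ctypes object aliases the mapped memory, while on an ACCESS_READ map `from_buffer` is refused.

```python
import ctypes
import mmap
import os
import tempfile

def from_buffer_behaviour(path):
    """Return (view_is_shared, read_only_refused) for an existing file of >= 4 bytes."""
    # Read/write map: from_buffer returns a ctypes object that aliases the map.
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
        view = ctypes.c_uint32.from_buffer(mm, 0)
        mm[0:4] = bytes(ctypes.c_uint32(42))  # write through the mmap object...
        view_is_shared = view.value == 42     # ...and read it back through the view
        del view                              # release the buffer export, or close() fails
        mm.close()

    # Read-only map: from_buffer needs a writable buffer, so it is refused.
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            ctypes.c_uint32.from_buffer(mm, 0)
            read_only_refused = False
        except (TypeError, BufferError):      # recent CPython raises TypeError
            read_only_refused = True
        mm.close()
    return view_is_shared, read_only_refused

def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.bin")
        with open(path, "wb") as f:
            f.write(bytes(16))
        return from_buffer_behaviour(path)

if __name__ == "__main__":
    print(demo())
```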
I think I want to use the ctypes from_address() method instead, which doesn't appear to need write access. I'm probably missing something simple, but I'm not sure how to get the address of a location within an mmap. I know I can use ACCESS_COPY (so write operations show up in memory but aren't persisted to disk), but I'd rather keep things read-only.
Any suggestions?
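One possible workaround, if a small per-record copy is acceptable (an assumption: it gives up the zero-copy view): `from_buffer_copy` and `struct.unpack_from` only need a readable buffer, so both work on an ACCESS_READ map. The `RecordHeader` layout below is hypothetical.

```python
import ctypes
import mmap
import os
import struct
import tempfile

class RecordHeader(ctypes.Structure):
    # Hypothetical header layout, for illustration only.
    _fields_ = [("size", ctypes.c_uint32), ("kind", ctypes.c_uint32)]

def read_header_copy(mm, offset):
    # Copies sizeof(RecordHeader) bytes out of the map; no write access needed.
    return RecordHeader.from_buffer_copy(mm, offset)

def read_header_struct(mm, offset):
    # Same fields via struct: "=" means native byte order with no padding,
    # matching two native c_uint32 fields.
    return struct.unpack_from("=II", mm, offset)

def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.bin")
        with open(path, "wb") as f:
            f.write(bytes(4) + bytes(RecordHeader(12, 3)))
        with open(path, "rb") as f:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            hdr = read_header_copy(mm, 4)
            result = (hdr.size, hdr.kind), read_header_struct(mm, 4)
            mm.close()  # safe: the copy does not reference the map
        return result
```

Both calls leave the map itself read-only; the trade-off is that each header is materialized as a new Python object rather than a view.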
Answers (3)
OK, from looking at the mmap C code, I don't believe it supports this use case. I also found that the performance pretty much sucks, at least for my use case. I'd be curious what kind of performance others see, but it took about 40 seconds to walk through a 500 MB binary file in Python. That means creating an mmap, then turning each location into a ctypes object with from_buffer(), and using that ctypes object to decipher the size of the object so I could step to the next one. I tried doing the same thing directly in C++ with MSVC. There, of course, I could cast directly into an object of the correct type, and it was fast: less than a second (this is on a Core 2 Quad with an SSD).
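A sketch of the walk described above, assuming a hypothetical record format in which every record starts with a `RecordHeader` whose `size` field is the total record length in bytes. Each step calls `from_buffer` on a read/write map, reads the size, and jumps to the next record.

```python
import ctypes
import mmap
import os
import tempfile

class RecordHeader(ctypes.Structure):
    # Hypothetical header: total record length in bytes, then a type tag.
    _fields_ = [("size", ctypes.c_uint32), ("kind", ctypes.c_uint32)]

HEADER_SIZE = ctypes.sizeof(RecordHeader)

def walk_records(path):
    """Return [(offset, kind), ...] for every record in the file."""
    found = []
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
        off = 0
        while off + HEADER_SIZE <= len(mm):
            hdr = RecordHeader.from_buffer(mm, off)  # a view into the map, not a copy
            size, kind = hdr.size, hdr.kind
            del hdr                                  # drop the export before the map closes
            if size < HEADER_SIZE:
                raise ValueError(f"corrupt record at offset {off}")
            found.append((off, kind))
            off += size
    return found

def demo():
    # Three records of 12, 8 and 10 bytes (header plus payload).
    data = (bytes(RecordHeader(12, 1)) + b"abcd"
            + bytes(RecordHeader(8, 2))
            + bytes(RecordHeader(10, 3)) + b"xy")
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.bin")
        with open(path, "wb") as f:
            f.write(data)
        return walk_records(path)
```

Every iteration creates a fresh Python-level ctypes object, which is the per-record overhead the C++ version avoids.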
I did find that I could get a pointer with the following
This doesn't get around the original problem - the mmap isn't read-only, since I still need to use from_buffer for the first call. In this config, it still took around 40 sec to process the whole file, so it looks like the conversion from a pointer into ctypes structs is killing the performance. That's just a guess, but I don't see a lot of value in tracking it down further.
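A minimal sketch of one way to get such a pointer, consistent with the description but not necessarily the author's own snippet: call `from_buffer` once on a writable map, take `ctypes.addressof` of the result, then use `from_address` for each record. The `RecordHeader` layout is hypothetical.

```python
import ctypes
import mmap
import os
import tempfile

class RecordHeader(ctypes.Structure):
    # Hypothetical header: total record length in bytes, then a type tag.
    _fields_ = [("size", ctypes.c_uint32), ("kind", ctypes.c_uint32)]

HEADER_SIZE = ctypes.sizeof(RecordHeader)

def walk_by_address(path):
    """Return [(offset, kind), ...] using one from_buffer call plus from_address."""
    found = []
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mm:
        anchor = ctypes.c_char.from_buffer(mm)   # the one call that needs write access
        try:
            base = ctypes.addressof(anchor)      # address of byte 0 of the map
            off = 0
            while off + HEADER_SIZE <= len(mm):
                # from_address does no bounds or lifetime checks: base + off must stay
                # inside the map, and the map must stay open while hdr is in use.
                hdr = RecordHeader.from_address(base + off)
                if hdr.size < HEADER_SIZE:
                    raise ValueError(f"corrupt record at offset {off}")
                found.append((off, hdr.kind))
                off += hdr.size
        finally:
            del anchor                           # only anchor holds an export on the map
    return found

def demo():
    data = (bytes(RecordHeader(12, 1)) + b"abcd"
            + bytes(RecordHeader(8, 2))
            + bytes(RecordHeader(10, 3)) + b"xy")
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.bin")
        with open(path, "wb") as f:
            f.write(data)
        return walk_by_address(path)
```

Objects made with `from_address` do not keep the map alive, so using one after the map is closed reads freed memory; the anchor object is what prevents an early `close()`.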
I'm not sure my plan will help anyone else, but I'm going to try to create a C module specific to my needs, based on the mmap code. I think I can use fast C code to index the binary file, then expose only small parts of the file at a time through calls into ctypes/Python objects. Wish me luck.
Also, as a side note, Python 2.7.2 was released today (6/12/11), and one of the changes is an update to the mmap code so that you can use a Python long to set the file offset. This lets you use mmap for files over 4 GB on 32-bit systems. See Issue #4681 here.
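A sketch of mapping only a window of a large file, which is what keeps address-space use small on 32-bit systems. It relies on the documented rule that `offset` must be a multiple of `mmap.ALLOCATIONGRANULARITY`; `map_window` is a hypothetical helper name, and the code is Python 3, where the offset is a plain int.

```python
import mmap
import os
import tempfile

def map_window(f, start, length):
    """Map `length` bytes of the open file `f` starting at byte `start`, read-only.

    offset must be a multiple of mmap.ALLOCATIONGRANULARITY, so the map starts
    at the aligned offset at or below `start`; the wanted bytes begin at `delta`.
    """
    aligned = start - (start % mmap.ALLOCATIONGRANULARITY)
    delta = start - aligned
    # delta + length must stay within the file (on Unix mmap raises ValueError otherwise).
    mm = mmap.mmap(f.fileno(), delta + length, access=mmap.ACCESS_READ, offset=aligned)
    return mm, delta

def demo():
    gran = mmap.ALLOCATIONGRANULARITY
    data = bytes(i % 251 for i in range(3 * gran))
    start = gran + 123
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "big.bin")
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            mm, delta = map_window(f, start, 50)
            window = mm[delta:delta + 50]  # slicing copies out bytes; no export remains
            mm.close()
    return window == data[start:start + 50]
```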
We ran into this same problem: we needed the from_buffer interface and wanted read-only access. From the Python docs (https://docs.python.org/3/library/mmap.html): "Assignment to an ACCESS_COPY memory map affects memory but does not update the underlying file."
If it's acceptable for you to use an anonymous file backing, you can use ACCESS_COPY.
An example: open two cmd.exe windows or terminals, and in one terminal run:
In the other terminal run:
You will see that the second process does not receive updates when the first process writes using ACCESS_COPY.
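A single-process sketch of the same behaviour, using two mappings of one scratch file instead of two terminals (the 16-byte file and the `0xDEADBEEF` value are arbitrary): `from_buffer` is accepted on the ACCESS_COPY map because that memory is writable, but the write reaches neither a second ACCESS_READ mapping nor the file on disk.

```python
import ctypes
import mmap
import os
import tempfile

def access_copy_demo(path):
    """Write via an ACCESS_COPY map; return what that map, a second ACCESS_READ map
    of the same file, and the file on disk each hold at offset 0."""
    with open(path, "rb") as f:
        private = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
        shared = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # ACCESS_COPY memory is writable, so from_buffer accepts it.
        view = ctypes.c_uint32.from_buffer(private, 0)
        view.value = 0xDEADBEEF
        in_private, in_shared = private[0:4], shared[0:4]
        del view                                   # release the export before closing
        private.close()
        shared.close()
    with open(path, "rb") as f:
        on_disk = f.read(4)
    return in_private, in_shared, on_disk

def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.bin")
        with open(path, "wb") as f:
            f.write(bytes(16))
        return access_copy_demo(path)
```

The reading process can still scribble on its own private copy, so this guards the file, not the in-memory view.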
I ran into a similar issue (unable to set up a read-only mmap), but I was using only the Python mmap module. See: Python mmap 'Permission denied' on Linux.
I'm not sure it is of any help to you, since you don't want the mmap to be private?
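A common cause of that error on Linux, which may or may not be the one in the linked question: `mmap.mmap` defaults to a shared, writable mapping, so mapping a file opened with `"rb"` fails unless `access=mmap.ACCESS_READ` is passed. `try_map` is a hypothetical helper for illustration.

```python
import mmap
import os
import tempfile

def try_map(path, **kwargs):
    """Map a file opened with "rb"; return "ok" or the name of the error raised."""
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, **kwargs)
        except OSError as exc:
            return type(exc).__name__
        mm.close()
        return "ok"

def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.bin")
        with open(path, "wb") as f:
            f.write(bytes(16))
        # Default access asks for a writable shared mapping; ACCESS_READ does not.
        return try_map(path), try_map(path, access=mmap.ACCESS_READ)
```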