Python mmap ctypes - read only

Posted 2024-11-14 20:17:37

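I think I have the opposite problem to the one described here. I have one process writing data to a log, and I want a second process to read it, but I don't want the second process to be able to modify the contents. The file may be large and I need random access, so I'm using Python's mmap module.

If I create the mmap as read/write (for the second process), I have no trouble using from_buffer to create a ctypes object as a "view" of the mmap object. From a cursory look at the C code, this appears to be a cast rather than a copy, which is exactly what I want. However, if I make the mmap ACCESS_READ, this breaks, raising an exception saying that from_buffer requires write privileges.

I think I want to use the ctypes from_address() method instead, which doesn't appear to need write access. I'm probably missing something simple, but I'm not sure how to get the address of a location within an mmap. I know I can use ACCESS_COPY (so writes show up in memory but aren't persisted to disk), but I'd rather keep things read-only.

Any suggestions?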


贪了杯 2024-11-21 20:17:37


OK, from looking at the mmap C code, I don't believe it supports this use case. I also found that the performance is poor for my use case. I'd be curious what kind of performance others see, but it took about 40 seconds to walk through a 500 MB binary file in Python. That involved creating an mmap, turning each location into a ctypes object with from_buffer(), and using the ctypes object to work out the size of the record so I could step to the next one. I tried doing the same thing directly in C++ with MSVC. There I could cast directly to an object of the correct type, and it was fast: less than a second (this is on a Core 2 Quad with an SSD).
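For comparison, the walk described above can also be done without ctypes on a mapping opened with ACCESS_READ, because struct.Struct.unpack_from only needs a readable buffer. The sketch below is not the poster's code: the record layout (a 4-byte id and a 4-byte payload length, followed by the payload) and the helper names walk_records and write_sample are assumptions made up for illustration, and no claim is made about how its speed compares with the 40-second figure.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout (not the CEL format): a little-endian header with
# a 4-byte id and a 4-byte payload length, followed by that many payload bytes.
HEADER = struct.Struct("<II")


def walk_records(path):
    """Return (record_id, payload_offset, payload_size) for every record."""
    found = []
    with open(path, "rb") as f:
        # ACCESS_READ is enough: unpack_from only reads from the buffer.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = 0
            while pos + HEADER.size <= len(mm):
                record_id, size = HEADER.unpack_from(mm, pos)
                found.append((record_id, pos + HEADER.size, size))
                pos += HEADER.size + size  # step over header and payload
    return found


def write_sample(payloads):
    """Write a small sample file in the hypothetical layout; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i, payload in enumerate(payloads):
            f.write(HEADER.pack(i, len(payload)) + payload)
    return path


sample_path = write_sample([b"abc", b"", b"hello world"])
records = walk_records(sample_path)
os.remove(sample_path)
print(records)
```

Precompiling the format with struct.Struct avoids re-parsing the format string on every record, which matters when a file holds many small records.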

I did find that I could get a pointer with the following:

from ctypes import pointer

# CEL_HEADER is a ctypes Structure; mm is the (writable) mmap object
firstHeader = CEL_HEADER.from_buffer(mm, 0)
pHeader = pointer(firstHeader)
# pHeader[ind] now yields the CEL_HEADER located ind * sizeof(CEL_HEADER)
# bytes into the file

This doesn't get around the original problem: the mmap isn't read-only, since I still need from_buffer for the first call. In this configuration it still took around 40 seconds to process the whole file, so it looks like the conversion from a pointer into ctypes structs is what kills the performance. That's just a guess, but I don't see much value in tracking it down further.
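One option that does keep the mapping read-only is ctypes' from_buffer_copy(), which only requires a readable buffer. It is a copy rather than the zero-copy view the question asks for, so treat the following as a sketch of a trade-off, not a fix. The Header structure here is a hypothetical stand-in for CEL_HEADER, whose real fields are not given in this thread.

```python
import ctypes
import mmap
import os
import tempfile


class Header(ctypes.LittleEndianStructure):
    # Hypothetical stand-in for CEL_HEADER; the real layout is not shown here.
    _fields_ = [("magic", ctypes.c_uint32), ("size", ctypes.c_uint32)]


# Build a small sample file: one header followed by 16 payload bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(Header(0xCE1, 16)) + b"\x00" * 16)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# from_buffer() needs a writable buffer, so it is rejected on a read-only map.
try:
    Header.from_buffer(mm, 0)
    from_buffer_failed = False
except TypeError:
    from_buffer_failed = True

# from_buffer_copy() only reads: it copies sizeof(Header) bytes out of the map.
header = Header.from_buffer_copy(mm, 0)
header.size = 999  # modifies the private copy only, never the mapping

size_in_file = Header.from_buffer_copy(mm, 0).size
mm.close()
os.remove(path)
print(from_buffer_failed, header.magic, header.size, size_in_file)
```

Each call copies one structure's worth of bytes, so this suits reading small headers at known offsets more than viewing large regions in place.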

I'm not sure my plan will help anyone else, but I'm going to try to write a C module specific to my needs, based on the mmap code. I think I can use fast C code to index the binary file, then expose only small parts of the file at a time through calls into ctypes/Python objects. Wish me luck.

Also, as a side note, Python 2.7.2 was released today (6/12/11), and one of the changes is an update to the mmap code so that you can use a Python long to set the file offset. This lets you use mmap for files over 4 GB on 32-bit systems. See issue #4681.
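As an illustration of the offset parameter mentioned above: mmap requires the offset to be a multiple of mmap.ALLOCATIONGRANULARITY, so mapping a window at an arbitrary position means rounding the start down and remembering the remainder. The helper name map_window is made up for this sketch, and the sample file is tiny rather than over 4 GB.

```python
import mmap
import os
import tempfile


def map_window(path, start, length):
    """Map `length` bytes beginning at byte `start`, read-only.

    Returns (mm, delta); the requested bytes are mm[delta:delta + length].
    """
    gran = mmap.ALLOCATIONGRANULARITY
    base = (start // gran) * gran  # offset must be a multiple of gran
    delta = start - base           # position of `start` inside the mapping
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=base)
    return mm, delta


# Sample file with a marker placed just past two allocation granules.
gran = mmap.ALLOCATIONGRANULARITY
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * (2 * gran + 10) + b"MARKER" + b"\x00" * 10)

mm, delta = map_window(path, 2 * gran + 10, 6)
window = mm[delta:delta + 6]
mm.close()
os.remove(path)
print(delta, window)
```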


夏夜暖风 2024-11-21 20:17:37


Ran into this same problem: we needed the from_buffer interface and wanted read-only access. From the Python docs (https://docs.python.org/3/library/mmap.html): "Assignment to an ACCESS_COPY memory map affects memory but does not update the underlying file."
If it's acceptable for you to use an anonymous file backing, you can use ACCESS_COPY.

An example: open two cmd.exe windows or terminals, and in one of them run:

import ctypes
import mmap

# Two maps of the same named, anonymous shared memory (tagname is Windows-only)
mm_file_write = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
mm_file_read = mmap.mmap(-1, 4096, access=mmap.ACCESS_COPY, tagname="shmem")

write = ctypes.c_int.from_buffer(mm_file_write)
read = ctypes.c_int.from_buffer(mm_file_read)
try:
    while True:
        value = int(input('enter an integer using mm_file_write: '))
        write.value = value  # visible to other processes mapping "shmem"
        print('updated value')
        value = int(input('enter an integer using mm_file_read: '))
        # assigning through the ACCESS_COPY map doesn't update the shared memory
        read.value = value
        print('updated value')
except KeyboardInterrupt:
    print('got exit event')

In the other terminal, run:

import mmap
import struct
import time

mm_file = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
i = None
try:
    while True:
        # unpack returns a 1-tuple; take the integer itself
        new_i = struct.unpack('i', mm_file[:4])[0]
        if i != new_i:
            print('i: {} => {}'.format(i, new_i))
            i = new_i
        time.sleep(0.1)
except KeyboardInterrupt:
    print('Stopped . . .')

You will see that the second process does not receive updates when the first process writes through the ACCESS_COPY mapping.
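The same copy-on-write behaviour can be checked in a single process against an ordinary file, which also works where tagname is unavailable. This is a minimal sketch, assuming a throwaway temporary file rather than the named shared memory used above: from_buffer() is accepted because an ACCESS_COPY map is writable in memory, yet the file on disk keeps its original value.

```python
import ctypes
import mmap
import os
import sys
import tempfile

# Sample file whose first four bytes hold the integer 7 in native byte order.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write((7).to_bytes(4, sys.byteorder) + b"\x00" * 12)

# The file itself is opened read-only; ACCESS_COPY gives a private writable map.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)

view = ctypes.c_uint32.from_buffer(mm)  # zero-copy view over the mapping
before = view.value
view.value = 42                         # lands in private memory only
after = view.value

del view    # drop the buffer export so the map can be closed
mm.close()

with open(path, "rb") as f:
    on_disk = int.from_bytes(f.read(4), sys.byteorder)
os.remove(path)
print(before, after, on_disk)
```

Note that this protects the file, not the reader's own view: code holding the ctypes object can still change what it sees in memory.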
