ANSI C hash table implementation with data in one memory block
I am looking for an open source C implementation of a hash table that keeps all the data in one memory block, so it can easily be sent over a network, for example.
I can only find ones that allocate a small piece of memory for every key-value pair added to them.
Thank you very much in advance for all the input.
EDIT: It doesn't necessarily need to be a hash table; any key-value pair table would probably do.
Comments (4)
The number of times you would serialize such a data structure (and sending it over a network is serializing as well) is pretty low compared to the number of times you would use it in your program. So most implementations focus on speed rather than on the "maybe easier to serialize" side.
If all the data were in one allocated memory block, a lot of operations on that data structure would be a bit expensive because you would have to:
Most network operations are buffered anyway, so just iterate over the keys and send keys + values.
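A minimal sketch of that "iterate and send" idea, assuming string keys and values. The `struct kv` array stands in for whatever iterator your table provides, and `put_u32`, `get_u32` and `kv_serialize` are hypothetical helper names, not part of any library. Each pair is written as a length-prefixed record, with lengths in big-endian byte order so the receiver does not depend on the sender's endianness.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pair type; a real table would expose its own iterator. */
struct kv { const char *key; const char *val; };

/* Write a 32-bit length in big-endian (network) byte order. */
static void put_u32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFF);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}

/* Read back a 32-bit big-endian length. */
static unsigned long get_u32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8) | (unsigned long)p[3];
}

/* Serialize n pairs into buf as [klen][key][vlen][val] records.
   Returns the number of bytes written, or 0 if buf is too small
   (0 is also the result for n == 0, so check n separately). */
static size_t kv_serialize(const struct kv *pairs, size_t n,
                           unsigned char *buf, size_t cap)
{
    size_t off = 0, i;
    for (i = 0; i < n; i++) {
        size_t klen = strlen(pairs[i].key);
        size_t vlen = strlen(pairs[i].val);
        if (cap - off < 8 + klen + vlen)
            return 0;
        put_u32(buf + off, (unsigned long)klen);
        off += 4;
        memcpy(buf + off, pairs[i].key, klen);
        off += klen;
        put_u32(buf + off, (unsigned long)vlen);
        off += 4;
        memcpy(buf + off, pairs[i].val, vlen);
        off += vlen;
    }
    return off;
}
```

The resulting buffer can be handed to a single `send()` or `write()` call; the receiver walks it with `get_u32` and re-inserts each pair into its own table.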
On a unix system I'd probably utilise a shared memory buffer (see shm_open()), or, if that's not available, a memory-mapped file with the MAP_SHARED flag; see the OS-specific differences though: http://en.wikipedia.org/wiki/Mmap
If both shm_open and mmap aren't available, you could still use a file on disk (to some extent). You'd have to take care of proper locking: I'd send an unlock signal to the next process, and maybe the seek offset of the updated portion of the file; that process then locks the file again, seeks to the interesting part and proceeds as usual (updates/deletes/etc.).
In any case, you could freely design the layout of the hashtable or whatever you want, like having fixed-width key/seek pairs. That way you'd have fast access to the keys of your hashtable and, if necessary, you seek to the data portion, then copy/delete/modify/etc.
Ideally this file should be on a RAM disk, of course.
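One way the fixed-width key/seek layout could look, as a rough sketch rather than a finished implementation: a single struct holding a slot array of offset pairs followed by a data area, with every reference stored as an offset instead of a pointer. All names here (`flat_table`, `ft_put`, `ft_get`, the sizes) are made up for illustration. The table has a fixed capacity, uses linear probing, and does not support deletion or reclaim overwritten values.

```c
#include <stddef.h>
#include <string.h>

#define NSLOTS     64u    /* assumed fixed number of slots */
#define DATA_BYTES 4096u  /* assumed fixed size of the data area */

/* Fixed-width key/seek pair: offsets into data[]; 0 means "empty". */
struct slot { unsigned long key_off, val_off; };

/* The whole table is one block with no pointers inside it. */
struct flat_table {
    unsigned long data_used;
    struct slot slots[NSLOTS];
    char data[DATA_BYTES];
};

static unsigned long ft_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static void ft_init(struct flat_table *t)
{
    memset(t, 0, sizeof *t);
    t->data_used = 1; /* offset 0 is reserved as the "empty" marker */
}

/* Copy a string into the data area; returns its offset, or 0 if full. */
static unsigned long ft_store(struct flat_table *t, const char *s)
{
    size_t len = strlen(s) + 1;
    unsigned long off = t->data_used;
    if (len > DATA_BYTES - off)
        return 0;
    memcpy(t->data + off, s, len);
    t->data_used += (unsigned long)len;
    return off;
}

/* Insert or overwrite; returns 1 on success, 0 if out of space. */
static int ft_put(struct flat_table *t, const char *key, const char *val)
{
    unsigned long i, h = ft_hash(key);
    for (i = 0; i < NSLOTS; i++) {
        struct slot *s = &t->slots[(h + i) % NSLOTS];
        if (s->key_off == 0) {
            unsigned long k = ft_store(t, key);
            unsigned long v = k ? ft_store(t, val) : 0;
            if (!v)
                return 0;
            s->key_off = k;
            s->val_off = v;
            return 1;
        }
        if (strcmp(t->data + s->key_off, key) == 0) {
            unsigned long v = ft_store(t, val);
            if (!v)
                return 0;
            s->val_off = v; /* the old value's bytes are not reclaimed */
            return 1;
        }
    }
    return 0;
}

/* Returns the value for key, or NULL if it is not present. */
static const char *ft_get(const struct flat_table *t, const char *key)
{
    unsigned long i, h = ft_hash(key);
    for (i = 0; i < NSLOTS; i++) {
        const struct slot *s = &t->slots[(h + i) % NSLOTS];
        if (s->key_off == 0)
            return NULL;
        if (strcmp(t->data + s->key_off, key) == 0)
            return t->data + s->val_off;
    }
    return NULL;
}
```

Because nothing inside the block is a pointer, a `memcpy` of `sizeof(struct flat_table)` bytes, or placing the struct in a region obtained from mmap, yields a usable table. Sending the raw bytes to a different machine additionally assumes both sides agree on the size of `unsigned long`, byte order and struct padding.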
I agree completely with akira (+1). Just one more comment on data locality. Once the table gets larger, or if the satellite data is large enough, there will almost certainly be cache pollution, which additionally slows down any operation on the table. In other words, with the keys kept apart from the satellite data you can rely on the level-1/2/3 cache chain to serve the key data promptly, whilst putting up with a cache miss when you have to access the satellite data (e.g. for serialisation).
Libraries providing hashtables tend to hide the details and make the thing work efficiently (that is normally what programmers want when they use a hashtable). So the way they handle memory is normally hidden from the final programmer's eyes, and programmers shouldn't rely on a particular "memory layout", which may change in a following version of the library.
Write your own function to serialize (and unserialize) the hashtable in the most convenient way for your usage. You can keep the serialized content if you need it several times (of course, when the hashtable is changed, you need to update the serialized "version" kept in memory).
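Keeping the serialized "version" around could be sketched like this, with a dirty flag that forces a rebuild only after the table has changed. The `cached_table` type and the `ct_*` functions are hypothetical, the pairs live in a plain array instead of a real hashtable to keep the example short, and the `key=value\n` text format assumes keys and values contain neither `=` nor newlines.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PAIRS   8    /* assumed small fixed capacity */
#define CACHE_BYTES 256  /* assumed size of the serialized cache */

struct cached_table {
    const char *keys[MAX_PAIRS];
    const char *vals[MAX_PAIRS];
    size_t count;
    unsigned char cache[CACHE_BYTES]; /* last serialized form */
    size_t cache_len;
    int dirty;          /* set whenever the pairs change */
    unsigned rebuilds;  /* counts rebuilds, for demonstration only */
};

static void ct_init(struct cached_table *t)
{
    memset(t, 0, sizeof *t);
    t->dirty = 1;
}

/* Add a pair and mark the cached serialization as stale. */
static int ct_add(struct cached_table *t, const char *k, const char *v)
{
    if (t->count == MAX_PAIRS)
        return 0;
    t->keys[t->count] = k;
    t->vals[t->count] = v;
    t->count++;
    t->dirty = 1;
    return 1;
}

/* Return the serialized form, rebuilding it only if the table changed.
   Returns NULL if the pairs do not fit into the cache. */
static const unsigned char *ct_serialized(struct cached_table *t, size_t *len)
{
    if (t->dirty) {
        size_t off = 0, i;
        for (i = 0; i < t->count; i++) {
            size_t kl = strlen(t->keys[i]);
            size_t vl = strlen(t->vals[i]);
            if (kl + vl + 2 > CACHE_BYTES - off)
                return NULL;
            memcpy(t->cache + off, t->keys[i], kl);
            off += kl;
            t->cache[off++] = '=';
            memcpy(t->cache + off, t->vals[i], vl);
            off += vl;
            t->cache[off++] = '\n';
        }
        t->cache_len = off;
        t->dirty = 0;
        t->rebuilds++;
    }
    *len = t->cache_len;
    return t->cache;
}
```

Calling `ct_serialized` repeatedly without an intervening `ct_add` returns the same buffer without touching the pairs again, which is the saving the paragraph above describes.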