Lock-free shared memory in C++ for variable-length records

I am a newbie to IPC. A writer process writes data into shared memory, and many reader processes read it. Each piece of data has a unique identifier and has to be indexed by that unique key for faster access (something like an STL map or hash map for lookups). The data is also a variable-length record (XML), with an average length of 200-250 bytes. The OS is Solaris 10 (i86pc) on an Intel Xeon quad-core server.

The total data size is more than 200 GB, but we will keep only the latest data in shared memory; historical data resides in files. The shared memory size would be around 4-6 GB.

No external libraries such as Boost::interprocess are available.

I have a couple of questions, maybe many:

  1. Which is more efficient: shared memory or mmap (memory-mapped files)?
  2. How do I build indexes for variable-length records? [I have no idea; maybe some hashing?]
  3. Would it be neat to convert the XML into a fixed-size structure? (Trade-off: the structure would be huge, with nearly 300+ possible fields.)
  4. Can we place any STL container in shared memory by providing a custom allocator?
  5. Is it possible to implement this without semaphores (a lock-free implementation using CAS)?

Thanks

How about this:

|--------------------------|
| start_id   |   end_id    |   ->  range of msg id present in the segment
|--------------------------|
| id1 | start_mem | length |   ->
|--------------------------|   ->
| id2 | start_mem | length |   -> table of index for the actual data
|--------------------------|   ->
| id3 | start_mem | length |   ->
|--------------------------|   ->
| id4 | start_mem | length |   ->
|--------------------------|   ->
|                          |
|                          |
|                          |
|       data segment       |
|       variable length    |
|       XML is stored      |
|                          |
|                          |
|--------------------------|

When new data arrives and the segment is full, the oldest data is erased in a circular fashion; more than one record may need to be erased to make room.
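Not part of the original post, but here is a minimal sketch of how that layout could be expressed in C++ over a POSIX shared memory object (shm_open + mmap, linked with -lrt on Solaris), assuming a 64-bit build. The names SegmentHeader, IndexEntry, kMaxEntries and the segment name "/msg_cache" are illustrative, and locking and eviction are deliberately left out:

// Sketch only: fixed-size header + index table, followed by the data area.
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <stdint.h>

static const size_t kSegmentSize = 4UL * 1024 * 1024 * 1024;  // ~4 GB segment (64-bit build assumed)
static const size_t kMaxEntries  = 16 * 1024 * 1024;          // illustrative index capacity

struct IndexEntry {
    uint64_t id;         // unique message identifier
    uint64_t start_mem;  // offset of the record inside the data area
    uint32_t length;     // record length in bytes
};

struct SegmentHeader {
    uint64_t   start_id;            // oldest id still present
    uint64_t   end_id;              // newest id present
    uint64_t   data_head;           // next write offset in the data area
    IndexEntry index[kMaxEntries];  // fixed-size index table
    // variable-length XML records follow the header in the mapping
};

int main() {
    // The writer creates the segment; readers would open it with O_RDONLY
    // and map it with PROT_READ only.
    int fd = shm_open("/msg_cache", O_CREAT | O_RDWR, 0666);
    if (fd == -1) return 1;
    if (ftruncate(fd, kSegmentSize) == -1) return 1;

    void* base = mmap(0, kSegmentSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) return 1;

    SegmentHeader* hdr = static_cast<SegmentHeader*>(base);
    char* data_area    = static_cast<char*>(base) + sizeof(SegmentHeader);

    // Append one record (synchronisation deliberately omitted; see the answers).
    const char xml[] = "<msg id=\"42\">...</msg>";
    IndexEntry e;
    e.id        = 42;
    e.start_mem = hdr->data_head;
    e.length    = sizeof xml;
    std::memcpy(data_area + hdr->data_head, xml, sizeof xml);
    hdr->index[hdr->end_id % kMaxEntries] = e;
    hdr->data_head += sizeof xml;
    hdr->end_id    += 1;

    munmap(base, kSegmentSize);
    close(fd);
    return 0;
}

Readers would map the same object read-only, locate a record by looking up its id in the index table (hashing, or binary search over the start_id..end_id range), and copy the bytes out of the data area. Eviction would advance start_id and reclaim the corresponding space at the tail of the data area.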

Comments (2)

生死何惧 2025-01-09 00:05:56

The easiest solution: if you require complex indexing and other such things, you should really consider a service-oriented architecture instead of shared memory. Simply designate one process as your master cache process, and have it accept local connections (over Unix domain sockets, TCP sockets, or whatever) from other processes that need the data. This makes things much, much simpler.

If you don't choose this route, be warned that shared memory is hard. What you ask is definitely doable in shared memory - you can create a heap allocator in this shmem chunk, etc etc. STL allocators might work, but don't expect any third-party libraries to be happy with STL allocators using custom pointer types. You will need locks (if you're clever you might be able to avoid them in some cases, but not all, and in this case definitely kiss STL goodbye), and you will have to rebuild everything that you usually take for granted.

Again, I highly recommend you start with a simple cache daemon. This scales just fine most of the time, it just adds a bit more latency.
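To make that concrete, below is a minimal sketch of such a cache daemon, assuming a Unix domain socket at a made-up path (/tmp/msg_cache.sock) and a trivial "send a raw id, receive the XML record" request format; error handling, the real data source and eviction are all omitted:

// Sketch only: one process owns the data and answers lookups over a socket.
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstring>
#include <string>
#include <map>

int main() {
    std::map<unsigned long, std::string> cache;        // id -> XML record
    cache[42] = "<msg id=\"42\">...</msg>";

    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    sockaddr_un addr;
    std::memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, "/tmp/msg_cache.sock", sizeof(addr.sun_path) - 1);
    unlink(addr.sun_path);
    bind(srv, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
    listen(srv, 16);

    for (;;) {
        int cli = accept(srv, 0, 0);
        if (cli == -1) continue;

        unsigned long id = 0;
        if (read(cli, &id, sizeof id) == (ssize_t)sizeof id) {  // request: raw id
            std::string rec = "<not-found/>";
            std::map<unsigned long, std::string>::iterator it = cache.find(id);
            if (it != cache.end()) rec = it->second;
            write(cli, rec.data(), rec.size());                  // reply: XML record
        }
        close(cli);
    }
}

Each reader process connects, writes the id it wants and reads the reply; because the daemon is the only process that ever touches the data, all the indexing can stay in ordinary heap-allocated STL containers.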

疑心病 2025-01-09 00:05:56

Is it possible to implement this without semaphores (a lock-free implementation using CAS)?

Since lock-free implementations are hard to design and can easily end up in chaos, you should consider the following aspects and alternatives before going for a lock-free solution:

  • If there are many threads in the system, the scheduler is likely to preempt the thread that holds the lock, so all other threads end up waiting for it; if that is not the case, lock-free is not going to give a significant improvement.
  • Whether the problem can be solved with a readers-writer lock (writers are significantly fewer than readers); a process-shared sketch is given after this list.
  • If lock contention is unlikely, you might consider spin-locks, which will give you performance roughly equal to lock-free.
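For the readers-writer option, one possibility on Solaris/POSIX is a pthread_rwlock_t initialised with the PTHREAD_PROCESS_SHARED attribute and stored at the start of the shared segment, so the single writer process and the many reader processes all synchronise on the same lock. A minimal sketch, with an illustrative segment name and layout (link with -lpthread and -lrt):

// Sketch only: a process-shared readers-writer lock inside the segment.
#include <pthread.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

struct SharedControl {
    pthread_rwlock_t lock;    // protects the index and data area that follow
    unsigned long    end_id;  // example of data guarded by the lock
};

int main() {
    int fd = shm_open("/msg_cache_ctl", O_CREAT | O_RDWR, 0666);
    if (fd == -1) return 1;
    if (ftruncate(fd, sizeof(SharedControl)) == -1) return 1;

    void* p = mmap(0, sizeof(SharedControl), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;
    SharedControl* ctl = static_cast<SharedControl*>(p);

    // One-time initialisation, done by the writer process only.
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_rwlock_init(&ctl->lock, &attr);

    // Writer: exclusive lock while updating the segment.
    pthread_rwlock_wrlock(&ctl->lock);
    ctl->end_id += 1;
    pthread_rwlock_unlock(&ctl->lock);

    // A reader process would take the shared lock instead:
    pthread_rwlock_rdlock(&ctl->lock);
    unsigned long latest = ctl->end_id;
    pthread_rwlock_unlock(&ctl->lock);
    (void)latest;

    munmap(p, sizeof(SharedControl));
    close(fd);
    return 0;
}

The writer holds the exclusive lock only while it updates the index and data head, so readers mostly run in parallel; if the writer turns out to hold the lock for too long, that is the point at which a more elaborate lock-free scheme would start to pay off.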