boost::interprocess::managed_mapped_file inaccessible after process crash
I use boost::interprocess::managed_mapped_file to create a persisted boost::interprocess::deque.
Under normal circumstances it runs smoothly!
I have however created a stress test that will spawn a process that does rapid reads and writes to the memory map. In my test I then "kill -9" that process spontaneously while running to simulate unwanted power outages.
After only a few attempts the managed_mapped_file becomes inaccessible and/or unresponsive, probably because the file has been corrupted.
The symptoms I experience are that boost::interprocess::managed_mapped_file::check_sanity() and boost::interprocess::deque::push_back(...) hang and never return.
I assume and can accept that the file is corrupted due to uncontrollable external circumstances, but how can I detect that the file is corrupt before calling the hanging boost::interprocess::deque::push_back(...)?
Rgds
Klaus
This is known as the lack of robust locking:
Killing any process that does IO with persistent media risks corrupting the data by breaking off a transactional operation midway.
In your case a lock got stuck in the "held" state, when the process holding it was terminated. Depending on your operating system, a reboot may help²
Some very carefully designed protocols/disk formats (usually journaling/log-structured databases) can detect this and automatically recover by rolling back some (partial) transactions to a known-good state.
In short, don't do kill -9 unless you don't care about throwing away the shared data. The stuck lock, in most senses, is the least of your worries. If you know that the only corruption you are facing can be the state of synchronization primitives¹, you can do as follows:
Workaround?
The usual approach is to time out and forcefully reset the shared resources. This is somewhat easier to manage if you use a separate named interprocess mutex (because it saves you from having to throw away the entire managed segment; you only recreate the mutex itself).
A typical interface for such a workaround that I've seen in the wild:
Of course, this is all still pretty Neanderthal.
I believe that most POSIX environments support advisory file-locks (which IIRC are also part of Boost Interprocess). These locks will be released by the kernel on process termination. You might use them to avoid the need for a reboot.
Also, a far simpler approach is to not use termination. You can send any other friendly signal instead and use it to gracefully release the shared resources avoiding the problem in the first place.
¹ (e.g. you only have a fixed-size data-structure with no updates that require transactional semantics)
² I remember wading through lots of platform-dependent code in Boost Interprocess that detected when a reboot has occurred since the last timestamp on a synchronization primitive.