Mutex in shared memory when one user crashes?

Posted on 2024-08-10 11:24:26

Suppose a process creates a mutex in shared memory, locks it, and then dumps core while the mutex is still locked.

Now, in another process, how do I detect that the mutex is already locked but not owned by any process?

Comments (5)

遗失的美好 2024-08-17 11:24:26

It seems that the exact answer has been provided in the form of robust mutexes.

According to POSIX, pthread mutexes can be initialised "robust" using pthread_mutexattr_setrobust(). If a process holding the mutex then dies, the next thread to acquire it will receive EOWNERDEAD (but still acquire the mutex successfully) so that it knows to perform any cleanup. It then needs to notify that the acquired mutex is again consistent using pthread_mutex_consistent().

Obviously you need both kernel and libc support for this to work. On Linux the kernel support behind this is called "robust futexes", and I've found references to userspace updates being applied to glibc HEAD.

In practice, support for this doesn't seem to have filtered down yet, in the Linux world at least. If these functions aren't available, you might find pthread_mutexattr_setrobust_np() there instead, which as far as I can gather appears to be a non-POSIX predecessor providing the same semantics. I've found references to pthread_mutexattr_setrobust_np() both in Solaris documentation and in /usr/include/pthread.h on Debian.

The POSIX spec can be found here: http://www.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_setrobust.html
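
To make the EOWNERDEAD flow concrete, here is a minimal sketch, assuming a Linux system whose libc provides pthread_mutexattr_setrobust(). The helper name lock_after_owner_death() is made up for illustration, an anonymous shared mapping stands in for a real shared-memory segment, and a child calling _exit() while holding the lock stands in for the crashed process.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns what pthread_mutex_lock() reports to the
 * parent after a child died holding the mutex, or -1 on setup failure. */
int lock_after_owner_death(void) {
  /* Anonymous shared mapping, used here in place of shm_open()/shmget(). */
  pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (m == MAP_FAILED) return -1;

  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
  pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
  pthread_mutex_init(m, &attr);
  pthread_mutexattr_destroy(&attr);

  pid_t child = fork();
  if (child < 0) { munmap(m, sizeof *m); return -1; }
  if (child == 0) {
    pthread_mutex_lock(m);
    _exit(0);                      /* terminate while still holding the lock */
  }
  waitpid(child, NULL, 0);

  int rc = pthread_mutex_lock(m);  /* expected: EOWNERDEAD, lock is acquired */
  if (rc == EOWNERDEAD) {
    /* Repair whatever shared state the mutex protects, then: */
    pthread_mutex_consistent(m);
  }
  if (rc == 0 || rc == EOWNERDEAD) pthread_mutex_unlock(m);

  pthread_mutex_destroy(m);
  munmap(m, sizeof *m);
  return rc;
}
```

Calling lock_after_owner_death() from main() and comparing the result with EOWNERDEAD shows whether robust mutexes work on a given system. On older glibc versions the program may need to be linked with -pthread.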

榆西 2024-08-17 11:24:26

If you're working in Linux or something similar, consider using named semaphores instead of (what I assume are) pthreads mutexes. I don't think there is a way to determine the locking PID of a pthreads mutex, short of building your own registration table and also putting it in shared memory.

森林散布 2024-08-17 11:24:26

How about file-based locking (using flock(2))? Such locks are released automatically when the process holding them dies.

Demo program:

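#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>     /* getpid(), sleep() */
#include <sys/file.h>   /* flock() */

int main(void) {
  FILE *f = fopen("testfile", "w+");
  if (f == NULL) {
    perror("fopen");
    return EXIT_FAILURE;
  }

  printf("pid=%ld time=%ld Getting lock\n", (long)getpid(), (long)time(NULL));
  /* Blocks until no other process holds the exclusive lock. */
  if (flock(fileno(f), LOCK_EX) != 0) {
    perror("flock");
    return EXIT_FAILURE;
  }
  printf("pid=%ld time=%ld Got lock\n", (long)getpid(), (long)time(NULL));

  sleep(5);
  printf("pid=%ld time=%ld Crashing\n", (long)getpid(), (long)time(NULL));
  /* Deliberate segfault; the kernel drops the lock when the process dies. */
  *(volatile int *)NULL = 1;
  return 0;
}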

Output (I've truncated the PIDs and times a bit for clarity):

$ ./a.out & sleep 2 ; ./a.out 
[1] 15
pid=15 time=137 Getting lock
pid=15 time=137 Got lock
pid=17 time=139 Getting lock
pid=15 time=142 Crashing
pid=17 time=142 Got lock
pid=17 time=147 Crashing
[1]+  Segmentation fault      ./a.out
Segmentation fault

What happens is that the first program acquires the lock and starts to sleep for 5 seconds. After 2 seconds, a second instance of the program is started which blocks while trying to acquire the lock. 3 seconds later, the first program segfaults (bash doesn't tell you this until later though) and immediately, the second program gets the lock and continues.

倾`听者〃 2024-08-17 11:24:26

I have left this WRONG post undeleted in case someone has the same idea and finds this discussion useful!


You can use this approach:
1) Lock the POSIX shared mutex.
2) Save the process ID in the shared memory.
3) Unlock the shared mutex.
4) On a clean exit, clear the process ID.

If the process dumps core, the next process will find the process ID saved in step 2 still in the shared memory. If no process with that ID exists in the OS, then no one owns the shared mutex, so it is only necessary to replace the process ID.
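
The "is there a process with this ID" check could be sketched as follows, assuming a POSIX system. The names pid_exists() and stale_owner_demo() are made up for illustration. Keep in mind that PIDs can be reused, so a positive answer does not prove that the original owner is still alive, and that the check and the replacement are not atomic.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: 1 if some process with this PID exists right now,
 * 0 if none does, -1 on an unexpected error. */
int pid_exists(pid_t pid) {
  if (kill(pid, 0) == 0) return 1;  /* signal 0: existence check only */
  if (errno == EPERM) return 1;     /* exists, but we may not signal it */
  if (errno == ESRCH) return 0;     /* no such process */
  return -1;
}

/* Demo: treat a child as the recorded "owner", let it die, then check it.
 * Returns the pid_exists() result for the dead owner, or -1 if fork fails. */
int stale_owner_demo(void) {
  pid_t owner = fork();
  if (owner < 0) return -1;
  if (owner == 0) _exit(0);         /* owner exits without cleaning up */
  waitpid(owner, NULL, 0);          /* reap it so the PID is really gone */
  return pid_exists(owner);
}
```

A process that finds a saved ID in shared memory would call pid_exists() on it and treat a result of 0 as "owner is gone".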

Update to answer the comment:

Scenario 1:
1. P1 starts.
2. P1 creates/opens a named mutex if it doesn't exist.
3. P1 timed-locks the named mutex and succeeds (waiting up to 10 seconds if necessary).
4. P1 dumps core.
5. P2 starts after the core dump.
6. P2 creates/opens the named mutex; it exists, which is OK.
7. P2 timed-locks the named mutex and fails to acquire it (waiting up to 10 seconds if necessary).
8. P2 removes the named mutex.
9. P2 recreates the named mutex and locks it.

浮生面具三千个 2024-08-17 11:24:26

You should use a semaphore as provided by the operating system.

The operating system releases all resources that a process has open whether it dies or exits gracefully.
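
Whether a held semaphore is actually given back when its holder dies depends on the kind of semaphore. POSIX semaphores (sem_open/sem_wait) are not posted automatically, but System V semaphores taken with the SEM_UNDO flag are. The sketch below assumes a Linux system with System V IPC available. The names sem_released_after_death() and union sem_arg are made up for illustration, and a child calling _exit() stands in for the crashed process.

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* Argument union for semctl(); Linux expects the caller to define it. */
union sem_arg { int val; struct semid_ds *buf; unsigned short *array; };

/* Hypothetical demo: returns 1 if the parent could take the semaphore after
 * the child died holding it, 0 if not, -1 on setup failure. */
int sem_released_after_death(void) {
  int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
  if (id < 0) return -1;

  union sem_arg arg = { .val = 1 };          /* binary semaphore, free */
  if (semctl(id, 0, SETVAL, arg) < 0) { semctl(id, 0, IPC_RMID); return -1; }

  pid_t child = fork();
  if (child < 0) { semctl(id, 0, IPC_RMID); return -1; }
  if (child == 0) {
    /* SEM_UNDO asks the kernel to revert this operation at process exit. */
    struct sembuf take = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
    semop(id, &take, 1);
    _exit(0);                                /* dies holding the semaphore */
  }
  waitpid(child, NULL, 0);

  /* Non-blocking attempt: succeeds only if the kernel undid the child's take. */
  struct sembuf try_take = { .sem_num = 0, .sem_op = -1, .sem_flg = IPC_NOWAIT };
  int acquired = (semop(id, &try_take, 1) == 0);

  semctl(id, 0, IPC_RMID);                   /* remove the semaphore set */
  return acquired;
}
```

Without SEM_UNDO the child's decrement would persist after its death and the parent's attempt would fail with EAGAIN.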
