Why doesn't flock in bash exit after timing out when it fails to acquire the lock?

Posted 2024-09-02 07:26:29


I am experimenting with flock, a command-line utility for file locking that can be used from bash, to prevent two instances of the same code from running at the same time.

I am using this testing code:

( ( flock -x 200 ;  sleep 10 ; echo "original finished" ; ) 200>./test.lock ) &
( sleep 2 ; ( flock -x -w 2 200 ; echo "a finished" ) 200>./test.lock ) &

I am running 2 subshells (backgrounded). The (flock NUM; ...) NUM>FILE syntax is from flock's man page.

I expect that the first subshell will get an exclusive lock on test.lock, wait 10 seconds, then print "original finished", holding the lock the whole time. The second subshell will start at more or less the same time, wait 2 seconds, then try to get a lock on test.lock, but time out after 2 seconds. If it gets the lock, it'll print "a finished". If it doesn't get the lock, that subshell should stop, and nothing should be printed.

Since the first subshell holds the lock for 10 seconds, the second subshell should not get the lock and shouldn't finish, i.e. one should see only "original finished" printed, not both messages.

What actually happens is that "a finished" is printed, then "original finished" is printed.

This implies that the second subshell is either (a) not using the same lock as the first subshell, (b) failing to get the lock but continuing to execute anyway, or (c) something else.

Why don't those locks work as I expect?


无名指的心愿 2024-09-09 07:26:29

The issue is that, if the flock process fails to get the lock within the timeout, it has no way of killing its parent process (i.e. the shell that spawned it); all it can do is return a nonzero exit status. You need to check that exit status before continuing:

flock <params> && <do other stuff>

( ( flock -x 200 ;  sleep 10 ; echo "original finished" ; ) 200>./test.lock ) &
( sleep 2 ; ( flock -x -w 2 200 && echo "a finished" ) 200>./test.lock ) &

does what you want.
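If several commands need to be guarded by the lock, an alternative to chaining each one with && is to leave the subshell as soon as flock fails. The following is only a sketch under stated assumptions: the shorter sleep times, the mktemp scratch file, and the variable names lockfile and waiter_status are chosen for illustration and are not from the question.

```shell
#!/usr/bin/env bash
# Sketch: the waiter leaves its subshell when flock times out,
# so nothing after the flock line runs without the lock.

lockfile=$(mktemp)   # hypothetical scratch lock file for this demo

# Holder: takes the exclusive lock on fd 200 and keeps it for 3 seconds.
( flock -x 200 ; sleep 3 ; echo "original finished" ) 200>"$lockfile" &

sleep 1   # give the holder time to acquire the lock first

# Waiter: waits at most 1 second for the lock. On timeout flock returns
# a nonzero status, and "|| exit 1" ends the subshell before the echo.
( flock -x -w 1 200 || exit 1 ; echo "a finished" ) 200>"$lockfile"
waiter_status=$?

wait                 # let the holder finish and print its message
rm -f "$lockfile"
```

Here waiter_status holds the subshell's exit status, so the caller can tell that the lock was not obtained. flock also has a form that takes the lock file and a command directly (flock -w 2 ./test.lock some-command), in which case the command is not run when the lock cannot be acquired.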
