How can I tell whether a process is bound to a Unix domain socket?

Posted 2024-12-04 15:33:58


I'm writing a Unix domain socket server for Linux.

A peculiarity of Unix domain sockets I quickly found out is that, while creating a listening Unix socket creates the matching filesystem entry, closing the socket doesn't remove it. Moreover, until the filesystem entry is removed manually, it's not possible to bind() a socket to the same path again: bind() fails with EADDRINUSE if the path it is given already exists in the filesystem.
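
This behavior can be reproduced in a few lines of Python (the socket path below is a throwaway example, not anything prescribed):

```python
import errno
import os
import socket
import tempfile

# Hypothetical socket path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "server.sock")

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.bind(path)          # bind() creates the filesystem entry
s.listen(1)
s.close()             # ...but close() does NOT remove it

print(os.path.exists(path))            # True: the entry survives close()

s2 = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    s2.bind(path)                      # the path still exists on disk
except OSError as e:
    print(e.errno == errno.EADDRINUSE)  # True: rebinding fails
```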

As a consequence, the socket's filesystem entry needs to be unlink()'ed on server shutdown to avoid getting EADDRINUSE on server restart. However, this cannot always be done (e.g. after a server crash). Most FAQs, forum posts, and Q&A sites I found advise, as a workaround, simply unlink()'ing the socket prior to calling bind(). In that case, however, it becomes desirable to know whether a process is still bound to the socket before unlink()'ing it.

Indeed, unlink()'ing a Unix socket while a process is still bound to it and then re-creating the listening socket doesn't raise any error. As a result, however, the old server process is still running but unreachable: the old listening socket is "masked" by the new one. This behavior has to be avoided.
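
The masking effect is easy to demonstrate (Python sketch with an arbitrary temporary path):

```python
import os
import socket
import tempfile

path = os.path.join(tempfile.mkdtemp(), "server.sock")

old = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
old.bind(path)
old.listen(1)

os.unlink(path)     # succeeds silently, even though `old` is still listening

new = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
new.bind(path)      # also succeeds: no error is ever raised
new.listen(1)

# A client connecting to `path` now reaches `new`; `old` keeps running
# but is unreachable through the filesystem -- it has been "masked".
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(path)
conn, _ = new.accept()   # the connection lands on the NEW socket
```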

Ideally, using Unix domain sockets, the socket API should have exposed the same "mutual exclusion" behavior that is exposed when binding TCP or UDP sockets: "I want to bind socket S to address A; if a process is already bound to this address, just complain!" Unfortunately this is not the case...

Is there a way to enforce this "mutual exclusion" behavior? Or, given a filesystem path, is there a way to know, via the socket API, whether any process on the system has a Unix domain socket bound to this path? Should I use a synchronization primitive external to the socket API (flock(), ...)? Or am I missing something?

Thanks for your suggestions.

Note: Linux's abstract-namespace Unix sockets seem to solve this issue, as there is no filesystem entry to unlink(). However, the server I'm writing aims to be generic: it must be robust against both types of Unix domain sockets, as I am not responsible for choosing listening addresses.
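
For reference, a minimal Linux-only sketch of the abstract namespace (the name itself is illustrative):

```python
import errno
import os
import socket

# A leading NUL byte puts the name in Linux's abstract namespace:
# no filesystem entry is ever created, so there is nothing to
# unlink(), and bind() enforces real mutual exclusion, as with
# TCP/UDP ports.
name = "\0demo-abstract-%d" % os.getpid()

a = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
a.bind(name)
a.listen(1)

b = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    b.bind(name)
except OSError as e:
    print(e.errno == errno.EADDRINUSE)   # True: the kernel refuses a second bind
```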


Comments (3)

白龙吟 2024-12-11 15:33:58


I know I am very late to the party and that this was answered a long time ago but I just encountered this searching for something else and I have an alternate proposal.

When you encounter the EADDRINUSE return from bind() you can enter an error checking routine that connects to the socket. If the connection succeeds, there is a running process that is at least alive enough to have done the accept(). This strikes me as being the simplest and most portable way of achieving what you want to achieve. It has drawbacks in that the server that created the UDS in the first place may actually still be running but "stuck" somehow and unable to do an accept(), so this solution certainly isn't fool-proof, but it is a step in the right direction I think.

If the connect() fails then go ahead and unlink() the endpoint and try the bind() again.
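
The probe described above could be sketched like this (Python; the function name and error handling are mine, not from the answer):

```python
import errno
import os
import socket
import tempfile

def bind_or_reclaim(path):
    """Bind to `path`; on EADDRINUSE, probe with connect() to check
    whether a live server sits behind the entry before unlink()'ing."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.bind(path)
        return s                       # fresh path: the normal case
    except OSError as e:
        if e.errno != errno.EADDRINUSE:
            s.close()
            raise

    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except OSError:
        # Nobody accepting: presumably a stale entry from a crash.
        os.unlink(path)
        s.bind(path)
        return s
    else:
        # A live server answered: do NOT steal its address.
        probe.close()
        s.close()
        raise RuntimeError("another server is listening on %s" % path)

# Demo: a stale entry left behind by a "crashed" server.
path = os.path.join(tempfile.mkdtemp(), "server.sock")
stale = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
stale.bind(path)
stale.listen(1)
stale.close()                 # simulate a crash: the entry stays behind

srv = bind_or_reclaim(path)   # connect() fails, so the path is reclaimed
srv.listen(1)
```

As the answer notes, this is not fool-proof: a hung server that stopped calling accept() can still look dead to the probe.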

落叶缤纷 2024-12-11 15:33:58


I don't think there is much to be done beyond things you have already considered. You seem to have researched it well.

There are ways to determine if a socket is bound to a Unix socket (obviously lsof and netstat do it), but they are complicated and system-dependent enough that I question whether they are worth the effort to deal with the problems you raise.

You are really raising two problems: dealing with name collisions with other applications, and dealing with previous instances of your own app.

By definition, multiple instances of your program should not be trying to bind to the same path, so that probably means you only want one instance to run at a time. If that's the case, you can just use the standard PID file-lock technique so that two instances don't run simultaneously. You shouldn't be unlinking the existing socket, or even running, if you can't get the lock. This takes care of the server-crash scenario as well. If you can get the lock, then you know you can unlink the existing socket path before binding.

There is not much you can do, AFAIK, to control other programs creating collisions. File permissions aren't perfect, but if the option is available to you, you could put your app in its own user/group. If there is an existing socket path and you don't own it, then don't unlink it; put out an error message and let the user or sysadmin sort it out. Using a config file to make the path easily changeable - and available to clients - might work. Beyond that, you almost have to go to some kind of discovery service, which seems like massive overkill unless this is a really critical application.

On the whole you can take some comfort that this doesn't actually happen often.
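
A sketch of the file-lock idea with flock() (Python; the lock path and function name are illustrative). Because flock() locks die with the process, a crashed server releases the lock automatically:

```python
import fcntl
import os
import tempfile

def acquire_single_instance_lock(lock_path):
    """Return an open fd holding an exclusive flock(), or None if
    another instance already holds it. Keep the fd open for the
    whole lifetime of the server."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # We own the lock: record our PID for diagnostics.
    os.ftruncate(fd, 0)
    os.write(fd, b"%d\n" % os.getpid())
    return fd

lock_path = os.path.join(tempfile.mkdtemp(), "server.lock")
fd = acquire_single_instance_lock(lock_path)
```

With the lock held, it is safe to unlink() a leftover socket path before bind(); if the lock cannot be taken, another instance is running and the server should exit instead.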

故事灯 2024-12-11 15:33:58


Assuming you only have one server program that opens that socket.

Then what about this:

  • Exclusively create a file that contains the PID of the server process (maybe also the path of the socket)
    • If you succeed, then write your PID (and socket path) there and continue creating the socket.
    • If you fail, the socket was created before (most likely), but the server may be dead. Therefore read the PID from the existing file, and then check whether such a process still exists (e.g. using kill() with signal 0):
      • If a process exists, it may be the server process, or it may be an unrelated process
        • (More steps may be needed here)
      • If no such process exists, remove the file and begin trying to create it exclusively.
  • Whenever the process terminates, remove the file after having closed (and removed) the socket.
  • If you place the socket and the lock file both in a volatile filesystem (/tmp in older days, /run in modern times), then a reboot will most likely clear old sockets and lock files automatically.
  • Unless administrators like to play with kill -9 you could also establish a signal handler that tries to remove the lock file when receiving fatal signals.
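
The first two steps above could look roughly like this (Python; the names are illustrative, and the "more steps" caveat for distinguishing an unrelated process still applies):

```python
import os
import tempfile

def try_create_pid_file(path):
    """Exclusively create `path` containing our PID.
    Returns True on success, False if the file already exists."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    os.write(fd, b"%d\n" % os.getpid())
    os.close(fd)
    return True

def pid_alive(pid):
    """Check for a live process via the 0-signal trick."""
    try:
        os.kill(pid, 0)     # signal 0: no signal sent, existence check only
    except ProcessLookupError:
        return False        # no such process
    except PermissionError:
        return True         # exists, but owned by someone else
    return True

pid_file = os.path.join(tempfile.mkdtemp(), "server.pid")
created = try_create_pid_file(pid_file)
```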