Go exec.CommandContext is not terminated after the context times out

Posted on 2025-01-18 09:39:57

In Go, I can usually combine context.WithTimeout() with exec.CommandContext() to have a command killed automatically (with SIGKILL) once the timeout expires.

But I'm running into a strange issue that if I wrap the command with sh -c AND buffer the command's outputs by setting cmd.Stdout = &bytes.Buffer{}, the timeout no longer works, and the command runs forever.

Why does this happen?

Here is a minimal reproducible example:

package main

import (
    "bytes"
    "context"
    "os/exec"
    "time"
)

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
    defer cancel()

    cmdArgs := []string{"sh", "-c", "sleep infinity"}
    bufferOutputs := true

    // Uncommenting *either* of the next two lines will make the issue go away:

    // cmdArgs = []string{"sleep", "infinity"}
    // bufferOutputs = false

    cmd := exec.CommandContext(ctx, cmdArgs[0], cmdArgs[1:]...)
    if bufferOutputs {
        cmd.Stdout = &bytes.Buffer{}
    }
    _ = cmd.Run()
}

I've tagged this question with Linux because I've only verified that this happens on Ubuntu 20.04 and I'm not sure whether it would reproduce on other platforms.

Comments (2)

蝶舞 2025-01-25 09:39:57

My issue was that the child sleep process was not being killed when the context timed out. The sh parent process was being killed, but the child sleep was being left around.

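This would normally still allow the cmd.Wait() call to succeed, but the problem is that cmd.Wait() waits both for the process to exit and for its outputs to be copied. Because we've assigned cmd.Stdout to something that is not an *os.File, os/exec creates a pipe and a goroutine that copies from it into the buffer, and Wait blocks until that copy reaches EOF. EOF only arrives once every write end of the pipe is closed. The orphaned sleep process inherited the write end as its stdout, so the pipe never closes while sleep is still running.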
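To make the pipe explanation concrete, here is a minimal sketch (not part of the original answer) that sends stdout directly to an *os.File. In that case os/exec hands the file to the child as-is and creates no pipe, so Run should return shortly after sh is killed even though the child is left behind. The names runToFile and demoTimeout are made up for illustration, and the script "sleep 2 & wait" is just an assumed way to force sh to fork a child.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// demoTimeout is the context deadline used by this example.
const demoTimeout = 100 * time.Millisecond

// runToFile runs script under sh with stdout connected directly to a
// temporary file and reports how long Run took to return.
func runToFile(script string, timeout time.Duration) (time.Duration, error) {
	out, err := os.CreateTemp("", "cmd-stdout-*")
	if err != nil {
		return 0, err
	}
	defer os.Remove(out.Name())
	defer out.Close()

	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, "sh", "-c", script)
	// An *os.File is passed to the child unchanged, so there is no pipe
	// and no copy goroutine for Wait to block on.
	cmd.Stdout = out

	start := time.Now()
	err = cmd.Run()
	return time.Since(start), err
}

func main() {
	// The background job makes sh fork a child that outlives sh itself.
	elapsed, err := runToFile("sleep 2 & wait", demoTimeout)
	fmt.Printf("Run returned after %v: %v\n", elapsed.Round(10*time.Millisecond), err)
}
```

Note that this only unblocks Wait. The sleep child is still orphaned and keeps running until it finishes on its own.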

To kill the child processes as well, we can start the process as the leader of its own process group by setting Setpgid in SysProcAttr. Sending a signal to the negative PID then targets the whole group: the process itself plus any subprocesses that are still in that group.

Here is a drop-in replacement for exec.CommandContext I came up with that does exactly this:

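import (
    "context"
    "os/exec"
    "syscall"
)

// Cmd wraps exec.Cmd and kills the whole process group when ctx is done.
type Cmd struct {
    ctx        context.Context
    terminated chan struct{}
    *exec.Cmd
}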

// NewCommand is like exec.CommandContext but ensures that subprocesses
// are killed when the context times out, not just the top level process.
func NewCommand(ctx context.Context, command string, args ...string) *Cmd {
    return &Cmd{
        ctx:        ctx,
        terminated: make(chan struct{}),
        Cmd:        exec.Command(command, args...),
    }
}

func (c *Cmd) Start() error {
    // Force-enable setpgid bit so that we can kill child processes when the
    // context times out or is canceled.
    if c.Cmd.SysProcAttr == nil {
        c.Cmd.SysProcAttr = &syscall.SysProcAttr{}
    }
    c.Cmd.SysProcAttr.Setpgid = true
    err := c.Cmd.Start()
    if err != nil {
        return err
    }
    go func() {
        select {
        case <-c.terminated:
            return
        case <-c.ctx.Done():
        }
        p := c.Cmd.Process
        if p == nil {
            return
        }
        // Kill by negative PID to kill the process group, which includes
        // the top-level process we spawned as well as any subprocesses
        // it spawned.
        _ = syscall.Kill(-p.Pid, syscall.SIGKILL)
    }()
    return nil
}

func (c *Cmd) Run() error {
    if err := c.Start(); err != nil {
        return err
    }
    return c.Wait()
}

func (c *Cmd) Wait() error {
    defer close(c.terminated)
    return c.Cmd.Wait()
}
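
For comparison, on Go 1.20 or later the same process-group kill can probably be expressed without a wrapper type by setting the Cancel field on a command created with exec.CommandContext. The sketch below is an alternative to the NewCommand wrapper above, not the author's code. It assumes a Unix-like system (syscall.Kill and Setpgid are not available on Windows), and runGroupKilled and demoTimeout are hypothetical names.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// demoTimeout is the context deadline used by this example.
const demoTimeout = 100 * time.Millisecond

// runGroupKilled runs script under sh in its own process group and kills
// the whole group when the context is done.
func runGroupKilled(script string, timeout time.Duration) (time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, "sh", "-c", script)
	cmd.Stdout = &bytes.Buffer{}
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	// Cancel replaces the default behaviour of killing only cmd.Process.
	// It runs only after a successful Start, so cmd.Process is set.
	cmd.Cancel = func() error {
		return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	}

	start := time.Now()
	err := cmd.Run()
	return time.Since(start), err
}

func main() {
	// The background job makes sh fork a child in the same process group.
	elapsed, err := runGroupKilled("sleep 2 & wait", demoTimeout)
	fmt.Printf("Run returned after %v: %v\n", elapsed.Round(10*time.Millisecond), err)
}
```

Because the sleep child dies together with sh, the stdout pipe closes and Run returns soon after the deadline. Like the wrapper, this does not reach subprocesses that have moved to a different process group.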

--- UPDATE ---

Since writing this code I've run into cases where subprocesses sometimes want to join their own process groups, and the setpgid trick no longer works because it will not kill processes in those new process groups. A more robust solution might be to manually traverse the process tree using something like go-ps, and for each descendant process, use the following pseudocode:

// KillProcessTree kills an entire process tree using SIGKILL.
func KillProcessTree(pid int) error {
  // Send SIGSTOP to prevent new children from being spawned
  _ = syscall.Kill(pid, syscall.SIGSTOP)
  // TODO: implement ChildProcesses
  for _, c := range ChildProcesses(pid) {
    _ = KillProcessTree(c.Pid)
  }
  // Now that the process is stopped and all descendants
  // are guaranteed to be killed, we can safely SIGKILL
  // this process, without worrying about descendant
  // processes being reparented to pid 1 or anything
  // like that.
  _ = syscall.Kill(pid, syscall.SIGKILL)
  return nil // TODO: better error handling :)
}
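
One possible way to fill in the ChildProcesses TODO on Linux, without go-ps, is to scan /proc and read each process's parent PID from /proc/<pid>/stat. This is only a sketch under that assumption: childPIDs and spawnedChildIsListed are hypothetical names, the function returns plain PIDs rather than the c.Pid-style values used in the pseudocode, and it will not work on systems without a Linux-style /proc.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// childPIDs returns the PIDs of processes whose parent is ppid, found by
// scanning /proc (Linux only).
func childPIDs(ppid int) ([]int, error) {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return nil, err
	}
	var children []int
	for _, e := range entries {
		pid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // not a process directory
		}
		stat, err := os.ReadFile("/proc/" + e.Name() + "/stat")
		if err != nil {
			continue // the process exited while we were scanning
		}
		// Format: "pid (comm) state ppid ...". comm may contain spaces
		// or parentheses, so only parse what follows the last ')'.
		i := bytes.LastIndexByte(stat, ')')
		if i < 0 {
			continue
		}
		fields := strings.Fields(string(stat[i+1:]))
		if len(fields) < 2 {
			continue
		}
		if parent, err := strconv.Atoi(fields[1]); err == nil && parent == ppid {
			children = append(children, pid)
		}
	}
	return children, nil
}

// spawnedChildIsListed starts a short-lived child and reports whether
// childPIDs finds it under the current process.
func spawnedChildIsListed() (bool, error) {
	cmd := exec.Command("sleep", "2")
	if err := cmd.Start(); err != nil {
		return false, err
	}
	defer func() {
		_ = cmd.Process.Kill()
		_ = cmd.Wait()
	}()
	pids, err := childPIDs(os.Getpid())
	if err != nil {
		return false, err
	}
	for _, pid := range pids {
		if pid == cmd.Process.Pid {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	found, err := spawnedChildIsListed()
	fmt.Println("child listed:", found, "error:", err)
}
```

A scan like this is inherently racy, since processes can appear or exit between reading the directory and reading each stat file, which is part of why the pseudocode stops the parent with SIGSTOP before walking its children.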

拍不死你 2025-01-25 09:39:57

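By setting cmd.WaitDelay we can make sure that Wait returns even if the I/O pipes are never closed. Once the delay has elapsed after the context is done, os/exec kills the direct child if it is still running and closes the pipes itself instead of waiting for EOF. Any orphaned subprocess keeps running, though.

This was introduced in Go 1.20.

   cmd.WaitDelay = 1 * time.Second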
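
As a rough illustration of how this could be applied to the example from the question, the sketch below sets WaitDelay on a command whose stdout is buffered. The helper name runWithWaitDelay, the two constants, and the script "sleep 2 & wait" (an assumed way to make sh leave a child behind) are all illustrative choices, not part of the original answer.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"time"
)

const (
	// demoTimeout is the context deadline used by this example.
	demoTimeout = 100 * time.Millisecond
	// demoWaitDelay is how long Wait keeps waiting for the pipes after
	// the context is done.
	demoWaitDelay = 200 * time.Millisecond
)

// runWithWaitDelay runs script under sh with buffered stdout and a bounded
// wait for the output pipe. It reports how long Run took to return.
func runWithWaitDelay(script string, timeout, delay time.Duration) (time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, "sh", "-c", script)
	cmd.Stdout = &bytes.Buffer{}
	// Without WaitDelay, Run would block until the orphaned child exits
	// and releases the write end of the stdout pipe.
	cmd.WaitDelay = delay

	start := time.Now()
	err := cmd.Run()
	return time.Since(start), err
}

func main() {
	elapsed, err := runWithWaitDelay("sleep 2 & wait", demoTimeout, demoWaitDelay)
	fmt.Printf("Run returned after %v: %v\n", elapsed.Round(10*time.Millisecond), err)
}
```

Run should come back roughly at the deadline plus the wait delay, well before the two-second sleep finishes.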