How to examine a running bash script from outside (i.e. without running it in a dedicated shell)

Posted 2025-01-20 06:06:23

Imagine an arbitrary, already running shell script (bash, sh, ...) that gets stuck in an endless loop, and you want to 'look into it' (i.e. get a stack trace or the current position in the script) to get an idea of what's going on.

In Python, when I encounter strange behavior (in terms of CPU usage, endless loops, etc.), my typical approach is to make the script allow me to 'look into it' while it's running. E.g. I would extend a given script, say

def fib(n):
    return 0 if n == 0 else 1 if n < 3 else fib(n-1) + fib(n-2)

print(fib(38))  # might run long

to handle some signal, e.g. SIGPOLL, like this:

import signal, traceback, os

def fib(n):
    return 0 if n == 0 else 1 if n < 3 else fib(n-1) + fib(n-2)

signal.signal(signal.SIGPOLL, lambda sig, frame: print(
    "\n".join(traceback.format_stack(frame))))
print(f"This might take long - run `kill -{signal.SIGPOLL} {os.getpid()}` to look into stack")

print(fib(38))  # might run long

Now I can just run kill -SIGPOLL <PID> from another terminal whenever this script runs surprisingly long. I still need to start it in a dedicated terminal to see the output, but I could easily modify the code to write to a file or similar.

Is this possible for bash/sh, too? Maybe even built in?

Ideally I wouldn't have to modify the script at all, but if there is any way other than running it in an extra shell under strace or with -x set, I'd take it :)

Here is what I sometimes use. It helps in some situations, but it does not give me detailed information about the code being executed, such as line numbers or the call stack, and the script has to be run in an extra terminal in order to see the output:

function toggle_tracing {
    if [ -z "$TRACE_ENABLED" ]; then
        TRACE_ENABLED=1
        set -x
    else
        unset TRACE_ENABLED
        set +x
    fi
}

trap toggle_tracing USR1
echo "run 'kill -SIGUSR1 $BASHPID' to activate tracing"

# do something really time consuming
while true; do
    find ~ > /dev/null
done

For this approach to work I have to prepare in advance (the script needs to be modified and run in a dedicated terminal or tmux session), which makes it a poor fit for investigating unexpected and infrequent incidents.
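
A rough bash counterpart of the Python handler above can be sketched with the FUNCNAME, BASH_SOURCE and BASH_LINENO arrays. This is only a sketch under assumptions: dump_stack and STACK_LOG are made-up names, it needs bash rather than plain sh, and it shares the drawback just described, since the script has to be prepared in advance. It reports where each active function was called from, not the exact line currently executing, and bash only runs the trap once the current foreground command (such as find) has finished.

```shell
#!/usr/bin/env bash
# Sketch: print the chain of active functions when SIGUSR1 arrives.
# STACK_LOG is a hypothetical destination; by default the dump goes to stderr.
STACK_LOG="${STACK_LOG:-/dev/stderr}"

dump_stack() {
    local i
    {
        echo "call stack of PID $$ (innermost first):"
        for ((i = 1; i < ${#FUNCNAME[@]}; i++)); do
            # "main" is the pseudo-frame bash adds for the script itself.
            [ "${FUNCNAME[i]}" = main ] && break
            # FUNCNAME[i] was called from BASH_SOURCE[i+1], line BASH_LINENO[i].
            printf '  %s() called from %s:%s\n' \
                "${FUNCNAME[i]}" "${BASH_SOURCE[i+1]:-?}" "${BASH_LINENO[i]}"
        done
    } >>"$STACK_LOG"
}

trap dump_stack USR1
```

With this in place, kill -USR1 <PID> from another terminal appends one dump to STACK_LOG (stderr unless overridden), so pointing STACK_LOG at a file removes the need for a dedicated terminal.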

You could also run strace -p <PID>, which has the advantage of not being limited to (shell) scripts, but since you only see system calls, you have to be lucky and know the executable very well to guess its inner state.

伪装你 2025-01-27 06:06:23

One approach that somewhat meets my requirements is to combine setting -x on demand with strace: the trace is written to STDERR, which I can observe from outside.

Preparation: modify the script that might get stuck in the future to run the following code on startup:

function toggle_tracing {
    if [ -z "$TRACE_ENABLED" ]; then
        TRACE_ENABLED=1
        set -x
    else
        unset TRACE_ENABLED
        set +x
    fi
}

trap toggle_tracing USR1
echo "run 'kill -SIGUSR1 $BASHPID' to activate tracing"
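
As a variation that is not part of the approach described here, bash 4.1 and later can send the -x output to a file descriptor named by BASH_XTRACEFD, so the trace can be read from a file instead of from the script's STDERR. The sketch below assumes such a bash version; TRACE_LOG, TRACE_FD and the /tmp path are illustrative choices.

```shell
#!/usr/bin/env bash
# Assumed log location (hypothetical); one file per script instance.
TRACE_LOG="/tmp/xtrace.$$.log"

toggle_tracing() {
    if [ -z "$TRACE_ENABLED" ]; then
        TRACE_ENABLED=1
        exec {TRACE_FD}>>"$TRACE_LOG"   # bash 4.1+: open the log on a free descriptor
        BASH_XTRACEFD=$TRACE_FD         # send xtrace output there instead of stderr
        set -x
    else
        set +x
        # Unsetting BASH_XTRACEFD restores stderr tracing and closes the descriptor.
        unset TRACE_ENABLED BASH_XTRACEFD
    fi
}

trap toggle_tracing USR1
echo "run 'kill -USR1 $BASHPID' to toggle tracing into $TRACE_LOG"
```

After sending the signal, tail -f on the log file shows the traced commands without attaching strace.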

When you later catch the process eating up your CPU for hours, you switch it into trace mode by sending the signal and start monitoring it with strace

# sudo might not be necessary
sudo kill -SIGUSR1 <PID>
sudo strace -p <PID> 2>&1 | grep "write(2, \""

which results in the traced commands being printed in the terminal where strace is running

write(2, "+ true\n", 7)                 = 7
write(2, "+ find /home/itsme\n", 20)   = 20
write(2, "+ true\n", 7)                 = 7
write(2, "+ find /home/itsme\n", 20)   = 20
write(2, "+ true\n", 7)                 = 7
write(2, "+ find /home/itsme\n", 20)   = 20
write(2, "+ true\n", 7)                 = 7
write(2, "+ find /home/itsme\n", 20)   = 20
write(2, "+ true\n", 7)                 = 7
write(2, "+ find /home/itsme\n", 20)   = 20
...
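
To make this output easier to read, the strace decoration can be stripped again. The filter below is a small sketch; xtrace_from_strace is a made-up helper name, and it assumes strace's default C-style string escaping and that every traced line ends in "\n".

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only write(2, ...) calls and print their payload
# without the surrounding strace syntax.
xtrace_from_strace() {
    sed -n 's/^write(2, "\(.*\)\\n", [0-9]*) *= [0-9]*$/\1/p'
}

# Demo on two lines copied from the output above.
printf '%s\n' \
    'write(2, "+ true\n", 7)                 = 7' \
    'write(2, "+ find /home/itsme\n", 20)   = 20' |
    xtrace_from_strace
```

The demo prints "+ true" and "+ find /home/itsme". On a live process this would be used as sudo strace -p <PID> -s 200 2>&1 | xtrace_from_strace, where -s raises strace's default 32-character string limit; longer command lines are otherwise truncated and would not match the filter.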

(still without line numbers, though)
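
One possible way to get line numbers into the trace is bash's PS4 variable, which is expanded and printed in front of every traced command. The following is a minimal sketch assuming bash; scan_home is a hypothetical stand-in for the real work.

```shell
#!/usr/bin/env bash
# Single quotes delay expansion until each command is traced, so LINENO and
# FUNCNAME describe the command about to run rather than this assignment.
PS4='+ ${BASH_SOURCE[0]##*/}:${LINENO}:${FUNCNAME[0]:-main}: '

scan_home() {          # hypothetical stand-in for the real work
    : scanning
}

set -x
scan_home
set +x
```

Each traced line then starts with the script name, line number and function name. Because the prefix still begins with "+", the grep pattern used with strace keeps matching.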

Btw, credit goes to a friend of mine who is too lazy to set up their own account.
