Long-running ssh commands sometimes stop printing to stdout

Posted 2024-12-01 00:07:24

I have been using Net::SSH::Perl to automate running some scripts on my remote boxes. However, some of these scripts take a really long time to complete (an hour or two), and sometimes I stop getting data from them without actually losing the connection.

Here's the code I'm using:

sub run_regression_tests {
    for(my $i = 0; $i < @servers; $i++){
        my $inner = $users[$i];
        foreach(@$inner){
            my $user = $_;
            my $server = $servers[$i];
            
            my $outFile;
            open($outFile, ">" . $outputDir . $user . '@' . $server . ".log.txt");
            print $outFile "Opening connection to $user at $server on " . localtime() . "\n\n";
            close($outFile);
            
            my $pid = $pm->start and next;
            
                print "Connecting to $user\@$server...\n";
                
                my $hasWentToDownloadYet = 0;
                my $ssh = Net::SSH::Perl->new($server, %sshParams);
                $ssh->login($user, $password);              
                
                $ssh->register_handler("stdout", sub {
                    my($channel, $buffer) = @_;             
                    my $outFile;
                    open($outFile, ">>", $outputDir . $user . '@' . $server . ".log.txt");                  
                    print $outFile $buffer->bytes;              
                    close($outFile);                
                    
                    my @lines = split("\n", $buffer->bytes);
                    foreach(@lines){
                        if($_ =~ m/REGRESSION TEST IS COMPLETE/){
                            $ssh->_disconnect();
                            
                            if(!$hasWentToDownloadYet){
                                $hasWentToDownloadYet = 1;
                                print "Caught exit signal.\n";
                                print("Regression tests for ${user}\@${server} finished.\n");
                                download_regression_results($user, $server);
                                $pm->finish;
                            }
                        }
                    }
                    
                });
                $ssh->register_handler("stderr", sub {
                    my($channel, $buffer) = @_;             
                    my $outFile;
                    open($outFile, ">>", $outputDir . $user . '@' . $server . ".log.txt");
                    
                    print $outFile $buffer->bytes;              
                    
                    close($outFile);                
                });
                if($debug){
                    $ssh->cmd('tail -fn 40 /GDS/gds/gdstest/t-gds-master/bin/comp.reg');
                }else{
                    my ($stdout, $stderr, $exit) = $ssh->cmd('. ./.profile && cleanall && my.comp.reg');
                    # An exit status of 0 means success; report undefined or non-zero statuses.
                    if(!defined $exit || $exit != 0){
                        print "Remote command failed for ${user}\@${server}.\n";
                    }
                }
                #$ssh->cmd('. ./.profile');
                
                if(!$hasWentToDownloadYet){
                    $hasWentToDownloadYet = 1;
                    print("Regression tests for ${user}\@${server} finished.\n");
                    download_regression_results($user, $server);
                }
                
            $pm->finish;        
        }
    }
    sleep(1);
    print "\n\n\nAll tests started. Tests typically take 1 hour to complete.\n";
    print "If they take significantly less time, there could be an error.\n";
    print "\n\nNo output will be printed until all commands have executed and finished.\n";
    print "If you wish to watch the progress, tail -f one of the logs this script produces.\n Example:\n\t" . 'tail -f ./<user>@<server>.log.txt' . "\n";
    $pm->wait_all_children;
    print "\n\nAll Tests are Finished. \n";
}

And here is my %sshParams:

my %sshParams = (
    protocol => '2',
    port => '22',
    options => [
        "TCPKeepAlive yes",
        "ConnectTimeout 10",
        "BatchMode yes"
    ]
);

Sometimes, seemingly at random, one of the long-running commands stops firing the stdout and stderr handlers and never exits. The ssh connection doesn't die (as far as I'm aware), because the $ssh->cmd call is still blocking.

Any idea how to correct this behaviour?


Comments (2)

不知所踪 2024-12-08 00:07:24

In your %sshParams hash, you may need to add "TCPKeepAlive yes" to your options:

$sshParams{'options'} = ["BatchMode yes", "TCPKeepAlive yes"];

Those options might or might not be right for you, but the TCPKeepAlive is something I would recommend setting for any long running SSH connection. If you have any kind of stateful firewall in your path it could drop the state if it hasn't passed traffic over the connection for a long period of time.

真心难拥有 2024-12-08 00:07:24

It probably fails because of the way you search the output for the REGRESSION TEST IS COMPLETE marker. The marker may be split across two different SSH packets, in which case your callback will never find it.
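If you keep the callback approach, one way around the split-marker problem is to carry the tail of each chunk over to the next call. The following is only a minimal sketch: `make_marker_detector` is a hypothetical helper name, not part of Net::SSH::Perl, and the chunks are simulated strings rather than real SSH packets.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that is fed output chunks and
# reports whether the marker has been seen, even when the marker arrived
# split across two consecutive chunks.
sub make_marker_detector {
    my ($marker) = @_;
    my $tail = '';    # trailing bytes carried over from earlier chunks
    my $seen = 0;
    return sub {
        my ($chunk) = @_;
        return 1 if $seen;
        my $window = $tail . $chunk;
        if (index($window, $marker) >= 0) {
            $seen = 1;
            return 1;
        }
        # Keep only enough bytes to complete a marker that straddles chunks.
        my $keep = length($marker) - 1;
        $tail = length($window) > $keep ? substr($window, -$keep) : $window;
        return 0;
    };
}

my $detect = make_marker_detector('REGRESSION TEST IS COMPLETE');

# Simulated packets: the marker is split between the second and third chunk.
my @results = map { $detect->($_) }
    ("starting\n", "REGRESSION TE", "ST IS COMPLETE\n");
print join(',', @results), "\n";    # prints 0,0,1
```

Inside the stdout handler from the question you would call `$detect->($buffer->bytes)` instead of splitting each buffer into lines, creating one detector per connection.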

Better still, use a remote command that ends when the work is done, such as this one-liner:

perl -pe 'BEGIN {$p = open STDIN, "my.comp.reg |" or die $!}; kill TERM => -$p if /REGRESSION TEST IS COMPLETE/'

Otherwise, you are closing the remote connection without stopping the remote process, which will stay alive.

Besides that, you should try using Net::OpenSSH or Net::OpenSSH::Parallel instead of Net::SSH::Perl:

use Net::OpenSSH::Parallel;

my $pssh = Net::OpenSSH::Parallel->new;

for my $i (0..$#servers) {
    my $server = $servers[$i];
    for my $user (@{$users[$i]}) {
        $pssh->add_host("$user\@$server", password => $password);
    }
}

if ($debug) {
    $pssh->all(cmd => { stdout_file => "$outputDir%USER%\@%HOST%.log.txt",
                        stderr_to_stdout => 1 },
               'tail -fn 40 /GDS/gds/gdstest/t-gds-master/bin/comp.reg');
}
else {
    $pssh->all(cmd => { stdout_file => "$outputDir%USER%\@%HOST%.log.txt",
                        stderr_to_stdout => 1 },
               '. ./.profile && cleanall && my.comp.reg');
}

$pssh->all(scp_get => $remote_regression_results_path, "regression_results/%USER%\@%HOST%/");

$pssh->run;
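
The keepalive advice from the other answer can also be applied at the SSH protocol level when using Net::OpenSSH, because that module drives the system ssh client and accepts extra client arguments through its `master_opts` constructor option. The sketch below only builds the argument list. `keepalive_opts` is a hypothetical helper, and the values 60 and 3 are illustrative assumptions rather than recommendations from the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper: build "-o" arguments that make the ssh client send
# protocol-level keepalive probes (OpenSSH's ServerAliveInterval and
# ServerAliveCountMax options).
sub keepalive_opts {
    my ($interval, $max_missed) = @_;
    return (
        '-o' => "ServerAliveInterval=$interval",
        '-o' => "ServerAliveCountMax=$max_missed",
    );
}

my @master_opts = keepalive_opts(60, 3);
print "@master_opts\n";

# Usage (not executed here; it needs Net::OpenSSH and a reachable host):
# use Net::OpenSSH;
# my $ssh = Net::OpenSSH->new("$user\@$server",
#                             password    => $password,
#                             master_opts => \@master_opts);
# $ssh->error and die "connection failed: " . $ssh->error;
```

With these settings the client probes the server over the encrypted channel once a minute and gives up after three unanswered probes, so a connection that has silently stalled ends with an error instead of blocking forever.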