有时长时间运行的 ssh 命令会停止打印到标准输出
我一直在使用 Perl::Net::SSH 在我的远程机器上自动运行一些脚本。然而,其中一些脚本需要很长时间才能完成(一两个小时),有时,我停止从它们获取数据,但实际上并没有失去连接。
这是我正在使用的代码:
sub run_regression_tests {
for(my $i = 0; $i < @servers; $i++){
my $inner = $users[$i];
foreach(@$inner){
my $user = $_;
my $server = $servers[$i];
my $outFile;
open($outFile, ">" . $outputDir . $user . '@' . $server . ".log.txt");
print $outFile "Opening connection to $user at $server on " . localtime() . "\n\n";
close($outFile);
my $pid = $pm->start and next;
print "Connecting to $user\@$server...\n";
my $hasWentToDownloadYet = 0;
my $ssh = Net::SSH::Perl->new($server, %sshParams);
$ssh->login($user, $password);
$ssh->register_handler("stdout", sub {
my($channel, $buffer) = @_;
my $outFile;
open($outFile, ">>", $outputDir . $user . '@' . $server . ".log.txt");
print $outFile $buffer->bytes;
close($outFile);
my @lines = split("\n", $buffer->bytes);
foreach(@lines){
if($_ =~ m/REGRESSION TEST IS COMPLETE/){
$ssh->_disconnect();
if(!$hasWentToDownloadYet){
$hasWentToDownloadYet = 1;
print "Caught exit signal.\n";
print("Regression tests for ${user}\@${server} finised.\n");
download_regression_results($user, $server);
$pm->finish;
}
}
}
});
$ssh->register_handler("stderr", sub {
my($channel, $buffer) = @_;
my $outFile;
open($outFile, ">>", $outputDir . $user . '@' . $server . ".log.txt");
print $outFile $buffer->bytes;
close($outFile);
});
if($debug){
$ssh->cmd('tail -fn 40 /GDS/gds/gdstest/t-gds-master/bin/comp.reg');
}else{
my ($stdout, $stderr, $exit) = $ssh->cmd('. ./.profile && cleanall && my.comp.reg');
if(!$exit){
print "SSH connection failed for ${user}\@${server} finised.\n";
}
}
#$ssh->cmd('. ./.profile');
if(!$hasWentToDownloadYet){
$hasWentToDownloadYet = 1;
print("Regression tests for ${user}\@${server} finised.\n");
download_regression_results($user, $server);
}
$pm->finish;
}
}
sleep(1);
print "\n\n\nAll tests started. Tests typically take 1 hour to complete.\n";
print "If they take significantly less time, there could be an error.\n";
print "\n\nNo output will be printed until all commands have executed and finished.\n";
print "If you wish to watch the progress tail -f one of the logs this script produces.\n Example:\n\t" . 'tail -f ./[email protected]' . "\n";
$pm->wait_all_children;
print "\n\nAll Tests are Finished. \n";
}
这是我的 %sshParams:
my %sshParams = (
protocol => '2',
port => '22',
options => [
"TCPKeepAlive yes",
"ConenctTimeout 10",
"BatchMode yes"
]
);
有时,长时间运行的命令之一会随机停止打印/触发 stdout 或 stderr 事件,并且永远不会退出。 ssh 连接不会终止(据我所知),因为 $ssh->cmd
仍然处于阻塞状态。
知道如何纠正这种行为吗?
I have been using Perl::Net::SSH to automate running some scripts on my remote boxes. However, some of these scripts take a really long time to complete (hour or two) and sometimes, I stop getting data from them, without actually losing the connection.
Here's the code I'm using:
sub run_regression_tests {
for(my $i = 0; $i < @servers; $i++){
my $inner = $users[$i];
foreach(@$inner){
my $user = $_;
my $server = $servers[$i];
my $outFile;
open($outFile, ">" . $outputDir . $user . '@' . $server . ".log.txt");
print $outFile "Opening connection to $user at $server on " . localtime() . "\n\n";
close($outFile);
my $pid = $pm->start and next;
print "Connecting to $user\@$server...\n";
my $hasWentToDownloadYet = 0;
my $ssh = Net::SSH::Perl->new($server, %sshParams);
$ssh->login($user, $password);
$ssh->register_handler("stdout", sub {
my($channel, $buffer) = @_;
my $outFile;
open($outFile, ">>", $outputDir . $user . '@' . $server . ".log.txt");
print $outFile $buffer->bytes;
close($outFile);
my @lines = split("\n", $buffer->bytes);
foreach(@lines){
if($_ =~ m/REGRESSION TEST IS COMPLETE/){
$ssh->_disconnect();
if(!$hasWentToDownloadYet){
$hasWentToDownloadYet = 1;
print "Caught exit signal.\n";
print("Regression tests for ${user}\@${server} finised.\n");
download_regression_results($user, $server);
$pm->finish;
}
}
}
});
$ssh->register_handler("stderr", sub {
my($channel, $buffer) = @_;
my $outFile;
open($outFile, ">>", $outputDir . $user . '@' . $server . ".log.txt");
print $outFile $buffer->bytes;
close($outFile);
});
if($debug){
$ssh->cmd('tail -fn 40 /GDS/gds/gdstest/t-gds-master/bin/comp.reg');
}else{
my ($stdout, $stderr, $exit) = $ssh->cmd('. ./.profile && cleanall && my.comp.reg');
if(!$exit){
print "SSH connection failed for ${user}\@${server} finised.\n";
}
}
#$ssh->cmd('. ./.profile');
if(!$hasWentToDownloadYet){
$hasWentToDownloadYet = 1;
print("Regression tests for ${user}\@${server} finised.\n");
download_regression_results($user, $server);
}
$pm->finish;
}
}
sleep(1);
print "\n\n\nAll tests started. Tests typically take 1 hour to complete.\n";
print "If they take significantly less time, there could be an error.\n";
print "\n\nNo output will be printed until all commands have executed and finished.\n";
print "If you wish to watch the progress tail -f one of the logs this script produces.\n Example:\n\t" . 'tail -f ./[email protected]' . "\n";
$pm->wait_all_children;
print "\n\nAll Tests are Finished. \n";
}
And here is my %sshParams:
my %sshParams = (
protocol => '2',
port => '22',
options => [
"TCPKeepAlive yes",
"ConenctTimeout 10",
"BatchMode yes"
]
);
Sometimes randomly one of the long running commands just halts printing/firing the stdout or stderr events and never exits. The ssh connection doesn't die (as far as I'm aware) because the $ssh->cmd
is still blocking.
Any idea how to correct this behaviour?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
在您的 %sshParams 哈希中,您可能需要将“TCPKeepAlive yes”添加到您的选项中:
这些选项可能适合也可能不适合您,但我建议为任何长时间运行的 SSH 连接设置 TCPKeepAlive。如果您的路径中有任何类型的状态防火墙,如果它长时间没有通过连接传递流量,它可能会丢弃状态。
In your %sshParams hash, you may need to add "TCPKeepAlive yes" to your options:
Those options might or might not be right for you, but the TCPKeepAlive is something I would recommend setting for any long running SSH connection. If you have any kind of stateful firewall in your path it could drop the state if it hasn't passed traffic over the connection for a long period of time.
它失败的原因可能是您查看
REGRESSION TEST IS COMPLETE
标记的输出的方式。它可能被分成两个不同的 SSH 数据包,因此您的回调永远不会找到它。更好的是,使用一个在完成后结束的远程命令,如下所示:
否则,您将关闭远程连接,但不会停止将保持活动状态的远程进程。
除此之外,您应该尝试使用 Net::OpenSSH 或 Net::OpenSSH::Parallel 而不是网络::SSH::Perl:
It fails probably due to the way you look into the output for the
REGRESSION TEST IS COMPLETE
mark. It may be split over two different SSH packets and so your callback will never found it.Better, use a remote command that ends when it is done as this one-liner:
Otherwise, you are closing the remote connection but not stopping the remote process that will stay alive.
Besides that, you should try using Net::OpenSSH or Net::OpenSSH::Parallel instead of Net::SSH::Perl: