Problem running a multithreaded Perl script over SSH on Linux
I cannot understand the behavior of the following simple Perl script when I run it remotely via SSH.
use strict;
use warnings;
use threads;
use threads::shared;
use POSIX qw(strftime);

my $print_mutex : shared;   # guards every print call across threads
################################################################################
# Print one line tagged with the current time and thread ID, holding the lock.
sub _print($)
{
    my $str = shift;
    lock($print_mutex);
    my $id = threads->tid();
    my $time = strftime('%H:%M:%S', localtime time);
    print "$time [$id] $str";
    return;
}
################################################################################
# Thread body: three Begin/End pairs, one second apart; returns the thread ID.
sub run()
{
    for my $i (1 .. 3)
    {
        _print("Begin $i\n");
        sleep 1;
        _print("End $i\n");
    }
    return threads->tid();
}
################################################################################
_print "Starting test.\n";
my @threads;
for my $thr_num (1 .. 2)
{
    my $thr = threads->create('run');
    push @threads, $thr;
    _print "Thread created.\n";
}
# Wait for each thread and report the ID it returned.
foreach (@threads)
{
    my $id = $_->join;
    _print "Thread '$id' finished.\n";
}
_print "Test finished.\n";
################################################################################
When I run it normally on my Linux box with Perl-5.10.0, I get the expected results:
$ perl /tmp/a.pl
14:25:54 [0] Starting test.
14:25:54 [0] Thread created.
14:25:54 [1] Begin 1
14:25:54 [0] Thread created.
14:25:54 [2] Begin 1
14:25:55 [1] End 1
14:25:55 [1] Begin 2
14:25:55 [2] End 1
14:25:55 [2] Begin 2
14:25:56 [1] End 2
14:25:56 [1] Begin 3
14:25:56 [2] End 2
14:25:56 [2] Begin 3
14:25:57 [1] End 3
14:25:57 [0] Thread '1' finished.
14:25:57 [2] End 3
14:25:57 [0] Thread '2' finished.
14:25:57 [0] Test finished.
$
However, when I run it via SSH (on the same local host, but that doesn't matter), I get very strange results (look closely at the timestamps and thread IDs):
$ ssh localhost 'perl /tmp/a.pl'
14:26:11 [0] Starting test.
14:26:11 [0] Thread created.
14:26:11 [1] Begin 1
14:26:12 [1] End 1
14:26:12 [1] Begin 2
14:26:13 [1] End 2
14:26:13 [1] Begin 3
14:26:14 [1] End 3
14:26:11 [2] Begin 1
14:26:12 [2] End 1
14:26:12 [2] Begin 2
14:26:13 [2] End 2
14:26:13 [2] Begin 3
14:26:14 [2] End 3
14:26:11 [0] Thread created.
14:26:14 [0] Thread '1' finished.
14:26:14 [0] Thread '2' finished.
14:26:14 [0] Test finished.
$
I've never seen this in single-threaded Perl scripts, and I noticed that the I/O problem starts right after the first thread is created.
I was able to reproduce the problem with the latest Perl-5.12 on Windows, so I don't think the problem is Perl/OS specific.
Could someone please explain what's wrong here?
Comments (2)
I was able to reproduce this myself. However, when running it from a shell over ssh, I got the expected behaviour. So what's the difference? A pseudo-terminal!
Try this:
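The pseudo-terminal idea can also be checked from inside Perl. This is only a minimal sketch, assuming the difference really is whether STDOUT is attached to a terminal: a plain `ssh host 'command'` allocates no pseudo-terminal, while ssh's `-t` option forces one. The variable name `$is_tty` is mine and does not appear in the question.

```perl
use strict;
use warnings;

# -t tests whether a filehandle is attached to a terminal (a tty or pty).
my $is_tty = (-t STDOUT) ? 1 : 0;

# Report on STDERR so the message itself is not held back by STDOUT buffering.
print STDERR 'STDOUT attached to a terminal: ',
    ($is_tty ? 'yes (line-buffered)' : 'no (block-buffered)'), "\n";
```

Saved under a hypothetical name such as `/tmp/check.pl`, it should give different answers when run as `ssh localhost 'perl /tmp/check.pl'` and as `ssh -t localhost 'perl /tmp/check.pl'`.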
Indeed, it looks like each Perl thread has its own output buffer.
I redirected the output to a file (which has the same effect as running the script via SSH, since both just disable line buffering) and ran the script under strace:
It becomes clear that each thread places all of its data into a thread-local buffer and only then (in this example, just before the thread terminates) issues a "write" system call on that buffer.
IMHO, thread-local output buffers are a very bad idea, because people get confusing results even if they use explicit serialization of "print" calls.
The solution I found is to use explicit serialization and enable autoflush on STDOUT so that thread-local buffers are always empty.
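A minimal sketch of that fix, assuming a Perl built with ithreads support. `locked_print` and `worker` are illustrative names, not taken from the original script, and I am assuming that autoflush enabled before the threads are created is inherited by each thread's copy of STDOUT.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use IO::Handle;    # provides the autoflush method on STDOUT

# Flush STDOUT after every print so nothing lingers in a per-thread buffer.
# Setting $| = 1 while STDOUT is the selected handle has the same effect.
STDOUT->autoflush(1);

my $print_mutex : shared;

# Explicit serialization: hold the shared lock for the whole print call.
sub locked_print {
    my ($str) = @_;
    lock($print_mutex);
    print '[' . threads->tid() . "] $str";
    return;
}

# Thread body: print a few lines and return this thread's ID.
sub worker {
    locked_print("Begin $_\n") for 1 .. 3;
    return threads->tid();
}

my @threads = map { threads->create(\&worker) } 1 .. 2;
my @ids     = map { $_->join } @threads;
locked_print("Joined threads: @ids\n");
```

With autoflush on, each line goes out as soon as it is printed, so the lock alone decides the order of the output even when STDOUT is a pipe or a file.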