How do I use threads in Perl?

Posted on 2024-12-10 23:13:39

I want to use threads in Perl to speed up my program. For example, I want to use 20 threads in this code:

use IO::Socket;
my $in_file2 = 'rang.txt';
open DAT,$in_file2;
my @ip=<DAT>;
close DAT;
chomp(@ip);
foreach my $ip(@ip)
{
    $host = IO::Socket::INET->new(
        PeerAddr => $ip,
        PeerPort => 80,
        proto    => 'tcp',
        Timeout=> 1
    ) 
    and open(OUT, ">>port.txt");
    print OUT $ip."\n";
    close(OUT);
}

The code above takes a list of IPs and scans a given port on each. I want to use threads in this code. Is there any other way to increase the speed of my code?

Thanks.

荒岛晴空 2024-12-17 23:13:39

Instead of using threads, you may want to look at AnyEvent::Socket, Coro::Socket, POE, or Parallel::ForkManager.

暗藏城府 2024-12-17 23:13:39

Read the Perl threads tutorial.

紫﹏色ふ单纯 2024-12-17 23:13:39

Perl can do both threading and forking. "threads" is officially not recommended, in no small part because it's not well understood and, perhaps slightly counterintuitively, isn't lightweight the way threads are in some other programming languages.

If you are particularly keen to thread, the 'worker' model of threading works much better than spawning a thread per task. You might do the latter in some languages, but in Perl it's very inefficient.

As such you might do something like this:

#!/usr/bin/env perl

use strict;
use warnings;

use threads;
use Thread::Queue;
use IO::Socket;

my $nthreads = 20;

my $in_file2 = 'rang.txt';

my $work_q   = Thread::Queue->new;
my $result_q = Thread::Queue->new;

sub ip_checker {
    while ( my $ip = $work_q->dequeue ) {
        chomp($ip);
        # Declared with "my" so the script compiles under "use strict".
        my $host = IO::Socket::INET->new(
            PeerAddr => $ip,
            PeerPort => 80,
            Proto    => 'tcp',
            Timeout  => 1
        );
        if ( defined $host ) {
            $result_q->enqueue($ip);
        }
    }
}

sub file_writer {
    open( my $output_fh, ">>", "port.txt" ) or die $!;
    while ( my $ip = $result_q->dequeue ) {
        print {$output_fh} "$ip\n";
    }
    close($output_fh);
}


my @workers;
for ( 1 .. $nthreads ) {
    push( @workers, threads->create( \&ip_checker ) );
}
my $writer = threads->create( \&file_writer );

open( my $dat, "<", $in_file2 ) or die $!;
$work_q->enqueue(<$dat>);
close($dat);
$work_q->end;

foreach my $thr (@workers) {
    $thr->join();
}

$result_q->end;
$writer->join();

This uses a queue to feed the IP list to a pool of 20 worker threads. Each worker pulls addresses off the queue until it is exhausted, and a single writer thread collects the results and prints them to the file.

But as threads aren't really recommended any more, a better way might be to use Parallel::ForkManager, which with your code might go a bit like this:

#!/usr/bin/env perl

use strict;
use warnings;

use Fcntl qw ( :flock );
use IO::Socket;
use Parallel::ForkManager;

my $in_file2 = 'rang.txt';
open( my $input, "<", $in_file2 ) or die $!;

my $manager = Parallel::ForkManager->new(20);
foreach my $ip (<$input>) {
    $manager->start and next;    # parent continues; child falls through

    chomp($ip);
    my $host = IO::Socket::INET->new(
        PeerAddr => $ip,
        PeerPort => 80,
        Proto    => 'tcp',
        Timeout  => 1
    );
    if ( defined $host ) {
        # Each child opens its own handle. A handle inherited across fork
        # can share a single flock lock with its siblings, which would not
        # keep them out.
        open( my $output, ">>", "port.txt" ) or die $!;
        flock( $output, LOCK_EX ) or die $!;    # exclusive (write) lock
        print {$output} $ip, "\n";
        close($output);    # flushes the buffer, then releases the lock
    }
    $manager->finish;
}
$manager->wait_all_children;
close($input);

You need to be particularly careful with file IO when multiprocessing, because the whole point is that your execution order is no longer well defined. It is very easy to end up with one thread or process clobbering a file that another has open but hasn't yet flushed to disk.
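As a rough illustration of that point, here is a minimal sketch that uses only core modules and plain fork. The helper names (append_locked, run_children) and the temporary file are made up for the demo. It assumes a Unix-like system where flock is available, and each child opens its own handle so the lock really does exclude the others.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Append one line under an exclusive lock, using a handle opened by the caller's
# own process. close() flushes the buffer before the lock is released.
sub append_locked {
    my ( $path, $line ) = @_;
    open( my $fh, '>>', $path ) or die "open $path: $!";
    flock( $fh, LOCK_EX ) or die "flock $path: $!";
    print {$fh} "$line\n";
    close($fh) or die "close $path: $!";
}

# Fork $n children that each append one line, then wait for all of them.
sub run_children {
    my ( $path, $n ) = @_;
    my @pids;
    for my $id ( 1 .. $n ) {
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ( $pid == 0 ) {
            append_locked( $path, "child $id" );
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid( $_, 0 ) for @pids;
}

# Hypothetical scratch file for the demo; removed explicitly at the end.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 0 );
close($tmp_fh);

run_children( $path, 5 );

open( my $in, '<', $path ) or die "open $path: $!";
my @lines = <$in>;
close($in);
unlink $path;

print scalar(@lines), " lines written, none interleaved\n";
```

The order of the lines still depends on which child gets the lock first; the lock only guarantees that each line is written whole.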

Looking at your code, you seem to rely on a failed file open to avoid printing to the file. That's not a good approach, especially when your file handle is not lexically scoped.
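One way to avoid that pattern, sketched here with made-up helper names (port_open, record_ip), is to test the connection result explicitly and write through a lexical handle whose open is checked. This is a single-process sketch; the IP list is taken from the command line purely for illustration.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# True when a TCP connection to $ip:$port succeeds within $timeout seconds.
sub port_open {
    my ( $ip, $port, $timeout ) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $ip,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    return 0 unless defined $sock;
    close($sock);
    return 1;
}

# Append $ip to $path through a lexical handle; a failed open is fatal
# instead of silently deciding whether anything gets printed.
sub record_ip {
    my ( $path, $ip ) = @_;
    open( my $out, '>>', $path ) or die "Cannot open $path: $!";
    print {$out} "$ip\n";
    close($out) or die "Cannot close $path: $!";
}

# Assumed usage: perl scan.pl 192.0.2.1 192.0.2.2 ...
for my $ip (@ARGV) {
    record_ip( 'port.txt', $ip ) if port_open( $ip, 80, 1 );
}
```

The decision to print now depends only on whether the connection succeeded, and the handle goes out of scope as soon as record_ip returns.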

But in both multiprocessing paradigms I outlined above (there are others, these are the most common) you still have to deal with the file IO serialisation. Note that your 'results' will be in a random order in both, because it'll very much depend on when the task completes. If that's important to you, then you'll need to collate and sort after your threads or forks complete.
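If ordering does matter, the collate-and-sort step could look something like the sketch below. The sort_ips helper and the sample addresses are hypothetical, and it assumes IPv4 dotted-quad input. inet_aton packs each address into four bytes, so comparing the packed strings gives numeric address order.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Hypothetical results, in the arbitrary order the workers finished.
my @found = ( '10.0.0.12', '10.0.0.2', '192.168.1.1', '10.0.0.2' );

# Remove duplicates, then sort by the packed 4-byte form of each address.
sub sort_ips {
    my @ips = @_;
    my %seen;
    return
        map  { $_->[1] }
        sort { $a->[0] cmp $b->[0] }
        map  { [ inet_aton($_), $_ ] }
        grep { !$seen{$_}++ } @ips;
}

my @sorted = sort_ips(@found);
print "$_\n" for @sorted;
```

A plain string sort would put 10.0.0.12 before 10.0.0.2, which is why the addresses are packed first.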

It's probably better in general to look towards forking. As the threads documentation says:

The "interpreter-based threads" provided by Perl are not the fast, lightweight system for multitasking that one might expect or hope for. Threads are implemented in a way that make them easy to misuse. Few people know how to use them correctly or will be able to provide help.
The use of interpreter-based threads in perl is officially discouraged.
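For completeness, forking does not have to involve file locking at all. The sketch below is a core-only alternative to the Parallel::ForkManager version, not a replacement for it: a fixed number of children each take a slice of the list and report hits to the parent over a pipe, so only the parent would ever write the output file. The fork_filter name and the stand-in check (keeping even numbers instead of probing a port) are assumptions made so the example runs without a network. It also assumes a Unix-like fork.

```perl
use strict;
use warnings;

# Run $check on every item using $nworkers forked children and return the
# items for which it was true. Results arrive in completion order.
sub fork_filter {
    my ( $check, $nworkers, @items ) = @_;
    pipe( my $reader, my $writer ) or die "pipe: $!";
    my @pids;
    for my $w ( 0 .. $nworkers - 1 ) {
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ( $pid == 0 ) {
            close($reader);
            # Worker $w handles items $w, $w + $nworkers, $w + 2 * $nworkers, ...
            for ( my $i = $w ; $i < @items ; $i += $nworkers ) {
                # One unbuffered write per hit; short pipe writes are atomic.
                syswrite( $writer, "$items[$i]\n" ) if $check->( $items[$i] );
            }
            exit 0;
        }
        push @pids, $pid;
    }
    close($writer);          # the parent must close its copy to ever see EOF
    my @hits = <$reader>;    # returns once every child has exited
    close($reader);
    waitpid( $_, 0 ) for @pids;
    chomp(@hits);
    return @hits;
}

# Stand-in check for the demo: keep even numbers.
my @even = sort { $a <=> $b } fork_filter( sub { $_[0] % 2 == 0 }, 4, 1 .. 10 );
print "@even\n";    # prints "2 4 6 8 10"
```

In a real scan the check would be the socket test and the parent would write the returned list to port.txt in one go, sorted if needed.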
