如何使用 Perl 正则表达式提取多行代码?

发布于 2024-11-09 10:47:53 字数 1495 浏览 0 评论 0原文

我正在尝试从该网站提取所有 IP 地址: http://www.game-monitor.com /

我想正则表达式该页面上的 IP,提取所有这些并将它们显示在屏幕上。

这就是我到目前为止所拥有的,你能告诉我出了什么问题并帮助我吗?

#!/usr/bin/perl

use HTTP::Request;
use LWP::UserAgent;

print 'Press [1] To Begin: ';
chomp ($begin = <STDIN>);

my $url = 'http://www.game-monitor.com/';
my @ips = ('\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}','\d{1,3}\.\d{1,2}\.\d{1,3}\.\d{1,2}','\d{1,2}   \.\d{1,3}\.\d{1,2}\.\d{1,3}','\d{1,2}\.\d{1,2}\.\d{1,2}\.\d{1,3}','\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,2}','\d{1,3}\.\d{1,3}\.\d{1,2}\.\d{1,2}','\d{1,2}\.\d{1,2}\.\d{1,3}\.\d{1,3}','\d{1,2}\.\d{1,2}\.\d{1,2}\.\d{1,2}','\d{1,2}\.\d{1,3}\.\d{1,3}\.\d{1,2}','\d{1,3}\.\d{1,2}\.\d{1,2}\.\d{1,3}');

if ($begin eq 1)
{
my $request = HTTP::Request->new(GET => $url);
my $useragent = LWP::UserAgent->new();
my $response = $useragent->request($request);
my $result = $response->content;

foreach $ip (@ips)
{
if ($result =~ /($ips[0])/ ||
$result =~ /($ips[1])/ ||
$result =~ /($ips[2])/ ||
$result =~ /($ips[3])/ ||
$result =~ /($ips[4])/ ||
$result =~ /($ips[5])/ ||
$result =~ /($ips[6])/ ||
$result =~ /($ips[7])/ ||
$result =~ /($ips[8])/ ||
$result =~ /($ips[9])/
)
{
    print "IP: $1 \n";
    print "IP: $2 \n";
    print "IP: $3 \n";
    print "IP: $4 \n";
    print "IP: $5 \n";
    print "IP: $6 \n";
    print "IP: $7 \n";
    print "IP: $8 \n";
    print "IP: $9 \n";
    print "IP: $10 \n";
}
}
}

I am trying to extract all of the IP Addresses off of this website: http://www.game-monitor.com/

I want to regex the IP's on that page, extract all of them and display them on the screen.

This is what I have so far, can you tell me what Is wrong and help me?

#!/usr/bin/perl

use HTTP::Request;
use LWP::UserAgent;

print 'Press [1] To Begin: ';
chomp ($begin = <STDIN>);

my $url = 'http://www.game-monitor.com/';
my @ips = ('\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}','\d{1,3}\.\d{1,2}\.\d{1,3}\.\d{1,2}','\d{1,2}   \.\d{1,3}\.\d{1,2}\.\d{1,3}','\d{1,2}\.\d{1,2}\.\d{1,2}\.\d{1,3}','\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,2}','\d{1,3}\.\d{1,3}\.\d{1,2}\.\d{1,2}','\d{1,2}\.\d{1,2}\.\d{1,3}\.\d{1,3}','\d{1,2}\.\d{1,2}\.\d{1,2}\.\d{1,2}','\d{1,2}\.\d{1,3}\.\d{1,3}\.\d{1,2}','\d{1,3}\.\d{1,2}\.\d{1,2}\.\d{1,3}');

if ($begin eq 1)
{
my $request = HTTP::Request->new(GET => $url);
my $useragent = LWP::UserAgent->new();
my $response = $useragent->request($request);
my $result = $response->content;

foreach $ip (@ips)
{
if ($result =~ /($ips[0])/ ||
$result =~ /($ips[1])/ ||
$result =~ /($ips[2])/ ||
$result =~ /($ips[3])/ ||
$result =~ /($ips[4])/ ||
$result =~ /($ips[5])/ ||
$result =~ /($ips[6])/ ||
$result =~ /($ips[7])/ ||
$result =~ /($ips[8])/ ||
$result =~ /($ips[9])/
)
{
    print "IP: $1 \n";
    print "IP: $2 \n";
    print "IP: $3 \n";
    print "IP: $4 \n";
    print "IP: $5 \n";
    print "IP: $6 \n";
    print "IP: $7 \n";
    print "IP: $8 \n";
    print "IP: $9 \n";
    print "IP: $10 \n";
}
}
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

你爱我像她 2024-11-16 10:47:54

要简化多行替换,请使用 /s 修饰符,它实际上告诉 Perl 假装字符串是单行——即使它不是。

有关更多详细信息,请参阅 perlre

如果您使用像 Regexp 这样的模块,那就太好了::Common::net -- 提供 IPv4 地址的正则表达式,而不是编写您自己的正则表达式来匹配 IP 地址。

例如尝试类似的东西,

use Regexp::Common qw/net/;
while (<>) {
  print $1, "\n" if /($RE{net}{ipv4})/;
}

To simplify multi-line substitutions, use the /s modifier, which in effect tells Perl to pretend the string is a single line--even if it isn't.

see perlre for more detail.

It would be nice if you use module like Regexp::Common::net -- provide regexes for IPv4 addresses instead of writing your own regex for matching ip addresses.

for example try something like,

use Regexp::Common qw/net/;
while (<>) {
  print $1, "\n" if /($RE{net}{ipv4})/;
}
揽月 2024-11-16 10:47:54

使用 /g 修饰符匹配所有 IP。
提示:使用 -w 参数和 strict 包来避免“糟糕的编码风格”。

#!/usr/bin/perl -w

use strict;
use HTTP::Request;
use LWP::UserAgent;

print 'Press [1] To Begin: ';
chomp (my $begin = <STDIN>);

my $url = 'http://www.game-monitor.com/';
my $ip_regex = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';

if ($begin eq 1)
{
    my $request = HTTP::Request->new(GET => $url);
    my $useragent = LWP::UserAgent->new();
    my $response = $useragent->request($request);
    my $result = $response->content;

    while ($result =~ /($ip_regex)/g)
    {
        print "IP: $1 \n";
    }

}

Use the /g modifier to match all IPs.
Tip: use -w parameter and strict package to avoid "bad coding style".

#!/usr/bin/perl -w

use strict;
use HTTP::Request;
use LWP::UserAgent;

print 'Press [1] To Begin: ';
chomp (my $begin = <STDIN>);

my $url = 'http://www.game-monitor.com/';
my $ip_regex = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';

if ($begin eq 1)
{
    my $request = HTTP::Request->new(GET => $url);
    my $useragent = LWP::UserAgent->new();
    my $response = $useragent->request($request);
    my $result = $response->content;

    while ($result =~ /($ip_regex)/g)
    {
        print "IP: $1 \n";
    }

}
惟欲睡 2024-11-16 10:47:54
#!/usr/bin/perl

use HTTP::Request;
use LWP::UserAgent;


my $url = 'http://www.game-monitor.com/';
my $request = HTTP::Request->new(GET => $url);
my $useragent = LWP::UserAgent->new();
my $response = $useragent->request($request);
my $result = $response->content;

@m = ($result =~ /\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b/sg);
foreach (@m) {
        print "IP: $_\n";
}
#!/usr/bin/perl

use HTTP::Request;
use LWP::UserAgent;


my $url = 'http://www.game-monitor.com/';
my $request = HTTP::Request->new(GET => $url);
my $useragent = LWP::UserAgent->new();
my $response = $useragent->request($request);
my $result = $response->content;

@m = ($result =~ /\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b/sg);
foreach (@m) {
        print "IP: $_\n";
}
初见你 2024-11-16 10:47:54

我真的不明白你想用你的大数组@ips做什么。第一个正则表达式已经匹配所有 IP 地址(因为 \d{1,3} 表示“一到三位数字”,它已经包含具有两位数字的 IP 地址),因此您不需要全部那些带有 \d{1,2} 的排列。

您可以做的一件事是用 \b 单词边界锚点包围您的正则表达式,以确保您不会匹配 99123.123.123.12399 内的 123.123.123.123代码> 或类似的东西。另外,您可能知道您的正则表达式也会匹配 999.999.999.999 之类的内容。如果这不是问题,因为您的输入不包含无效的 IP 地址,那么当然没问题。

最后,您需要 /g 全局修饰符,以便您的正则表达式不仅可以找到字符串中的第一个匹配项,还可以找到所有匹配项。

从本质上讲,如何这样做:

while ($result =~ m/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g) {
    print "IP: 
amp;\n";
}

I don't really see what you're trying to do with your big array @ips. The first regex already matches all IP addresses (since \d{1,3} means "one to three digits", it already contains IP addresses that have two digits), so you don't need all those permutations with \d{1,2}.

One thing you could do is to surround your regex with \b word boundary anchors to ensure that you don't match 123.123.123.123 within 99123.123.123.12399 or something like it. Also, you're probably aware that your regex would also match something like 999.999.999.999. If that's not a problem because your input won't contain invalid IP addresses, then of course that's just fine.

Finally, you need the /g global modifier so your regex finds not just the first but all occurrences in the string.

In essence, how about doing it like this:

while ($result =~ m/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g) {
    print "IP: 
amp;\n";
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文