How can I use a Perl regular expression to extract matches from multi-line text?
I am trying to extract all of the IP addresses from this website: http://www.game-monitor.com/
I want to match the IPs on that page with a regex, extract all of them, and display them on the screen.
This is what I have so far. Can you tell me what is wrong and help me?
#!/usr/bin/perl
use HTTP::Request;
use LWP::UserAgent;
print 'Press [1] To Begin: ';
chomp ($begin = <STDIN>);
my $url = 'http://www.game-monitor.com/';
my @ips = ('\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}','\d{1,3}\.\d{1,2}\.\d{1,3}\.\d{1,2}','\d{1,2} \.\d{1,3}\.\d{1,2}\.\d{1,3}','\d{1,2}\.\d{1,2}\.\d{1,2}\.\d{1,3}','\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,2}','\d{1,3}\.\d{1,3}\.\d{1,2}\.\d{1,2}','\d{1,2}\.\d{1,2}\.\d{1,3}\.\d{1,3}','\d{1,2}\.\d{1,2}\.\d{1,2}\.\d{1,2}','\d{1,2}\.\d{1,3}\.\d{1,3}\.\d{1,2}','\d{1,3}\.\d{1,2}\.\d{1,2}\.\d{1,3}');
if ($begin eq 1)
{
    my $request = HTTP::Request->new(GET => $url);
    my $useragent = LWP::UserAgent->new();
    my $response = $useragent->request($request);
    my $result = $response->content;
    foreach $ip (@ips)
    {
        if ($result =~ /($ips[0])/ ||
            $result =~ /($ips[1])/ ||
            $result =~ /($ips[2])/ ||
            $result =~ /($ips[3])/ ||
            $result =~ /($ips[4])/ ||
            $result =~ /($ips[5])/ ||
            $result =~ /($ips[6])/ ||
            $result =~ /($ips[7])/ ||
            $result =~ /($ips[8])/ ||
            $result =~ /($ips[9])/
        )
        {
            print "IP: $1 \n";
            print "IP: $2 \n";
            print "IP: $3 \n";
            print "IP: $4 \n";
            print "IP: $5 \n";
            print "IP: $6 \n";
            print "IP: $7 \n";
            print "IP: $8 \n";
            print "IP: $9 \n";
            print "IP: $10 \n";
        }
    }
}
Comments (4)
To simplify multi-line matching and substitution, use the /s modifier, which in effect tells Perl to pretend the string is a single line even if it isn't (with /s, . also matches newlines). See perlre for more detail.

Rather than writing your own regex for matching IP addresses, consider a module such as Regexp::Common::net, which provides regexes for IPv4 addresses. For example, its $RE{net}{IPv4} pattern can be used in place of the hand-written one.
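The code that originally followed this answer is missing from the page, so here is a minimal sketch of the suggestion rather than the answerer's own code. It assumes Regexp::Common is installed from CPAN, and the sample text in $html is made up to stand in for the downloaded page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Common qw(net);   # CPAN module; provides the %RE pattern hash

# Hypothetical sample text standing in for the page content
# ($response->content in the question's script).
my $html = <<'END';
<td>10.0.0.1:27015</td>
<td>192.168.1.20:27016</td>
<td>no address on this line</td>
END

# A /g match in list context returns every captured match, not just the
# first. \b keeps a match from starting or ending inside a longer digit run.
my @ips = $html =~ /\b($RE{net}{IPv4})\b/g;

print "IP: $_\n" for @ips;
```

Because the match runs in list context, there is no need for a loop over patterns or for the numbered variables $1 through $10.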
Use the /g modifier to match all IPs. Tip: use the -w switch and the strict pragma to avoid "bad coding style".

I don't really see what you're trying to do with your big array @ips. The first regex already matches all IP addresses: since \d{1,3} means "one to three digits", it already covers octets with one or two digits, so you don't need all those permutations with \d{1,2}.

One thing you could do is surround your regex with \b word-boundary anchors, to ensure that you don't match 123.123.123.123 within 99123.123.123.12399 or something like it. Also, you're probably aware that your regex would also match something like 999.999.999.999. If that's not a problem because your input won't contain invalid IP addresses, then of course that's just fine.

Finally, you need the /g global modifier so that your regex finds not just the first match but all occurrences in the string. In essence, apply the single \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} pattern once, wrapped in \b and with /g.
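The answer's own code did not survive on this page, so the following is only a sketch of the approach it describes: one pattern, \b anchors, and /g. The helper names extract_ips and is_valid_ipv4 and the sample text are hypothetical, and the octet-range filter is an optional extra beyond what the answer proposes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every dotted quad found in $text.
sub extract_ips {
    my ($text) = @_;
    # \b on both sides rejects a match buried in a longer digit run;
    # /g in list context returns all matches instead of only the first.
    return $text =~ /\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b/g;
}

# Optional extra check (an addition, not part of the answer's regex):
# true only if no octet is greater than 255.
sub is_valid_ipv4 {
    my ($ip) = @_;
    return !grep { $_ > 255 } split /\./, $ip;
}

# Hypothetical sample input; in the question's script this would be
# the page body returned by $response->content.
my $result = "a 123.123.123.123 b\n"
           . "99123.123.123.12399\n"
           . "c 10.0.0.7:27015\n"
           . "d 999.999.999.999\n";

my @ips   = extract_ips($result);
my @valid = grep { is_valid_ipv4($_) } @ips;

print "IP: $_\n" for @valid;
```

The second sample line produces no match because of the \b anchors, while 999.999.999.999 is matched by the regex and only dropped by the optional filter.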