How do I extract numeric data from a text file?

Posted 2024-07-11 08:27:58

I want a Perl script that extracts data from a text file and saves the result as another text file. Each line of the text file contains a URL to a JPEG image, like "http://pics1.riyaj.com/thumbs/000/082/104/small.jpg". I want the script to extract the last 6 digits of each JPEG URL (i.e. 082104) into a variable, and then insert that variable at a different location on each line of the new text file.

Input text:

text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text
text http://pics1.riyaj.com/thumbs/000/569/315/small.jpg text

Output text:

text php?id=82104 text
text php?id=569315 text 

Thanks


Answers (2)

巴黎夜雨 2024-07-18 08:27:58

What have you tried so far?

Here's a short program that gives you the meat of the problem, and you can add the rest of it:

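while( <> )   # read each line of the named files (or STDIN) into $_
    {
    # replace the JPEG URL with php?id= plus the last two digit groups
    s|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|;
    print;    # print $_, changed or not
    }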

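This is very close to a command-line program that uses the -p switch to handle the looping and printing for you (see the perlrun documentation for details):

perl -pe 's|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|' inputfile > outputfile

If you would rather edit inputfile in place, keeping the original as inputfile.old, add -i.old and drop the redirection; combining -i with "> outputfile" sends the output back into inputfile and leaves outputfile empty.
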
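A self-contained sketch for checking the substitution on the question's sample lines without touching any files. `rewrite_original` keeps the answer's pattern; `rewrite_variant` is a variation that is not part of the answer: `\S` in place of `.` keeps a match from running past the end of a URL, `\.jpg` requires a literal dot, and `/g` rewrites every URL on a line. The third input line, with two URLs, is hypothetical and only there to show where the two differ.

```perl
use strict;
use warnings;

# The substitution from the answer: the greedy .* can run past the first URL.
sub rewrite_original {
    my ($line) = @_;
    $line =~ s|http://.*/\d+/(\d+)/(\d+).*?jpg|php?id=$1$2|;
    return $line;
}

# Variation (an assumption, not the answer's code): \S keeps each match inside
# one URL, \.jpg needs a literal dot, and /g rewrites every URL on the line.
sub rewrite_variant {
    my ($line) = @_;
    $line =~ s|http://\S*/\d+/(\d+)/(\d+)\S*?\.jpg|php?id=$1$2|g;
    return $line;
}

# Sample input from the question, plus a hypothetical line with two URLs.
my @lines = (
    "text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text\n",
    "text http://pics1.riyaj.com/thumbs/000/569/315/small.jpg text\n",
    "a http://pics1.riyaj.com/thumbs/000/111/222/small.jpg b http://pics1.riyaj.com/thumbs/000/333/444/small.jpg c\n",
);

for my $line (@lines) {
    print 'original: ', rewrite_original($line);
    print 'variant:  ', rewrite_variant($line);
}
```

With one URL per line, as in the question, both versions produce php?id=082104 and php?id=569315. On the two-URL line the greedy `.*` in the original swallows everything from the first "http://" to the last ".jpg", leaving "a php?id=333444 c".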
ゝ杯具 2024-07-18 08:27:58

I didn't know whether to answer according to what you described ("last 6 digits") or just assume that it all fits the pattern you showed. So I decided to answer both ways.

Here is a method that can handle lines more diverse than your examples.

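use strict;
use warnings;
use FileHandle;

# Input and output file names, taken from the command line
my ( $file_in, $file_out ) = @ARGV;

my $jpeg_RE = qr{
    (.*?)           # Anything, watching out for patterns ahead
    \s+             # At least one space
    (?> http:// )   # Once we match "http://" we're onto the next section
    \S*?            # Any non-space, watching out for what follows
    ( (?: \d+ / )*  # At least one digit, followed by a slash, any number of times
      \d+           # another group of digits
    )               # end group
    \D*?            # Any number of non-digits looking ahead
    \.jpg           # literal string '.jpg'
    \s+             # At least one space
   (.*)             # The rest of the line
}x;

my $infile  = FileHandle->new( "<$file_in" )  or die "Cannot read $file_in: $!";
my $outfile = FileHandle->new( ">$file_out" ) or die "Cannot write $file_out: $!";

while ( my $line = <$infile> ) {
    # Copy lines without a JPEG URL through unchanged
    my ( $pre_text, $digits, $post_text ) = ( $line =~ m/$jpeg_RE/ )
        or do { $outfile->print( $line ); next };
    $digits =~ s/\D//g;    # "000/082/104" becomes "000082104"
    # Pass the text as arguments, not in the format string, so a "%" in it is harmless
    $outfile->printf( "%s php?id=%s %s\n", $pre_text, substr( $digits, -6 ), $post_text );
}
$infile->close();
$outfile->close();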
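To try the general pattern without files, here is a sketch that moves the loop body into a hypothetical `rewrite_line` helper and runs it on one of the question's lines and on a made-up URL with a different directory layout (`img.example.com` is purely illustrative). Returning non-matching lines unchanged is an assumption about what you want for them.

```perl
use strict;
use warnings;

# The general pattern from the answer above (comments shortened).
my $jpeg_RE = qr{
    (.*?)           # text before the URL
    \s+
    (?> http:// )
    \S*?
    ( (?: \d+ / )*  # digit groups separated by slashes
      \d+
    )
    \D*?
    \.jpg
    \s+
   (.*)             # text after the URL
}x;

# Hypothetical helper: the answer's loop body, applied to a single line.
# Lines without a matching URL are returned unchanged (an assumption).
sub rewrite_line {
    my ($line) = @_;
    my ( $pre_text, $digits, $post_text ) = $line =~ $jpeg_RE
        or return $line;
    $digits =~ s/\D//g;    # "000/082/104" becomes "000082104"
    return sprintf "%s php?id=%s %s\n", $pre_text, substr( $digits, -6 ), $post_text;
}

print rewrite_line("text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text\n");
print rewrite_line("see http://img.example.com/p/1234/5678/photo.jpg here\n");    # made-up layout
print rewrite_line("no URL on this line\n");
```

Because the digit groups are joined before `substr` takes the last six, "1234/5678" yields 345678, which is what "last 6 digits" means for a layout unlike the question's.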

However, if it's just as regular as you show, it gets a lot easier:

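use strict;
use warnings;
use FileHandle;

# Input and output file names, taken from the command line
my ( $file_in, $file_out ) = @ARGV;

my $jpeg_RE = qr{
    (?> \Qhttp://pics1.riyaj.com/thumbs/\E )   # the fixed URL prefix, taken literally
    \d{3}                                     # first directory, e.g. 000 (not kept)
    /
    ( \d{3} )                                 # second directory, e.g. 082
    /
    ( \d{3} )                                 # third directory, e.g. 104
    \S*?                                      # the rest of the path
    \.jpg
}x;

my $infile  = FileHandle->new( "<$file_in" )  or die "Cannot read $file_in: $!";
my $outfile = FileHandle->new( ">$file_out" ) or die "Cannot write $file_out: $!";

while ( my $line = <$infile> ) {
    $line =~ s/$jpeg_RE/php?id=$1$2/g;    # replace every matching URL on the line
    $outfile->print( $line );
}
$infile->close();
$outfile->close();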
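The question asks for the six digits in a variable before they are placed in the new line, and its sample output shows 82104 rather than 082104. A sketch, assuming both are wanted: it reuses the fixed-layout pattern above, builds `$id` inside an `s///ge` replacement in a hypothetical `rewrite_line` helper, and numifies it with `0 +` when the (also hypothetical) `$strip_zeros` flag is set, which drops leading zeros.

```perl
use strict;
use warnings;

# The fixed-layout pattern from the answer above.
my $jpeg_RE = qr{
    (?> \Qhttp://pics1.riyaj.com/thumbs/\E )
    \d{3}
    /
    ( \d{3} )
    /
    ( \d{3} )
    \S*?
    \.jpg
}x;

# Hypothetical helper: put the six digits in $id first, then splice $id into
# the line. With $strip_zeros set, 0 + turns "082104" into 82104.
sub rewrite_line {
    my ( $line, $strip_zeros ) = @_;
    $line =~ s{$jpeg_RE}{
        my $id = "$1$2";
        $id = 0 + $id if $strip_zeros;
        "php?id=$id";
    }ge;
    return $line;
}

print rewrite_line( "text http://pics1.riyaj.com/thumbs/000/082/104/small.jpg text\n", 1 );
print rewrite_line( "text http://pics1.riyaj.com/thumbs/000/569/315/small.jpg text\n", 1 );
```

Leave the flag off if the leading zero matters to the PHP side; numifying is only appropriate if `id` is treated as an integer there.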