Perl: how to do date calculations on data from a parsed file

Posted on 2024-12-19 02:31:28

I have a CSV file with several columns. For example:

"00000089-6d83-486d-9ddf-30bbbf722583","2011-09-17 16:25:09","INTNAME","1001","https://mobile.mint.com:443"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","2000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"

These are sample lines from a huge file I need to parse. I need to select only those lines where the fourth column is in a certain list (say 1000, 2000, ...) and the second column falls between certain dates (say 2011-11-01 00:00:00 to 2011-11-15 00:00:00).

So how do I do that date selection and output only the matching lines in tab-delimited form?

In the example, only the second row would be chosen and saved in tab-delimited form to another file.

一花一树开 2024-12-26 02:31:28

Using Parse::CSV, here is a way to do the job:

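#!/usr/local/bin/perl
use Modern::Perl;
use Parse::CSV;

# Values accepted in the fourth column, stored as hash keys for exact lookup.
my %wanted = map { $_ => 1 } qw(1000 2000);

my $parser = Parse::CSV->new(
    file => 'text.csv',
);
while ( my $value = $parser->fetch ) {
    # Timestamps in this format sort chronologically, so string comparison works.
    if ( $wanted{ $value->[3] }
      && $value->[1] gt '2011-11-01 00:00:00'
      && $value->[1] lt '2011-11-15 00:00:00' ) {
        say "$value->[0] --> OK";
    }
    else {
        say "$value->[0] --> KO";
    }
}

Output: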

00000089-6d83-486d-9ddf-30bbbf722583 --> KO
000004c9-92c6-4764-b320-b1403276321e --> OK

You can also use the filter capability:

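# Values accepted in the fourth column, stored as hash keys for exact lookup.
my %wanted = map { $_ => 1 } qw(1000 2000);

my $parser = Parse::CSV->new(
    file   => 'text.csv',
    # The current row arrives in $_; return it to keep the row, undef to skip it.
    filter => sub {
        if ( $wanted{ $_->[3] }
          && $_->[1] gt '2011-11-01 00:00:00'
          && $_->[1] lt '2011-11-15 00:00:00' ) {
            return $_;
        }
        return undef;
    },
);

while ( my $value = $parser->fetch ) {
    # Only filtered rows arrive here; for example, print them tab-delimited.
    print join( "\t", @$value ), "\n";
}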
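
The selection test in this answer can also be pulled out into a small predicate that runs without any CSV module. The following is only a sketch using core Perl: `keep_row`, `%wanted`, `$from` and `$to` are names made up for illustration, the window is treated as inclusive (an assumption, since the question just says "between"), and the rows are the two sample lines from the question, already split into fields. Plain string comparison is enough here because the timestamps are fixed-width, zero-padded, and ordered from year down to second.

```perl
use strict;
use warnings;

# Hypothetical list of accepted codes, stored as hash keys for exact lookup.
my %wanted = map { $_ => 1 } qw(1000 2000);

# Window bounds from the question; both ends are treated as inclusive here.
my ( $from, $to ) = ( '2011-11-01 00:00:00', '2011-11-15 00:00:00' );

# keep_row() is a hypothetical helper: returns 1 when an already-parsed row
# (array reference) has a wanted code and a timestamp inside the window.
sub keep_row {
    my ($row) = @_;
    return $wanted{ $row->[3] }
        && $row->[1] ge $from
        && $row->[1] le $to ? 1 : 0;
}

# The two sample rows from the question, already split into fields.
my @rows = (
    [
        '00000089-6d83-486d-9ddf-30bbbf722583', '2011-09-17 16:25:09',
        'INTNAME', '1001', 'https://mobile.mint.com:443',
    ],
    [
        '000004c9-92c6-4764-b320-b1403276321e', '2011-11-09 13:52:30',
        'INTNAME', '2000',
        'http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13',
    ],
);

# Print the surviving rows with tabs between the fields.
print join( "\t", @$_ ), "\n" for grep { keep_row($_) } @rows;
```

A hash lookup tests membership in the list exactly, whereas a numeric range such as `> 1000 && <= 2000` would also accept values like 1500 that are not in the list.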

百思不得你姐 2024-12-26 02:31:28

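You may want to take a look at Time::Piece. Use it like this, for instance:

use Time::Piece;

# strptime() takes strftime()-style format specifiers.
my $time = Time::Piece->strptime($date, "%Y%m%d %H:%M");

(Apply the relevant strftime format for your data; timestamps like the ones in the question would need "%Y-%m-%d %H:%M:%S".)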
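
As a rough illustration of that suggestion applied to the question's data, the sketch below parses the timestamps with a format that matches them and compares the resulting objects. `in_window`, `$FORMAT`, `$start` and `$end` are hypothetical names, and the inclusive bounds are an assumption. Time::Piece ships with Perl and overloads the comparison operators, so the objects can be compared directly; note that `strptime` dies on input that does not match the format.

```perl
use strict;
use warnings;
use Time::Piece;

# Format matching timestamps such as "2011-11-09 13:52:30".
my $FORMAT = '%Y-%m-%d %H:%M:%S';

# Window bounds from the question, parsed once up front.
my $start = Time::Piece->strptime( '2011-11-01 00:00:00', $FORMAT );
my $end   = Time::Piece->strptime( '2011-11-15 00:00:00', $FORMAT );

# in_window() is a hypothetical helper: returns 1 when the timestamp string
# falls inside [$start, $end], 0 otherwise.
sub in_window {
    my ($stamp) = @_;
    my $t = Time::Piece->strptime( $stamp, $FORMAT );
    return $t >= $start && $t <= $end ? 1 : 0;
}

# Check the two timestamps from the question's sample rows.
for my $stamp ( '2011-09-17 16:25:09', '2011-11-09 13:52:30' ) {
    printf "%s is %s the window\n", $stamp,
        in_window($stamp) ? 'inside' : 'outside';
}

# Subtracting two Time::Piece objects yields a Time::Seconds object,
# which is handy for real date arithmetic such as the window length.
my $span = $end - $start;
printf "window length: %d days\n", $span->days;
```

Parsing into objects costs more than comparing the raw strings, but it pays off as soon as real arithmetic (differences, offsets) is needed.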

留一抹残留的笑 2024-12-26 02:31:28

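First, this looks like CSV, so you should use Text::CSV_XS (or Text::CSV) to parse it. The "standard" module for dealing with dates and times in Perl is DateTime, together with DateTime::Format::ISO8601 or similar, but Date::Parse is also a possibility.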
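
A minimal sketch of the Text::CSV route might look like the following. It assumes Text::CSV is installed from CPAN; `filter_csv`, `%wanted`, `$from` and `$to` are illustrative names, the window is treated as inclusive, and the timestamps are compared as strings rather than through DateTime to keep the example short. The demo reads the two sample rows from an in-memory string instead of a real file.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical list of accepted codes and the window from the question.
my %wanted = map { $_ => 1 } qw(1000 2000);
my ( $from, $to ) = ( '2011-11-01 00:00:00', '2011-11-15 00:00:00' );

# filter_csv() is a hypothetical helper: copy matching rows from a CSV
# input handle to an output handle as tab-delimited lines.
sub filter_csv {
    my ( $in, $out ) = @_;
    my $csv = Text::CSV->new( { binary => 1, auto_diag => 1 } );
    while ( my $row = $csv->getline($in) ) {
        next unless @$row >= 4;              # skip short or empty rows
        next unless $wanted{ $row->[3] };    # fourth column must be in the list
        next unless $row->[1] ge $from && $row->[1] le $to;
        # Assumes no field contains a tab character itself.
        print {$out} join( "\t", @$row ), "\n";
    }
    return;
}

# Demo on an in-memory copy of the two sample rows from the question.
my $sample = <<'CSV';
"00000089-6d83-486d-9ddf-30bbbf722583","2011-09-17 16:25:09","INTNAME","1001","https://mobile.mint.com:443"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","2000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
CSV
open my $in, '<', \$sample or die "Cannot open in-memory input: $!";
filter_csv( $in, \*STDOUT );
close $in;
```

For the real file, the two handles would come from `open` calls on the input and output paths. Reading one row at a time keeps memory use flat, which matters for a huge file.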

怂人 2024-12-26 02:31:28

#!/usr/bin/env perl
use strict;
use warnings;

use 5.010;
use Date::Parse;

our $VERSION = '0.01';

# Codes accepted in the fourth column, stored as hash keys for exact lookup.
my %wanted = map { $_ => 1 } qw(1000 2000 3000);

# Date window (inclusive), converted to epoch seconds.
my $start_date = str2time('2011-11-01 00:00:00');
my $end_date   = str2time('2011-11-15 00:00:00');

# All five quoted fields; capture group 4 is the fourth column.
my $RGX_FOUR_FULL = qr{"([^"]+)","([^"]+)","([^"]+)","([^"]+)","([^"]+)"};
# The quoted timestamp in the second column.
my $RGX_DATE_FULL = qr{"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"};
my @input_data    = <DATA>;

my @res = grep {
    my $time = extract_time($_);
    my $four = extract_four($_);
    defined $time
      and defined $four
      and $time >= $start_date
      and $time <= $end_date
      and $wanted{$four}
} @input_data;

print @res;

# Return the timestamp of a line as epoch seconds, or undef if none is found.
sub extract_time {
    my ($line) = @_;
    my ($date) = $line =~ $RGX_DATE_FULL or return;
    return str2time($date);
}

# Return the fourth column of a line, or undef if the line does not match.
sub extract_four {
    my ($line) = @_;
    my @fields = $line =~ $RGX_FOUR_FULL or return;
    return $fields[3];
}

__DATA__
"00000089-6d83-486d-9ddf-30bbbf722583","2011-08-17 16:25:09","INTNAME","1001","https://mobile.mint.com:443"
"00000089-6d83-486d-9ddf-30bbbf722583","2011-09-17 16:25:09","INTNAME","1001","https://mobile.mint.com:443"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","2000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-10 14:52:30","INTNAME","4000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","3000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"

And you get:

"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","2000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
"000004c9-92c6-4764-b320-b1403276321e","2011-11-09 13:52:30","INTNAME","3000","http://m.intel.com/content/intel-us/en/shop/shop-landing.html?t=laptop&p=13"
