Parsing a tab-delimited file with double-quoted fields in Perl

Posted 2024-10-10 13:25:43

I have a tab-delimited data set in which the user-agent strings are enclosed in double quotes. I need to parse each column, and based on the answer to my other post I used the Text::CSV module.

94410634  0   GET  "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.6; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; AskTB5.5)"   1

The code is simple:

#!/usr/bin/perl

use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new(sep_char => "\t");

while (<>) {
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        print "@columns\n";
    } else {
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}

But I get the "Failed to parse line:" error when I run it on this data set. What am I doing wrong? I need to extract the 4th column, which contains the user-agent strings, for further processing.

Answer from 神妖 (2024-10-17 13:25:43):
  1. Your constructor arguments should be in a hashref, not a hash:

    my $csv = Text::CSV->new( { sep_char => "\t" } );

  2. Are you sure the data set is exactly what you think it is? Maybe a double quote is missing somewhere, or the fields are separated by spaces instead of tabs?

    To check the file contents: are you on Unix/Linux or Windows? On Unix, run cat -vet my_log_file_name | head -3 and check whether the output shows spaces or "^I" sequences where you expect tabs. cat -vet prints special characters as visible sequences (TAB => ^I, end of line => $, etc.).
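As a portable alternative to eyeballing cat -vet output (it also works on Windows), a short core-Perl sketch can count the tab-separated fields on each line and flag lines that do not have the five fields of the sample record. The helper name bad_lines and the expected count of 5 are assumptions taken from the single sample line, not from the original data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the (1-based) line numbers whose tab-separated
# field count differs from $expected. A plain split ignores quoting; that is
# assumed to be fine here because the quoted user-agent field should not
# itself contain tabs.
sub bad_lines {
    my ($fh, $expected) = @_;
    my @bad;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        push @bad, $. if @fields != $expected;
    }
    return @bad;
}

# Demo on an in-memory file: line 1 uses tabs, line 2 uses spaces instead.
my $sample = qq[94410634\t0\tGET\t"Mozilla/4.0 (compatible; MSIE 8.0)"\t1\n]
           . qq[94410635  0   GET  "Mozilla/4.0 (compatible; MSIE 8.0)"   1\n];
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @bad = bad_lines($fh, 5);
print "suspect lines: @bad\n";   # prints "suspect lines: 2"
```

On a real log, open the file and pass its handle instead of the in-memory one; any line reported here would also trip Text::CSV.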

The following test works correctly on my ActivePerl installation:

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $s = qq[94410634\t0\tGET\t"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.6; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; AskTB5.5)"\t1\n];
my $csv = Text::CSV->new({sep_char => "\t"});

if ($csv->parse($s)) {
    my @columns = $csv->fields();
    print "c=$columns[3]\n";
} else {
    my $err = $csv->error_input;
    print "Failed to parse line: $err";
}

Output:

C:\> perl d:\scripts\test4.pl
c=Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.6; ...
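Combining both points, a sketch of the full loop passes the options as a hashref, checks that the constructor actually returned an object, collects the 4th column, and reports why a line failed (via error_diag) instead of only echoing it back. The binary => 1 option and the helper name user_agents are assumptions; nothing in the question confirms that the data contains non-ASCII bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Options go in ONE hashref. binary => 1 is an assumption: it allows
# non-ASCII bytes inside quoted fields, which some user-agent strings contain.
my $csv = Text::CSV->new({ sep_char => "\t", binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

# Hypothetical helper: return the 4th column (the user-agent) of every line
# that parses, and warn with the line number and diagnostic for every line
# that does not.
sub user_agents {
    my ($parser, $fh) = @_;
    my @agents;
    while (my $line = <$fh>) {
        if ($parser->parse($line)) {
            my @columns = $parser->fields();
            push @agents, $columns[3];
        } else {
            my ($code, $msg) = $parser->error_diag();
            warn "line $.: error $code: $msg\n";
        }
    }
    return @agents;
}

# Demo on an in-memory file; the second line has stray text after the
# closing quote, which Text::CSV rejects with its default settings.
my $sample = qq[94410634\t0\tGET\t"Mozilla/4.0 (compatible; MSIE 8.0)"\t1\n]
           . qq[94410635\t0\tGET\t"Mozilla/4.0" (extra)\t1\n];
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for user_agents($csv, $fh);
```

Reading line by line with parse, as in the question, keeps one bad record from affecting the others; that is only safe because no field here is expected to contain an embedded newline.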