How do I extract data columns using Perl?

Posted 2024-09-16 00:11:02 · 259 characters · 9 views · 0 comments

I have strings of this kind

NAME1              NAME2          DEPTNAME           POSITION
JONH MILLER        ROBERT JIM     CS                 ASST GENERAL MANAGER 

I want the output to be NAME1, NAME2 and POSITION. How can I do this using split/regex/trim/etc., without using CPAN modules?

Comments (6)

时光暖心i 2024-09-23 00:11:02

It depends on whether those are fixed-length fields or tab-separated fields. The easiest case (using split) is when they are tab-separated:

my ($name1, $name2, $deptName, $position) = split("\t", $string);

If they're fixed-length, and assuming each field is, say, 10 characters wide, you can parse the line with unpack:

my ($name1, $name2, $deptName, $position) = unpack("A10 A10 A10 A10", $string);
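When the widths are not known in advance, one option is to derive them from the header line. The following is only a sketch: template_from_header is a hypothetical helper, and it assumes that every column starts exactly where its header word starts and that header names contain no spaces. The sample strings are built with sprintf using the 19/15/19 column widths that the sample in the question appears to use.

```perl
use strict;
use warnings;

# Hypothetical helper: build an unpack template from a header line.
# Assumes each column begins at the offset where its header word begins.
sub template_from_header {
    my ($header) = @_;
    my @starts;
    # $-[0] holds the start offset of the most recent match.
    push @starts, $-[0] while $header =~ /\S+/g;
    # Width of each column = distance to the next column's start.
    my @widths = map { $starts[$_ + 1] - $starts[$_] } 0 .. $#starts - 1;
    # The last column takes whatever remains on the line.
    return join ' ', (map { 'A' . $_ } @widths), 'A*';
}

# Sample data laid out with the assumed widths 19, 15 and 19.
my $layout = '%-19s%-15s%-19s%s';
my $header = sprintf $layout, qw(NAME1 NAME2 DEPTNAME POSITION);
my $row    = sprintf $layout, 'JONH MILLER', 'ROBERT JIM', 'CS',
                              'ASST GENERAL MANAGER';

my $template = template_from_header($header);
my ($name1, $name2, $dept, $position) = unpack $template, $row;
print join('|', $name1, $name2, $position), "\n";
```

Because the 'A' format strips trailing spaces, the extracted values need no further trimming.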
笛声青案梦长安 2024-09-23 00:11:02

If your input data comes in as an array of strings (@strings), this

for my $s (@strings) {
   # unpack's 'A' format strips trailing spaces; x19 skips the DEPTNAME column
   my $output = join ' ',
                map /^\s*(\S.*?)\s*$/ ? $1 : (),
                unpack('A19 A15 x19 A*', $s);
   print "$output\n";
}

would extract and trim the information needed:

NAME1 | NAME2 | POSITION

and

JONH MILLER | ROBERT JIM | ASST GENERAL MANAGER

(The '|' characters were added by me to make the result easier to read.)

Regards

rbo

一个人练习一个人 2024-09-23 00:11:02

Assuming the spacing between the fields is not fixed, split the string on two or more whitespace characters, so that a name like JONH MILLER is not broken into two parts.

#!/usr/bin/perl
use strict;
use warnings;
my $string = "NAME1              NAME2          DEPTNAME           POSITION
             JONH MILLER        ROBERT JIM     CS                 ASST GENERAL MANAGER ";
# \s\s+ matches two or more whitespace characters, including the newline
my @string_parts = split /\s\s+/, $string;
foreach my $test (@string_parts) {
    print "$test\n";
}
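The script above prints every field. To get only the three columns the question asks for, the same split can be applied line by line. This is a minimal sketch: the @lines array stands in for real input, and the ' | ' separator is only there to make the output readable.

```perl
use strict;
use warnings;

# Sample rows from the question; real input would be read from a file.
my @lines = (
    'NAME1              NAME2          DEPTNAME           POSITION',
    'JONH MILLER        ROBERT JIM     CS                 ASST GENERAL MANAGER ',
);

my @wanted;
for my $line (@lines) {
    # Split on runs of two or more whitespace characters.
    my ($name1, $name2, $dept, $position) = split /\s{2,}/, $line;
    # A single trailing space is not a separator, so trim it by hand.
    $position =~ s/\s+\z//;
    push @wanted, join ' | ', $name1, $name2, $position;
}
print "$_\n" for @wanted;
```

The DEPTNAME value is still assigned to a variable here simply so the list assignment lines up with the four columns.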
中二柚 2024-09-23 00:11:02

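Judging from the sample, a single space belongs to the data, but two or more contiguous spaces do not, so you can simply split on two or more spaces. The only thing I add is List::MoreUtils::mesh (a CPAN module), which pairs each column name with its value:

use List::MoreUtils qw<mesh>;
# $file is assumed to be an already-opened filehandle; its first line holds the column names
my @names   = map { chomp; $_ } split /\s{2,}/, <$file>;
# each remaining line becomes a hash ref keyed by column name (+{ } forces an anonymous hash)
my @records = map { chomp; my @fields = split /\s{2,}/; +{ mesh( @names, @fields ) } } <$file>;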
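Since the question rules out CPAN modules, the same name-to-value pairing can be done with a plain hash slice. This is a sketch under the same assumption (fields separated by two or more spaces), with the sample rows held in an array rather than read from a filehandle.

```perl
use strict;
use warnings;

# Sample rows from the question; the first row holds the column names.
my @lines = (
    'NAME1              NAME2          DEPTNAME           POSITION',
    'JONH MILLER        ROBERT JIM     CS                 ASST GENERAL MANAGER ',
);

my ($header, @rows) = @lines;
my @names = split /\s{2,}/, $header;

my @records;
for my $row (@rows) {
    my @fields = split /\s{2,}/, $row;
    s/\s+\z// for @fields;        # trim a stray trailing space
    my %record;
    @record{@names} = @fields;    # hash slice: pair names with values
    push @records, \%record;
}

print "$records[0]{NAME1}, $records[0]{NAME2}, $records[0]{POSITION}\n";
```

Each record is then addressable by column name, so dropping DEPTNAME is just a matter of not printing it.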
梦巷 2024-09-23 00:11:02

Consider using autosplit in a Perl one-liner from your command line:

$ perl -F'/\s{2,}/' -lane 'print qq/@F[0,1,3]/' file

The one-liner splits each line on two or more consecutive whitespace characters and prints the first, second and fourth fields, which correspond to NAME1, NAME2 and POSITION. The pattern is quoted so that the shell does not interpret the backslash or the braces, and -l removes each line's newline before the split and adds one back when printing.

Of course, this will break if only a single space separates the NAME1 and NAME2 entries; more information about your file is needed to determine the best course of action.
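To make that failure mode concrete, here is a small sketch. The name ALEXANDER HAMILTON is made up for illustration: it is 18 characters long, so in a 19-character column only one space separates it from the next field. The 19/15/19 widths are an assumption based on the sample in the question.

```perl
use strict;
use warnings;

# Hypothetical row: the first name nearly fills its 19-character column.
my $row = sprintf '%-19s%-15s%-19s%s',
    'ALEXANDER HAMILTON', 'ROBERT JIM', 'CS', 'ASST GENERAL MANAGER';

# Splitting on two or more spaces merges the first two fields.
my @by_split = split /\s{2,}/, $row;

# Fixed-width parsing is unaffected by how full a column is.
my @by_width = unpack 'A19 A15 A19 A*', $row;

print scalar(@by_split), " fields via split: $by_split[0]\n";
print scalar(@by_width), " fields via unpack: $by_width[0]\n";
```

If the file is known to be column-aligned, the fixed-width approach is the safer of the two; splitting on whitespace is only reliable when every separator is at least two characters wide.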

梦醒灬来后我 2024-09-23 00:11:02

To split on whitespace:

@string_parts = split /\s{2,}/, $string;

This splits $string into a list of substrings. The separator is the regex \s{2,}, which means two or more whitespace characters. \s matches spaces, tabs and newlines.

Edit: I see that one of the requirements is not to split on a single space, but on two or more. I modified the regex accordingly.
