How do I split a line of text using Perl?

Posted on 2024-12-18 22:46:36

Possible Duplicate:
join lines after colon (perl)

I may have a line like this:

red: alpha green: beta, gamma blue: alpha, beta yellow: alpha (gamma) beta (alpha) gamma (beta)

The line can contain more entries, each in any of these forms:

xxx: yyyy
xxx: yyyy, zzzz
xxx: yyyy (zzz) yyyyy (xx)

I want to split this line according to the following criteria:

For example, the part of the input that reads "yellow: alpha (gamma) beta (alpha) gamma (beta)" should be split into "yellow: alpha (gamma)", "yellow: beta (alpha)", and "yellow: gamma (beta)".

Find each "word followed by a colon" and use it as the prefix of every new line. Generate one line if it is followed by one word that does not contain a colon, and two lines if it is followed by two (possibly comma-separated) words that do not contain a colon. If a word is followed by parenthesized text, the parenthesized text belongs on the same line as the word preceding it.

Example 1:

line

aa: bb ccc

split

aa: bb
aa: ccc

Example 2:

line

aa: bb, ccc ddd: aa eee ff

split

aa: bb
aa: ccc
ddd: aa
ddd: eee
ddd: ff

Original

For the original example input, the output should be:

red: alpha
green: beta
green: gamma
blue: alpha
blue: beta
yellow: alpha (gamma)
yellow: beta (alpha)
yellow: gamma (beta)
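
One way to read the rules above is as a small tokenizer: a word ending in a colon starts a new key, and every other word (optionally followed by a parenthesized note) becomes one output line for the current key. The sketch below is only an illustration of that reading, not a definitive solution; the name `split_line` is hypothetical, and it assumes keys and values consist of `\w` characters only.

```perl
use strict;
use warnings;

# split_line is a hypothetical helper name; it is not part of the question.
# Assumption: keys and values are made of \w characters only.
sub split_line {
    my ($line) = @_;
    my (@out, $key);
    # Walk the line token by token, skipping whitespace and an optional comma.
    # A token is either "word:" (a new key) or "word" / "word (note)" (a value).
    while ($line =~ /\G\s*,?\s*(?:(\w+):|(\w+(?:\s*\([^)]*\))?))/gc) {
        if (defined $1) {
            $key = $1;          # remember the key for the values that follow
            next;
        }
        push @out, "$key: $2" if defined $key;
    }
    return @out;
}

print "$_\n" for split_line(
    'red: alpha green: beta, gamma blue: alpha, beta yellow: alpha (gamma) beta (alpha) gamma (beta)'
);
```

Because the tokenizer treats commas and plain spaces the same way, it should cover both Example 1 (space-separated values) and Example 2 (mixed separators).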


Comments (4)

一抹淡然 2024-12-25 22:46:36
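use strict;
use warnings;

my $line = 'red: alpha green: beta, gamma blue: alpha, beta yellow: alpha (gamma) beta (alpha) gamma (beta)';

# Splitting on "key:" with a capture group keeps the keys in the result:
# ('', 'red', 'alpha', 'green', 'beta, gamma', ...).
my @tmp = split /\s*?(\w+):\s*/, $line;
shift @tmp;    # discard the empty field before the first key

# Consume the list two items at a time: a key and its value string.
while (my ($color, $value) = splice @tmp, 0, 2) {
    # Split on ", " or on a space that does not introduce a parenthesis.
    foreach my $v (split /, | (?!\()/, $value) {
        print "$color: $v\n";
    }
}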
無處可尋 2024-12-25 22:46:36
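#!/usr/bin/env perl
use strict;
use warnings;

while (my $input = <DATA>) {
    chomp $input;
    # Split before each "key:"; \b keeps the lookahead from matching inside a key.
    for my $chunk (split /(?=\b\w+:)/, $input) {
        my ($color, $rest) = $chunk =~ /^(\w+:)(.+)/ or next;
        # Split on commas and closing parentheses; the ')' is restored below.
        # Note: space-separated values without commas stay on one line.
        for my $item (split /[,)]/, $rest) {
            $item =~ s/\s+$//;    # drop trailing whitespace
            next unless $item =~ /\S/;
            printf "%s%s%s\n", $color, $item, $item =~ /\(/ ? ')' : '';
        }
    }
}
__DATA__
red: alpha green: beta, gamma blue: alpha, beta yellow: alpha (gamma) beta (alpha) gamma (beta)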
绅士风度i 2024-12-25 22:46:36
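use strict;
use warnings;
use v5.10;

while (my $line = <DATA>) {
    chomp $line;
    # A unit is one "key:" plus its values, up to the next key or the end of the line.
    for my $unit ($line =~ /[a-z]+:\s*[a-z, ()]+?\s*(?=[a-z]+:|$)/g) {
        if ($unit =~ /^([a-z]+:)\s*(.+)$/) {
            my $key = $1;
            # Split on commas/spaces unless the next character opens a parenthesis.
            my @val = split /[, ]+(?!\()/, $2;
            say "$key $_" for @val;
        }
    }
}

__DATA__
red: alpha green: beta, gamma blue: alpha, beta yellow: alpha (gamma) beta (alpha) gamma (beta)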
原野 2024-12-25 22:46:36

You could do something like this:

#!/usr/bin/perl

use strict;
use warnings;

my $string = "red: alpha green: beta, gamma blue: alpha, beta yellow: alpha (gamma) beta (alpha) gamma (beta)";

# Split before each "key:"; \b keeps the lookahead from matching inside a key.
for my $key_values (split /(?=\b\w+:)/, $string) {
    my ($key, $values) = split /: /, $key_values;
    # Split on ", ", on a space after ")", or on a space not followed by "(".
    for my $value (split /, |(?<=\)) | (?!\()/, $values) {
        print "$key: $value\n";
    }
}

Golfed version:

map{s/(.+: )//;map{print"$1$_\n"}split/, |(?<=\)) | (?!\()/}split/(?=\b\w+:)/,$string;

EDIT: I overlooked one of the "requirements", so I had to update the third regex.
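
To make the third regex easier to follow, here is a minimal sketch that applies it to three kinds of value strings from the question. The wrapper name `split_values` is hypothetical and exists only for this illustration; the regex itself is copied from the answer above.

```perl
use strict;
use warnings;

# split_values is a hypothetical wrapper around the answer's third regex.
sub split_values {
    my ($values) = @_;
    # Three alternatives:
    #   ", "        comma-separated values
    #   (?<=\)) " " a space right after a closing parenthesis
    #   " "(?!\()   a space that does not introduce a parenthesized note
    return split /, |(?<=\)) | (?!\()/, $values;
}

for my $values ('beta, gamma', 'bb ccc', 'alpha (gamma) beta (alpha)') {
    print join(' | ', split_values($values)), "\n";
}
# prints:
# beta | gamma
# bb | ccc
# alpha (gamma) | beta (alpha)
```

The negative lookahead is what keeps a parenthesized note attached to the word before it, which appears to be the requirement the EDIT refers to.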
