Using Perl, how do I read records from a file that has two possible record separators?

Posted 2024-08-21 10:03:17 · 316 words · 6 views · 0 comments

Here is what I am trying to do:

I want to read a text file into an array of strings. I want each string to end when a certain character is read from the file (mainly ; or |).

For example, the following text

Would you; please
hand me| my coat?

would be stored like this:

$string[0] = 'Would you;';
$string[1] = ' please hand me|';
$string[2] = ' my coat?';

Could I get some help on something like this?


Comments (5)

滿滿的愛 2024-08-28 10:03:18

This will do it. The trick to using split while keeping the separator you're splitting on is a zero-width lookbehind assertion: split(/(?<=[;|])/, ...).

Note: mctylr's answer (currently the top rated) isn't actually correct. It will also split fields on newlines, because it only works on a single line of the file at a time.

gbacon's answer using the input record separator ($/) is quite clever, and it's both space and time efficient, but I don't think I'd want to see it in production code. Putting one separator in the record separator and the other in the split strikes me as a little too unobvious (you have to fight that with Perl ...), which will make it hard to maintain. I'm also not sure why he's deleting multiple newlines (which I don't think you asked for?) and why he's doing that only for the end of '|'-terminated records.

# open file for reading, die with error message if it fails
open(my $fh, '<', 'data.txt') || die $!; 

# set file reading to slurp (whole file) mode (note that this affects all 
# file reads in this block)
local $/ = undef; 

my $string = <$fh>; 

# convert all newlines into spaces, not specified but as per example output
$string =~ s/\n/ /g; 

# split string after ; or |, using a zero-width lookbehind (?<=) so the separator is kept
my (@strings) = split(/(?<=[;|])/, $string); 
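
For anyone who wants to try this without creating data.txt, here is a minimal sketch of the same slurp-and-split idea. The helper name read_records and the in-memory filehandle are my own assumptions for illustration, not part of the answer above. Wrapping local $/ in a do block keeps slurp mode from leaking into other reads.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): slurp a filehandle and split
# the text after every ';' or '|', keeping the separator on each piece.
sub read_records {
    my ($fh) = @_;
    my $text = do {
        local $/;    # undef means slurp mode; restored when the block exits
        <$fh>;
    };
    return () unless defined $text;
    $text =~ s/\n\z//;     # drop one trailing newline, if present
    $text =~ s/\n/ /g;     # join the remaining lines with spaces
    return split /(?<=[;|])/, $text;
}

# In-memory filehandle standing in for data.txt
my $data = "Would you; please\nhand me| my coat?\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @strings = read_records($fh);
close $fh;

print "[$_]\n" for @strings;
```

With the sample text this should print the three pieces from the question, each in brackets.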

一杆小烟枪 2024-08-28 10:03:18

One way is to inject another character, like \n, whenever your special character is found, then split on the \n:

use warnings;
use strict;
use Data::Dumper;

while (<DATA>) {
    chomp;
    s/([;|])/$1\n/g;
    my @string = split /\n/;
    print Dumper(\@string);
}

__DATA__
Would you; please hand me| my coat?

Prints out:

$VAR1 = [
          'Would you;',
          ' please hand me|',
          ' my coat?'
        ];

UPDATE: The original question posed by James showed the input text on a single line, as shown in __DATA__ above. Because the question was poorly formatted, others edited the question, breaking the 1 line into 2. Only James knows whether 1 or 2 lines was intended.
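
If the input really does span two lines, a line-by-line loop needs to carry unfinished text over to the next line. The following is only a sketch under that assumption: the $pending buffer and the in-memory filehandle are mine, not part of the answer, and lines are joined with a single space to match the output shown in the question.

```perl
use strict;
use warnings;

my $data = "Would you; please\nhand me| my coat?\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my @string;
my $pending = '';    # text read since the last separator

while ( my $line = <$fh> ) {
    chomp $line;
    # Each line break becomes one space, as in the question's example
    $pending .= ' ' if $. > 1;
    $pending .= $line;

    # Move every complete record (text up to and including ; or |) to @string
    while ( $pending =~ s/\A([^;|]*[;|])// ) {
        push @string, $1;
    }
}
close $fh;

# Whatever is left has no terminating separator; keep it as the last record
push @string, $pending if length $pending;

print "[$_]\n" for @string;
```

This reads one line at a time, so it does not need the whole file in memory, at the cost of a little more bookkeeping.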

冷心人i 2024-08-28 10:03:18

I prefer @toolic's answer because it deals with multiple separators very easily.

However, if you wanted to overly complicate things, you could always try:

#!/usr/bin/perl

use strict; use warnings;

my @contents = ('');

while ( my $line = <DATA> ) {
    last unless $line =~ /\S/;
    $line =~ s{$/}{ };
    if ( $line =~ /^([^|;]+[|;])(.+)$/ ) {
        $contents[-1] .= $1;
        push @contents, $2;
    }
    else {
        $contents[-1] .= $line;  # no separator matched, so $1 is unset here: append the whole line
    }
}

print "[$_]\n" for @contents;

__DATA__
Would you; please
hand me| my coat?

紫﹏色ふ单纯 2024-08-28 10:03:18

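Something along the lines of

$text = <INPUTFILE>;

@string = split(/[;|]/, $text);  # character class with the two separators from the question

should do the trick more or less.

Edit: I've changed the pattern from a literal two-character sequence to a character class, "/[;|]/".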
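
Two caveats, shown in a small sketch below. First, reading a filehandle in scalar context returns only one record (one line by default), so multi-line input needs slurping or a loop. Second, splitting on a plain character class discards the separators. The comparison uses a hard-coded string and variable names that are my own, purely for illustration.

```perl
use strict;
use warnings;

my $text = 'Would you; please hand me| my coat?';

# Plain character class: the separators are consumed and lost
my @dropped = split /[;|]/, $text;

# Capturing group: split returns each separator as its own list element
my @separate = split /([;|])/, $text;

# Lookbehind: zero-width match, so each separator stays attached to its piece
my @kept = split /(?<=[;|])/, $text;

print "dropped:  ", join( ' ', map { "[$_]" } @dropped ),  "\n";
print "separate: ", join( ' ', map { "[$_]" } @separate ), "\n";
print "kept:     ", join( ' ', map { "[$_]" } @kept ),     "\n";
```

Only the lookbehind form produces the strings the question asks for; the other two would need extra work to reattach the separators.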

残月升风 2024-08-28 10:03:18

Let Perl do half the work for you by setting $/ (the input record separator) to vertical bar, and then extract semicolon-separated fields:

#!/usr/bin/perl

use warnings;
use strict;

my @string;

*ARGV = *DATA;

$/ = "|";
while (<>) {
  s/\n+$//;
  s/\n/ /g;
  push @string => $1 while s/^(.*?;)//;  # non-greedy, so each ; ends its own field
  push @string => $_;
}

for (my $i = 0; $i < @string; ++$i) {
  print "\$string[$i] = '$string[$i]';\n";
}

__DATA__
Would you; please
hand me| my coat?

Output:

$string[0] = 'Would you;';
$string[1] = ' please hand me|';
$string[2] = ' my coat?';
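
The same idea can be kept from affecting other reads by localizing $/ inside a subroutine. This is a sketch only: the helper name read_fields and the in-memory filehandle are assumptions of mine. Note that $/ holds a literal string, not a regex, which is why only one of the two separators can go there and the other is handled by split.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): read '|'-terminated records,
# then split each record after every ';'.
sub read_fields {
    my ($fh) = @_;
    local $/ = '|';    # literal string, not a pattern; restored on return
    my @fields;
    while ( my $record = <$fh> ) {
        $record =~ s/\n+\z//;    # drop newlines at the very end of a record
        $record =~ tr/\n/ /;     # remaining newlines become spaces
        push @fields, split /(?<=;)/, $record;
    }
    return @fields;
}

my $data = "Would you; please\nhand me| my coat?\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @string = read_fields($fh);
close $fh;

print "\$string[$_] = '$string[$_]';\n" for 0 .. $#string;
```

Because $/ is restored when read_fields returns, later reads elsewhere in the program still see the default newline separator.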