Using Perl, how do I read records from a file with two possible record separators?

Posted 2024-08-21 10:03:17

Here is what I am trying to do:

I want to read a text file into an array of strings. Each string should end when a certain character (mainly ; or |) is read from the file.

For example, the following text

Would you; please
hand me| my coat?

would be stored like this:

$string[0] = 'Would you;';
$string[1] = ' please hand me|';
$string[2] = ' my coat?';

Could I get some help on something like this?

滿滿的愛 2024-08-28 10:03:18

This will do it. The trick to using split while preserving the token you're splitting on is to use a zero-width lookbehind assertion: split(/(?<=[;|])/, ...).

Note: mctylr's answer (currently the top rated) isn't actually correct. It will split fields on newlines, because it only works on a single line of the file at a time.

gbacon's answer using the input record separator ($/) is quite clever--it's both space and time efficient--but I don't think I'd want to see it in production code. Putting one split token in the record separator and the other in the split strikes me as a little too unobvious (you have to fight that with Perl ...) which will make it hard to maintain. I'm also not sure why he's deleting multiple newlines (which I don't think you asked for?) and why he's doing that only for the end of '|'-terminated records.

# open file for reading, die with error message if it fails
open(my $fh, '<', 'data.txt') || die $!; 

# set file reading to slurp (whole file) mode (note that this affects all 
# file reads in this block)
local $/ = undef; 

my $string = <$fh>; 

# convert all newlines into spaces, not specified but as per example output
$string =~ s/\n/ /g; 

# split string after ; or |, using a zero-width lookbehind (?<=) to preserve the char
my (@strings) = split(/(?<=[;|])/, $string);
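
As the comment above notes, changing $/ affects every file read in the enclosing block. One way to contain that, sketched here as an illustration rather than as part of the original answer, is to wrap the steps in a subroutine so the slurp-mode setting is limited to a single do block. The name read_records is hypothetical, and an in-memory filehandle stands in for data.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the records read from an open filehandle,
# splitting after each ';' or '|' and keeping the separator.
sub read_records {
    my ($fh) = @_;
    my $text = do { local $/; <$fh> };   # slurp; $/ is restored when the do block exits
    $text = '' unless defined $text;      # guard against an exhausted handle
    $text =~ s/\n/ /g;                    # newlines become spaces, as in the example output
    return split /(?<=[;|])/, $text;
}

# In-memory filehandle standing in for data.txt (an assumption for this demo).
my $input = "Would you; please\nhand me| my coat?";
open(my $fh, '<', \$input) or die "Cannot open in-memory file: $!";
my @strings = read_records($fh);
close $fh;

print "[$_]\n" for @strings;
```

With this input the sketch prints the three records from the question, each on its own line in square brackets.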

一杆小烟枪 2024-08-28 10:03:18

One way is to inject another character, like \n, whenever your special character is found, then split on the \n:

use warnings;
use strict;
use Data::Dumper;

while (<DATA>) {
    chomp;
    s/([;|])/$1\n/g;
    my @string = split /\n/;
    print Dumper(\@string);
}

__DATA__
Would you; please hand me| my coat?

Prints out:

$VAR1 = [
          'Would you;',
          ' please hand me|',
          ' my coat?'
        ];

UPDATE: The original question posed by James showed the input text on a single line, as shown in __DATA__ above. Because the question was poorly formatted, others edited the question, breaking the 1 line into 2. Only James knows whether 1 or 2 lines was intended.
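
Whether the input is one line or two matters for this approach, because the while loop handles each line on its own. The following sketch is not from the original answer, and the sub name split_per_line is made up for illustration. It runs the same substitute-then-split steps on both versions of the input and counts the pieces.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the per-line approach to a block of text
# and collect every piece into one list.
sub split_per_line {
    my ($text) = @_;
    my @pieces;
    for my $line (split /\n/, $text) {
        (my $marked = $line) =~ s/([;|])/$1\n/g;   # mark the end of each record
        push @pieces, split /\n/, $marked;
    }
    return @pieces;
}

my @one_line  = split_per_line("Would you; please hand me| my coat?");
my @two_lines = split_per_line("Would you; please\nhand me| my coat?");

print scalar(@one_line),  " pieces from one line\n";    # 3
print scalar(@two_lines), " pieces from two lines\n";   # 4: ' please' and 'hand me|' stay separate
```

If the two-line layout is the intended one, the pieces on either side of the line break would still need to be joined, which is what the slurp-based approaches do.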

冷心人i 2024-08-28 10:03:18

I prefer @toolic's answer because it deals with multiple separators very easily.

However, if you wanted to overly complicate things, you could always try:

#!/usr/bin/perl

use strict; use warnings;

my @contents = ('');

while ( my $line = <DATA> ) {
    last unless $line =~ /\S/;
    $line =~ s{$/}{ };
    if ( $line =~ /^([^|;]+[|;])(.+)$/ ) {
        $contents[-1] .= $1;
        push @contents, $2;
    }
    else {
        $contents[-1] .= $line;  # no separator on this line: append the whole line
    }
}

print "[$_]\n" for @contents;

__DATA__
Would you; please
hand me| my coat?

紫﹏色ふ单纯 2024-08-28 10:03:18

Something along the lines of

$text = <INPUTFILE>;

@string = split(/[;!]/, $text);

should do the trick more or less.

Edit: I've changed "/;!/" to "/[;!]/".
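
Note that splitting on a plain character class discards the separators, while the question's expected output keeps them. The sketch below is an illustration rather than part of this answer. It uses the question's ; and | as separators and compares three split patterns on the one-line form of the sample text.

```perl
use strict;
use warnings;

my $text = 'Would you; please hand me| my coat?';

# Plain character class: the separators are thrown away.
my @dropped = split /[;|]/, $text;

# Capturing group: the separators come back as separate list elements.
my @separate = split /([;|])/, $text;

# Lookbehind: split after each separator, so it stays attached to its record.
my @attached = split /(?<=[;|])/, $text;

print join(' / ', map { "[$_]" } @dropped),  "\n";
print join(' / ', map { "[$_]" } @separate), "\n";
print join(' / ', map { "[$_]" } @attached), "\n";
```

Only the lookbehind form yields the three strings in the shape the question asks for.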

残月升风 2024-08-28 10:03:18

Let Perl do half the work for you by setting $/ (the input record separator) to vertical bar, and then extract semicolon-separated fields:

#!/usr/bin/perl

use warnings;
use strict;

my @string;

*ARGV = *DATA;

$/ = "|";
while (<>) {
  s/\n+$//;
  s/\n/ /g;
  push @string => $1 while s/^(.*;)//;
  push @string => $_;
}

for (my $i = 0; $i < @string; ++$i) {
  print "\$string[$i] = '$string[$i]';\n";
}

__DATA__
Would you; please
hand me| my coat?

Output:

$string[0] = 'Would you;';
$string[1] = ' please hand me|';
$string[2] = ' my coat?';
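
Because $/ is a global variable, assigning to it as above changes every later read in the program. A possible variation, sketched here under the assumption that the data comes from a lexical filehandle (an in-memory one in this demo), limits the change with local inside a block. It also uses a lookbehind split for the semicolons instead of the repeated substitution. That is a swapped-in technique, not the original author's code.

```perl
use strict;
use warnings;

# In-memory filehandle standing in for a real file (an assumption for this demo).
my $input = "Would you; please\nhand me| my coat?\n";
open(my $fh, '<', \$input) or die "Cannot open in-memory file: $!";

my @string;
{
    local $/ = '|';                 # records end at '|' only inside this block
    while (my $record = <$fh>) {
        $record =~ s/\n+\z//;       # drop trailing newlines at the end of a record
        $record =~ s/\n/ /g;        # remaining newlines become spaces
        push @string, split /(?<=;)/, $record;   # then break after each ';'
    }
}
close $fh;

print "[$_]\n" for @string;
```

Once the block exits, $/ returns to its previous value, so other reads in the program still see ordinary lines.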
