In Perl, how can I split only the leading part of a string?

Posted on 2024-12-11 17:15:56

I am parsing a file with long lines, whose tokens are white space delimited. Before handling most of the line, I want to check whether the n-th (for small n) token has some value. I'll skip most of the lines, so really there's no need to split most of the very long lines. Is there a quick way to do a lazy split in Perl or do I need to roll my own?

Comments (2)

太阳哥哥 2024-12-18 17:15:56

You can provide a limit argument to the split operator to make Perl stop splitting after a certain number of tokens have been generated.

@fields = split /\s+/, $expression, 4;

For example, this puts the first three whitespace-separated fields in the first three elements of @fields and the unsplit remainder of the line in the fourth. This is more efficient than doing a complete split when the expression has more than four fields.

If you do this lazy split and decide that you need to process the line further, you will need to split the line again. Depending on how long the lines are and how frequently you need to reprocess them, you could still come out ahead.
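The two-pass idea described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name `fields_if_match`, the sample lines, and the equality test on the token are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the fully split fields only when the
# $n-th token (1-based) equals $wanted; otherwise return an empty list.
sub fields_if_match {
    my ($line, $n, $wanted) = @_;

    # Cheap pass: stop after $n tokens; the last element holds the unsplit rest.
    my @head = split ' ', $line, $n + 1;
    return () unless defined $head[$n - 1] && $head[$n - 1] eq $wanted;

    # Expensive pass, done only for lines that passed the filter.
    return split ' ', $line;
}

my @hit  = fields_if_match("a b KEEP d e f", 3, "KEEP");
my @miss = fields_if_match("a b skip d e f", 3, "KEEP");
print scalar(@hit), " fields kept, ", scalar(@miss), " fields for the skipped line\n";
```

Lines that fail the check pay only for the short, limited split. Lines that pass are split twice, which is the trade-off mentioned above.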


Another approach may be to split a portion of the line you are interested in. For example, if the line contains many fields but you want to filter on the 4th field AND you are sure that the 4th field always occurs before the 100th byte on the line, saying

# split only the first 100 bytes; the last token of the prefix may be cut short
@fields = split /\s+/, substr($expression, 0, 100);
if (matches_some_condition($fields[3])) {
    # process the whole line
    @fields = split /\s+/, $expression;
    ...
}

and occasionally splitting the expression twice may be more efficient than always splitting the full expression one time.
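Whether either shortcut actually pays off depends on the data, so it may be worth measuring. The sketch below uses the core Benchmark module to compare a full split, a limited split, and a split of a fixed-length prefix. The synthetic 5,000-token line and the labels `full`, `limit`, and `prefix` are assumptions chosen for illustration; real input may behave differently.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic input (an assumption): one line of 5,000 short tokens.
my $line = join ' ', map { "tok$_" } 1 .. 5000;

# Three ways to fetch the 4th field of the same line.
my %fourth_field = (
    full   => sub { (split ' ', $line)[3] },
    limit  => sub { (split ' ', $line, 5)[3] },
    prefix => sub { (split ' ', substr($line, 0, 100))[3] },
);

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, \%fourth_field);
```

The table only reports relative rates for this particular input, so the numbers should be read as a rough guide rather than a general result.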

泼猴你往哪里跑 2024-12-18 17:15:56

perldoc -f split:

If LIMIT is specified and positive, it represents the maximum number of fields the EXPR will be split into, though the actual number of fields returned depends on the number of times PATTERN matches within EXPR.

my $nth = (split ' ', $line, $n + 1)[$n - 1];
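A short worked example of the one-liner, as a sketch under stated assumptions: the wrapper name `nth_token` and the sample line are hypothetical. The limit is `$n + 1` rather than `$n` because with a limit of `$n` the n-th element would hold the entire rest of the line instead of a single token. The example also contrasts the special `' '` pattern, which skips leading whitespace, with `/\s+/`, which produces an empty first field when the line starts with whitespace.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-liner: 1-based $n-th token of $line.
sub nth_token {
    my ($line, $n) = @_;
    return (split ' ', $line, $n + 1)[$n - 1];
}

my $line  = "  alpha beta gamma delta epsilon";
my $third = nth_token($line, 3);

# With /\s+/, the leading whitespace yields an empty first field,
# so every token shifts one position to the right.
my @regex = split /\s+/, $line, 4;

print "third token: $third\n";
print "first field from /\\s+/ is empty\n" if $regex[0] eq '';
```

If lines may begin with whitespace, the `' '` form keeps token numbering aligned with what a reader would count by eye.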