In Perl, how can I split only a leading portion of a string?
I am parsing a file with long lines whose tokens are whitespace-delimited. Before handling most of a line, I want to check whether the n-th token (for small n) has some value. I'll skip most of the lines, so there's really no need to split most of the very long lines. Is there a quick way to do a lazy split in Perl, or do I need to roll my own?
Comments (2)
You can provide a limit argument to the split operator to make Perl stop splitting after a certain number of tokens have been generated. With a limit of 4, for example, split puts everything after the 3rd whitespace-separated field in the 4th element of @list. This is more efficient than doing a complete split when the expression has more than four fields.

If you do this lazy split and decide that you need to process the line further, you will need to split the line again. Depending on how long the lines are and how frequently you need to reprocess them, you could still come out ahead.

Another approach may be to split only the portion of the line you are interested in. For example, if the line contains many fields but you want to filter on the 4th field, and you are sure that the 4th field always occurs before the 100th byte on the line, splitting just the first 100 bytes and occasionally splitting the expression twice may be more efficient than always splitting the full expression one time.
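The code snippets from this answer did not survive on this page, so here is a minimal sketch of the two approaches it describes, not the answerer's original code. The helper names nth_token and nth_token_in_prefix, the sample line, and the 'delta' filter value are all hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the n-th (1-based) whitespace-separated
# token of $line without splitting the whole line.
sub nth_token {
    my ($line, $n) = @_;
    # A limit of $n + 1 makes split stop after producing $n fields;
    # the unsplit remainder of the line lands in the last element.
    my @head = split ' ', $line, $n + 1;
    return $head[$n - 1];    # undef if the line has fewer than $n tokens
}

# Hypothetical variant: split only a leading chunk of the line.
# Assumption: the n-th token is known to end within the first
# $prefix_len characters; otherwise the token may come back truncated.
sub nth_token_in_prefix {
    my ($line, $n, $prefix_len) = @_;
    my @head = split ' ', substr($line, 0, $prefix_len), $n + 1;
    return $head[$n - 1];
}

# Made-up sample: four leading tokens followed by a long tail.
my $line = "alpha beta gamma delta " . join(' ', 1 .. 10_000);

# Pay for the full split only on lines that pass the cheap filter.
my $fourth = nth_token($line, 4);
if (defined $fourth && $fourth eq 'delta') {
    my @fields = split ' ', $line;
    print scalar(@fields), " fields\n";
}
```

Whether either variant actually beats a plain full split depends on line length and on how often a line passes the filter, so it is worth timing both on real data (for example with the core Benchmark module) before committing to one.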
perldoc -f split: