Split a string into tokens and store the delimiters in Perl
I have a string like this:
a b c d
I process my string like this:
chomp $line;
my @tokens = split /\s+/, $line;
my @new_tokens;
foreach my $token (@tokens) {
push @new_tokens, some_complex_function( $token );
}
my $new_str = join ' ', @new_tokens;
I'd like to re-join the string with the original whitespace. Is there some way that I can store the whitespace from split and re-use it later? Or is this going to be a huge pain? It's mostly cosmetic, but I'd like to preserve the original spaces from the input string.
Comments (3)
If you split with a regex that contains capturing parentheses, the text matched by each group is included in the resulting list along with the fields (see perldoc -f split):
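A minimal sketch of that idea, assuming the goal is to transform each token and leave every run of whitespace exactly as it was. The helper name rejoin_with_whitespace and the sample input are hypothetical, and uc only stands in for the question's some_complex_function:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the question's some_complex_function().
sub some_complex_function {
    my ($token) = @_;
    return uc $token;
}

# Hypothetical helper: transform the tokens, keep the separators verbatim.
sub rejoin_with_whitespace {
    my ($line) = @_;

    # The capturing group makes split return the separators as well,
    # so tokens and whitespace runs alternate in the list.
    my @parts = split /(\s+)/, $line;

    # Only pieces containing non-whitespace are transformed;
    # whitespace runs (and a possible empty leading field) pass through.
    return join '', map { /\S/ ? some_complex_function($_) : $_ } @parts;
}

print rejoin_with_whitespace("a  b\tc   d"), "\n";
```

Joining with the empty string rather than a single space matters here, because the separators are already part of the list.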
Just split on word boundaries:
For your example, this will give:
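As a rough sketch of the word-boundary approach, assuming a line whose tokens consist only of word characters (the sample string is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical sample line with irregular whitespace.
my $line = "a  b\tc   d";

# \b matches the zero-width position between a \w and a \W character,
# so words and the gaps between them become separate list elements.
my @tokens = split /\b/, $line;

# Nothing was consumed by the split, so joining with the empty
# string reproduces the input exactly.
my $rejoined = join '', @tokens;
print $rejoined eq $line ? "round trip ok\n" : "round trip failed\n";
```

Because \b is defined in terms of \w and \W, a token such as foo-bar would also be cut at the hyphen.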
EDIT: As brian d foy pointed out, \b uses the wrong character classes. Following my original idea, I came up with look-around assertions instead. This looks way more complicated than Ether's answer, though:
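A sketch of what such a look-around split might look like, assuming the intent is to cut only where whitespace meets non-whitespace. The exact pattern and the sample string are assumptions, and uc again stands in for some_complex_function:

```perl
use strict;
use warnings;

# Hypothetical sample: punctuation inside tokens, irregular whitespace.
my $line = "foo-bar  baz,\tqux";

# Split at every zero-width position where a non-whitespace character
# is followed by whitespace, or whitespace by a non-whitespace character.
my @tokens = split /(?<=\S)(?=\s)|(?<=\s)(?=\S)/, $line;

# uc stands in for the question's some_complex_function();
# whitespace pieces are passed through unchanged.
my $new_str = join '', map { /\S/ ? uc($_) : $_ } @tokens;
print "$new_str\n";
```

Unlike \b, this keeps foo-bar and baz, as single tokens, since the assertions test \s and \S rather than \w and \W.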
Why don't you simply do this?
my $new_str = uc( $line );
UPDATE: it turns out the original uc() was just shorthand for a "more complex function".
Well, generally you can also:
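One possibility, offered as an assumption and not necessarily what was meant here, is to skip split altogether and run the function over each non-whitespace run in place with a substitution. The sample string is hypothetical and uc stands in for some_complex_function:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the question's some_complex_function().
sub some_complex_function {
    my ($token) = @_;
    return uc $token;
}

# Hypothetical sample line with irregular whitespace.
my $line = "ab  cd\tef   gh";

# Copy the line, then replace every run of non-whitespace with the
# function's result: /e evaluates the replacement as Perl code and
# /g repeats it for each match. Whitespace is never touched.
(my $new_str = $line) =~ s/(\S+)/some_complex_function($1)/ge;
print "$new_str\n";
```

This avoids building a token list at all, at the cost of calling the function from inside the regex replacement.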