How can I preserve whitespace when I match and replace several words in Perl?

Posted on 2024-08-04 22:53:33

Let's say I have some original text:

here is some text that has a substring that I'm interested in embedded in it.

I need the text to match a part of it, say: "has a substring".

However, the original text and the matching string may have whitespace differences. For example the match text might be:

has a
substring

or

has  a substring

and/or the original text might be:

here is some
text that has
a substring that I'm interested in embedded in it.

What I need my program to output is:

here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.

I also need to preserve the whitespace pattern in the original and just add the start and end markers to it.

Any ideas about a way of using Perl regexes to get this to happen? I tried, but ended up getting horribly confused.

高跟鞋的旋律 2024-08-11 22:53:33

It's been some time since I've used Perl regular expressions, but what about:

$match =~ s/(has\s+a\s+substring)/[$1]/ig;

This would capture one or more whitespace characters (including newlines) between the words. It wraps the entire match in brackets while keeping the original separation. It ain't automatic, but it does work.
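
As a quick check of that claim, the sketch below applies the substitution to the question's multi-line version of the text; `$text` and `$marked` are illustrative names, not from the answer. Because `\s` also matches a newline, the line break inside the phrase ends up inside the brackets, unchanged.

```perl
use strict;
use warnings;

# The question's original text, with a line break between "has" and "a".
my $text = "here is some\ntext that has\na substring that I'm interested in embedded in it.";

# \s+ matches one or more spaces, tabs or newlines, so the line break is
# captured in $1 and written back exactly as it was.
(my $marked = $text) =~ s/(has\s+a\s+substring)/[$1]/ig;

print $marked, "\n";
```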

You could play games with this, like taking the string "has a substring" and transforming it into "has\s+a\s+substring" to make this a little less painful.

EDIT: Incorporated ysth's comment that the \s metacharacter matches newlines, and hobbs's corrections to my \s usage.

╄→承喏 2024-08-11 22:53:33

This pattern will match the string that you're looking to find:

(has\s+a\s+substring)

So, when the user enters a search string, replace any whitespace in the search string with \s+ and you have your pattern. Then just replace every match with [match starts here]$1[match ends here], where $1 is the matched text.
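
Those two steps could be automated roughly as follows. This is a sketch, assuming the user's input arrives in a variable `$search` (a hypothetical name). `split ' '` drops leading and trailing whitespace and splits on runs of it, so a stray space typed by the user does not become a `\s+` that has to match outside the phrase; `quotemeta` escapes each word in case it contains regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical user input: its spacing differs from the text being searched.
my $search = "  has  a\nsubstring ";
my $text   = "here is some\ntext that has\na substring that I'm interested in embedded in it.";

# split ' ' trims the ends and splits on whitespace runs;
# quotemeta escapes any regex metacharacters inside each word.
my $pattern = join '\s+', map { quotemeta } split ' ', $search;

# Capture the match so $1 holds the original text, whitespace included.
# "\[" after $1 keeps Perl from parsing "$1[" as an array element.
(my $marked = $text) =~
    s/($pattern)/[match starts here]$1\[match ends here]/g;

print $marked, "\n";
```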

嘿哥们儿 2024-08-11 22:53:33

In regexes, you can use + to mean "one or more." So something like this

/has\s+a\s+substring/

matches "has", followed by one or more whitespace characters, followed by "a", followed by one or more whitespace characters, followed by "substring".
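
To see what `\s+` accepts in practice, the sketch below runs the same pattern against a few made-up variants of the phrase (the inputs are illustrative only). `\s` covers spaces, tabs and newlines, and `+` requires at least one of them, so words that run together do not match.

```perl
use strict;
use warnings;

my $re = qr/has\s+a\s+substring/;

# Illustrative inputs: different whitespace between the same three words.
my %inputs = (
    single_space => "has a substring",
    many_spaces  => "has     a  substring",
    tab_newline  => "has\ta\nsubstring",
    no_space     => "hasa substring",
);

# Record and report whether each input matches the pattern.
my %matches;
for my $name (sort keys %inputs) {
    $matches{$name} = $inputs{$name} =~ $re ? 1 : 0;
    print "$name: ", ($matches{$name} ? "match" : "no match"), "\n";
}
```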

Putting it together with a substitution operator, you can say:

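my $str = "here is some text that has     a  substring that I'm interested in embedded in it.";
# The backslash before "[match ends here]" stops Perl from reading "$1[...]" as an element of the array @1.
$str =~ s/(has\s+a\s+substring)/\[match starts here]$1\[match ends here]/gs;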

print $str;

And the output is:

here is some text that [match starts here]has     a  substring[match ends here] that I'm interested in embedded in it.

二智少女猫性小仙女 2024-08-11 22:53:33

As many have suggested, use \s+ to match whitespace. Here is how you can do it automatically:

my $original = "here is some text that has a substring that I'm interested in embedded in it.";
my $search = "has a\nsubstring";

my $re = $search;
$re =~ s/\s+/\\s+/g;

$original =~ s/\b$re\b/[match starts here]$&\[match ends here]/g;

print $original;

Output:

here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.

You might want to escape any metacharacters in the search string. If anyone is interested, I could add that.
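
One way the escaping could look, as a sketch rather than the answerer's own code: run `quotemeta` over the whole search string first (which also puts a backslash in front of every whitespace character), then collapse each run of escaped whitespace into `\s+`. The phrase "3.5 (approx.)" is a made-up example, chosen because it breaks the unescaped version.

```perl
use strict;
use warnings;

my $original = "the fee is 3.5 (approx.) per item";
my $search   = "3.5 (approx.)";

# Unescaped: "(" and ")" become a capture group and "." matches any
# character, so the literal parentheses in the text are never matched.
(my $naive = $search) =~ s/\s+/\\s+/g;

# Escaped: quotemeta turns " " into "\ ", then each run of
# backslash-plus-whitespace is replaced by \s+.
my $re = quotemeta $search;
$re =~ s/(?:\\\s)+/\\s+/g;

my $naive_hit   = $original =~ /$naive/ ? 1 : 0;
my $escaped_hit = $original =~ /$re/    ? 1 : 0;

print "naive: $naive_hit, escaped: $escaped_hit\n";
```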

め七分饶幸 2024-08-11 22:53:33

This is an example of how you could do that.

#! /opt/perl/bin/perl
use strict;
use warnings;

my $submatch = "has a\nsubstring";

my $str = "
here is some
text that has
a substring that I'm interested in, embedded in it.
";

print substr_match($str, $submatch), "\n";

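sub substr_match {
  my ($string, $match) = @_;

  # Turn each run of whitespace in the search text into \s+
  $match =~ s/\s+/\\s+/g;

  # This isn't safe the way it is now, you will need to sanitize $match
  # In list context (as in the print above) this returns (1) on a match, or an empty list otherwise
  $string =~ /\b$match\b/;
}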

This currently does nothing to check the $match variable for unsafe characters.
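
A variant that actually inserts the markers is sketched below. It is an illustration, not the answerer's code: `substr_match_marked` is a hypothetical name, each word goes through `quotemeta` to address the sanitizing concern, and the match offsets in `@-` and `@+` are used with `substr`, so the original text, whitespace included, is left untouched apart from the two inserted markers.

```perl
use strict;
use warnings;

# Hypothetical helper: returns $string with markers around the first match,
# or $string unchanged if the words are not found.
sub substr_match_marked {
    my ($string, $match) = @_;

    # Escape each word and allow any run of whitespace between words.
    my $re = join '\s+', map { quotemeta } split ' ', $match;

    return $string unless $string =~ /\b$re\b/;

    # $-[0] and $+[0] are the start and end offsets of the whole match.
    my ($start, $end) = ($-[0], $+[0]);

    # Insert the end marker first so $start still points at the match.
    substr($string, $end, 0)   = '[match ends here]';
    substr($string, $start, 0) = '[match starts here]';
    return $string;
}

my $str = "here is some\ntext that has\na substring that I'm interested in, embedded in it.";
print substr_match_marked($str, "has a\nsubstring"), "\n";
```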
