How can I preserve whitespace when I match and replace several words in Perl?
Let's say I have some original text:
here is some text that has a substring that I'm interested in embedded in it.
I need to match a part of it against some search text, say: "has a substring".
However, the original text and the matching string may have whitespace differences. For example the match text might be:
has a substring
or
has a substring
and/or the original text might be:
here is some text that has a substring that I'm interested in embedded in it.
What I need my program to output is:
here is some text that [match starts here]has a substring[match ends here] that I'm interested in embedded in it.
I also need to preserve the whitespace pattern in the original and just add the start and end markers to it.
Any ideas about a way of using Perl regexes to get this to happen? I tried, but ended up getting horribly confused.
It's been some time since I've used Perl regular expressions, but what about putting \s* between the words of the pattern?
This would match zero or more whitespace characters, newlines included, between the words. The substitution wraps the entire match in brackets while maintaining the original separation. It ain't automatic, but it does work.
You could play games with this, like taking the string "has a substring" and doing a transform on it to make it "has\s*a\s*substring", to make this a little less painful.
EDIT: Incorporated ysth's comment that the \s metacharacter matches newlines, and hobbs's corrections to my \s usage.
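A minimal sketch of that transform. The helper name loose_pattern and the sample text are assumptions made for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "has a substring" into 'has\s*a\s*substring'.
sub loose_pattern {
    my ($phrase) = @_;
    # Every run of whitespace in the phrase becomes the regex \s*
    (my $pattern = $phrase) =~ s/\s+/\\s*/g;
    return $pattern;
}

my $pattern = loose_pattern("has a substring");
my $text    = "some text that has  a\nsubstring in it";

# Wrap the whole match in brackets; $1 keeps the original spacing.
$text =~ s/($pattern)/[$1]/;

print "$pattern\n";
print "$text\n";
```

Because \s* also accepts zero whitespace characters, this pattern would match "hasasubstring" as well; \s+ is the stricter choice when at least one whitespace character is required.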
This pattern will match the string that you're looking to find: (has\s+a\s+substring)
So, when the user enters a search string, replace any whitespace in the search string with \s+ and you have your pattern. Then just replace every match with [match starts here]$1[match ends here], where $1 is the matched text.
In regexes, you can use + to mean "one or more." So a pattern like has\s+a\s+substring matches has, followed by one or more whitespace chars, followed by a, followed by one or more whitespace chars, followed by substring.
Put it together with a substitution operator and you can wrap the matched text in the start and end markers while leaving its whitespace exactly as it is in the original.
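A minimal sketch of such a substitution. The sample text (with its double space and newline) and the variable names $open and $close are assumptions made for illustration; the marker strings are the ones from the question.

```perl
use strict;
use warnings;

my $text = "here is some text that has  a\nsubstring that I'm interested in embedded in it.";

# The markers live in variables so the replacement never contains "$1[",
# which Perl could try to read as an element of an array.
my ($open, $close) = ('[match starts here]', '[match ends here]');

# The parentheses capture the matched text, whitespace included, as $1.
$text =~ s/(has\s+a\s+substring)/$open$1$close/;

print "$text\n";
```

The double space and the newline inside the match are left untouched; only the two markers are added around it.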
As many have suggested, use \s+ to match whitespace. The pattern can be built automatically from the search string instead of being written by hand.
You might want to escape any meta-characters in the string. If someone is interested, I could add it.
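One way the automatic version could look, as a sketch only. The helper name mark_matches and the sample strings are hypothetical, and metacharacters in the search string are deliberately left unescaped here.

```perl
use strict;
use warnings;

# Hypothetical helper: mark every whitespace-insensitive match of $search.
# Metacharacters in $search are NOT escaped in this version.
sub mark_matches {
    my ($text, $search) = @_;
    my ($open, $close) = ('[match starts here]', '[match ends here]');

    # split ' ' ignores leading/trailing whitespace and splits on whitespace runs
    my $pattern = join '\s+', split ' ', $search;

    # /g marks every occurrence, not just the first
    $text =~ s/($pattern)/$open$1$close/g;
    return $text;
}

my $text = "here is some\ntext that has\na substring, and it has   a substring again.";
print mark_matches($text, " has a\tsubstring "), "\n";
```

Splitting and rejoining the words means stray spaces, tabs or newlines in the search string do not matter; only the sequence of words does.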
This is an example of how you could do that.
It currently does not do anything to check the $match variable for unsafe characters.
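To illustrate why unchecked characters in $match matter, here is a small sketch comparing an unescaped pattern with one escaped through Perl's built-in quotemeta. The sample strings and the names $raw, $safe, $raw_hit and $safe_hit are assumptions for illustration.

```perl
use strict;
use warnings;

my $text  = "the sum a+b  is positive";
my $match = "a+b is";   # "+" is a regex quantifier when left unescaped

# Unescaped: whitespace becomes \s+, everything else is used as regex syntax,
# so "a+b" means "one or more a, then b" and the literal text is not found.
(my $raw = $match) =~ s/\s+/\\s+/g;
my $raw_hit = ($text =~ /$raw/) ? 1 : 0;

# Escaped: quotemeta each word first, then join the words with \s+.
my $safe = join '\s+', map { quotemeta($_) } split ' ', $match;
my $safe_hit = ($text =~ /$safe/) ? 1 : 0;

print "unescaped matched: $raw_hit\n";
print "escaped matched:   $safe_hit\n";
```

Escaping word by word matters: calling quotemeta on the finished pattern would also escape the backslash in \s+ and break the whitespace matching.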