Whether to escape ( and ) in regex using GNU sed
I've noticed several posts on this site which say that with GNU sed you should use `(` and `)` in regex rather than `\(` and `\)`. But then I looked in the GNU sed manual and saw that it specifies that `\(` and `\)` must be used. What's up?
The part of the GNU sed manual you linked to explains that whether you should escape parentheses depends on whether you are using basic regular expressions or extended regular expressions. The manual also says that the `-r` flag determines which mode you are in.
Edit: as stated in grok12's comment, the `-E` flag in BSD sed does what the `-r` flag does in GNU sed.
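To make the two modes concrete, here is a minimal sketch that swaps two words once with basic and once with extended syntax. It assumes a sed that accepts `-E` (BSD sed does, and so do recent GNU sed releases); the variable names are only for illustration.

```shell
# Basic regex (sed's default): \( and \) delimit groups.
bre_out=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')

# Extended regex (-E here; GNU sed also accepts -r): bare ( and ) delimit groups.
ere_out=$(echo 'hello world' | sed -E 's/(hello) (world)/\2 \1/')

echo "$bre_out"
echo "$ere_out"
```

Both commands should print `world hello`; only the escaping of the grouping parentheses differs.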
Originally sed, like grep and everything else, used `\(` to indicate grouping, whereas `(` just matched a literal open paren.
Many newer implementations of regular expressions, including egrep and Perl, switched this around, so `\(` meant a literal open paren and `(` was used to specify grouping.
So now with GNU sed, `(` is a special character, just like in egrep. But on other systems (e.g. BSD) it's still the old way, as far as I can tell. Unfortunately this is a real mess, because now it's hard to know which one to use.
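A small sketch of the literal-paren side of this, again assuming a sed that accepts `-E`. In the default basic mode a bare `(` matches a literal paren (this holds for GNU sed too unless `-r` or `-E` is given), while in extended mode the literal has to be written `\(`.

```shell
# Basic mode: a bare ( is an ordinary character, so this replaces the literal paren.
bre_lit=$(echo 'f(x)' | sed 's/(/[/')

# Extended mode: ( would open a group, so the literal paren is written \(.
ere_lit=$(echo 'f(x)' | sed -E 's/\(/[/')

echo "$bre_lit"
echo "$ere_lit"
```

Both commands should print `f[x)`.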
Thanks to rocker, murga, and chris. Each of you helped me understand the issue. I'm answering my own question here in order to (hopefully) put the whole story together in one place.
There are two major versions of sed in use: GNU and BSD. Both require parentheses to be escaped when used for grouping in basic regex, and left unescaped when used for grouping in extended regex. They differ in that the `-r` option enables extended regex for GNU sed, while `-E` does so for BSD sed.
The standard sed in Mac OS X is BSD. I believe much of the rest of the world uses GNU sed as the standard, but I don't know precisely who uses what. If you are unsure which you are using, try:
If you get a
reply then you have bsd.
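One way to check which implementation is installed, offered as a sketch and not necessarily the test this answer had in mind: GNU sed accepts `--version`, while BSD sed rejects it as an illegal option. The `sed_flavor` variable is a made-up name, and a success here only suggests GNU sed, since other implementations may also accept the flag.

```shell
# GNU sed prints version information and exits 0 for --version;
# BSD sed reports an illegal option and exits non-zero.
if sed --version >/dev/null 2>&1; then
  sed_flavor="gnu"
else
  sed_flavor="bsd-or-other"
fi

echo "$sed_flavor"
```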
Escaped parentheses (`\(`) make the regex search for parentheses as part of the expression. Unescaped parentheses (`(`) make the regex group the contents of the parentheses together. In other words, if you escape them, the engine looks for them, but if you leave them as is, they cause the engine to group results into variables.
An example to demonstrate:
    $myString = "junk(150)moar";
To get just the number:
    #^\w+\((\d+)\)\w+$#
(`$1` is `150`.) It's a mess, I know, but it demonstrates the use of grouping parentheses and parentheses as part of the matching expression.
Update years later:
As user @bmk correctly points out, this answer applies to extended regular expressions, but not to basic regular expressions. Basic regular expressions are rarely the default engine in programming languages, but it would be prudent to verify which engine you are using before assuming this answer applies to your situation.
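For comparison, the same extraction can be sketched in sed in both modes. This is an adaptation, not the pattern from the answer: sed has no `\d`, so `[0-9]` and `[[:alnum:]_]` stand in for `\d` and `\w`, and the variable names are made up. It assumes a sed that accepts `-E`.

```shell
input='junk(150)moar'

# Extended mode: \( \) match the literal parens, ( ) capture the digits.
ere_num=$(printf '%s\n' "$input" |
  sed -E 's/^[[:alnum:]_]+\(([0-9]+)\)[[:alnum:]_]+$/\1/')

# Basic mode: the roles swap; \{1,\} stands in for +, which plain BRE lacks.
bre_num=$(printf '%s\n' "$input" |
  sed 's/^[[:alnum:]_]\{1,\}(\([0-9]\{1,\}\))[[:alnum:]_]\{1,\}$/\1/')

echo "$ere_num"
echo "$bre_num"
```

Both commands should print `150`.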