How to use sed to replace all spaces within square brackets with underscores?
I figured out that in order to turn [some name] into [some_name] I need to use the following expression:
s/\(\[[^ ]*\) /\1_/
i.e. create a backreference capture for anything that starts with a literal '[' followed by any number of non-space characters and then a space, and replace it with the captured text followed by an underscore. What I don't know yet, though, is how to alter this expression so it works for ALL spaces within the brackets, e.g. turning [a few words] into [a_few_words].
I sense that I'm close, but I'm missing the piece of knowledge that would make this substitution repeat as many times as needed within the first set of []s on a line (of SQL Server DDL, in this case).
Any suggestions gratefully received....
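For reference, here is a small sketch that runs the expression from the question unchanged. It assumes a POSIX sh with sed on the PATH; the function name 'first_blank_only' and the sample lines are made up for illustration. There is no 'g' flag and no loop, so the expression makes at most one replacement per line. Because '[^ ]*' can also run past a ']', the underscore can end up outside the brackets.

```sh
# Hypothetical helper: the expression from the question, unchanged.
# No 'g' flag and no loop, so at most one blank per line is replaced,
# and [^ ]* can run straight past a closing ']'.
first_blank_only() {
    sed -e 's/\(\[[^ ]*\) /\1_/'
}

printf '%s\n' '[some name]' | first_blank_only                # prints [some_name]
printf '%s\n' '[a few words]' | first_blank_only              # prints [a_few words]
printf '%s\n' '[single-word] and context' | first_blank_only  # prints [single-word]_and context
```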
There are two parts to the trickery needed:
Stop replacing when you reach a close square bracket (but do it repeatedly on the line):
s/\(\[[^] ]*\) /\1_/g
This matches an open square bracket, followed by zero or more characters that are neither a blank nor a close square bracket, followed by a blank. The global suffix means that the substitution is applied to every such sequence on the line, i.e. every open square bracket that is followed by a blank before any close square bracket. Note, too, that this regex does not alter '[single-word] and context', whereas the original would translate that to '[single-word]_and context', which is not the object of the exercise.
Get sed to repeat the search from where this one started. Unfortunately, there isn't a truly good way to do that. Sed always resumes searching after the text that was substituted, and this is one occasion when we don't want that. Sometimes, you can get away with simply repeating the substitute operation. In this case, you have to repeat it every time the substitution succeeds, stopping when there are no more substitutions.
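A quick sketch of a single global pass shows both effects described above. It assumes a POSIX sh, and 'one_pass' is a made-up name. The bracket-aware class leaves '[single-word] and context' alone. A single 'g' pass still replaces only one blank per bracketed field, because sed resumes scanning after each replacement.

```sh
# Hypothetical helper: one global pass of the bracket-aware expression.
# [^] ]* matches neither a blank nor a ']', so a match cannot extend
# past the end of a bracketed field.
one_pass() {
    sed -e 's/\(\[[^] ]*\) /\1_/g'
}

printf '%s\n' '[single-word] and context' | one_pass          # unchanged
printf '%s\n' '[a few words]' | one_pass                      # prints [a_few words]
printf '%s\n' 'x [one two] y [three four five] z' | one_pass  # prints x [one_two] y [three_four five] z
```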
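Two of the less well known operations in sed are the ':label' and the 't' commands. They were present in the 7th Edition of Unix (circa 1978), though, so they are not new features. The first simply identifies a position in the script which can be jumped to with 'b' (not wanted here) or 't'. The 't' command branches to that label if any substitutions have been made since the most recent input line was read or the last 't' was executed.
Marvellous: that is exactly what we need: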
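Except that it doesn't work all on one line like that (at least, not on MacOS X, where sed takes everything after ':' up to the end of the line as the label name, semicolons included). This did work admirably, though: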
Or, as noted in the comments, you could write three separate '-e' options (which works on MacOS X):
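Here is a minimal sketch of the three-'-e' form, wrapped in a shell function so it is easy to try. The label name 'redo', the function name and the sample line are assumptions for illustration, not necessarily what was used originally. sed joins the separate '-e' fragments with newlines, so the label ends at a newline, which is the portable way to terminate it.

```sh
# Hypothetical helper: repeat the substitution until nothing changes.
# ':redo' marks a position; 't redo' jumps back to it if a substitution
# succeeded since the line was read or the last 't' was taken.
# Separate -e options keep the label on its own line, which BSD sed needs.
all_blanks_in_brackets() {
    sed -e ':redo' -e 's/\(\[[^] ]*\) /\1_/g' -e 't redo'
}

printf '%s\n' 'a [few more words] and [two words] here' | all_blanks_in_brackets
# prints a [few_more_words] and [two_words] here
```

Each extra pass fixes one more blank per field. The loop therefore runs once per blank in the longest bracketed field, plus one final pass that finds nothing and lets 't' fall through.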
Given the data file:
the output from the sed script shown is:
And, finally, reading the fine print in the question, if you need this done only in the first square-bracketed field on each line, then we need to ensure that there are no open square brackets before the one that starts the match. This variant works:
(The 'g' qualifier is gone; it probably isn't needed in the other variants either, given the loop. Its presence might make the process marginally more efficient, but it would most likely be essentially impossible to detect that. The pattern is now anchored to the start of the line (the caret) and begins with zero or more characters other than an open square bracket, followed by the first open square bracket.)
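The same kind of sketch works for the first-field-only variant. The function name, the label and the SQL-style sample line are hypothetical.

```sh
# Hypothetical helper: only the first bracketed field on each line.
# ^[^[]* consumes everything before the first '[', so a match can
# never start at a later '[' on the line.
first_field_only() {
    sed -e ':redo' -e 's/^\([^[]*\[[^] ]*\) /\1_/' -e 't redo'
}

printf '%s\n' '[Order Details] [Unit Price] money' | first_field_only
# prints [Order_Details] [Unit Price] money
```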
Sample output:
This is easier in a language like perl which has "executable" substitutions:
Or to split it up more clearly:
The '.*?' is the non-greedy match (to avoid slurring together two adjacent bracketed phrases), and the 'e' flag on the substitution causes the replacement to be evaluated as Perl code, so you can call a function to do the inner work.