How do I use sed to replace all spaces inside square brackets with underscores?

Posted on 2024-10-08 09:10:45

I figured out that in order to turn [some name] into [some_name] I need to use the following expression:

s/\(\[[^ ]*\) /\1_/

i.e. capture anything that starts with a literal '[' followed by any number of non-space characters, then match a space, and replace the whole match with the captured text followed by an underscore. What I don't know yet is how to alter this expression so it works for ALL spaces within the brackets, e.g. turning [a few words] into [a_few_words].

I sense that I'm close, but I'm missing the piece of knowledge that would make this work any number of times within the first set of []s on a line (of SQL Server DDL in this case).

Any suggestions gratefully received....

眼中杀气 2024-10-15 09:10:45

There are two parts to the trickery needed:

  1. Stop replacing when you reach a close square bracket (but do it repeatedly on the line):

    s/\(\[[^] ]*\) /\1_/g
    

    This matches an open square bracket, followed by zero or more characters that are neither a blank nor a close square bracket, followed by a blank. The 'g' flag applies the substitution to every non-overlapping match on the line, so each bracketed group gets at most one blank replaced per pass. Note, too, that this regex does not alter '[single-word] and context', whereas the original would change it to '[single-word]_and context', which is not the object of the exercise.

  2. Get sed to repeat the search from where this one started. Unfortunately, there isn't a truly good way to do that. Sed always resumes searching after the text that was substituted; and this is one occasion when we don't want that. Sometimes, you can get away with simply repeating the substitute operation. In this case, you have to repeat it every time the substitution succeeds, stopping when there are no more substitutions.
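To see why a single pass is not enough, here is a minimal sketch that runs both expressions on made-up sample strings. The function names `original` and `single_pass` and the sample strings are illustrative assumptions, not part of the question.

```shell
#!/bin/sh
# Expression from the question: ']' is not excluded, first match only.
original() {
    sed 's/\(\[[^ ]*\) /\1_/'
}

# Step 1 of the fix: ']' excluded and the g flag added, but no loop yet.
single_pass() {
    sed 's/\(\[[^] ]*\) /\1_/g'
}

echo '[single-word] and context' | original      # [single-word]_and context
echo '[single-word] and context' | single_pass   # [single-word] and context
echo '[a few words] here'        | single_pass   # [a_few words] here
```

The first call damages the text after the bracket, the second leaves it alone, and the third converts only the first blank inside the brackets. That remaining blank is what the repetition in step 2 has to deal with.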

Two of the less well known operations in sed are the ':label' and the 't' commands. They were present in the 7th Edition of Unix (circa 1978), though, so they are not new features. The first simply identifies a position in the script which can be jumped to with 'b' (not wanted here) or 't':

[2addr]t [label]

Branch to the ':' function bearing the label if any substitutions have been made since the most recent reading of an input line or execution of a 't' function. If no label is specified, branch to the end of the script.

Marvellous: we need:

 sed -e ':redo; s/\(\[[^] ]*\) /\1_/g; t redo' data.file

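Except that it doesn't work all on one line like that (at least, not on MacOS X, where sed appears to treat everything after the ':' up to the end of the line as the label; GNU sed accepts the one-line form). This did work admirably, though: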

sed -e ':redo
        s/\(\[[^] ]*\) /\1_/g
        t redo' data.file

Or, as noted in the comments, you could write three separate '-e' options (which works on MacOS X):

 sed -e ':redo' -e 's/\(\[[^] ]*\) /\1_/g' -e 't redo' data.file

Given the data file:

a line with [one blank] word inside square brackets.
a line with [two blank] or [three blank] words inside square brackets.
a line with [no-blank] word inside square brackets.
a line with [multiple words in a single bracket] inside square brackets.
a line with [multiple words in a single bracket] [several times on one line]

the output from the sed script shown is:

a line with [one_blank] word inside square brackets.
a line with [two_blank] or [three_blank] words inside square brackets.
a line with [no-blank] word inside square brackets.
a line with [multiple_words_in_a_single_bracket] inside square brackets.
a line with [multiple_words_in_a_single_bracket] [several_times_on_one_line]
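For reuse, the loop can be wrapped in a small shell function. This is only a sketch: the name `underscore_brackets` and the DDL-like sample line are assumptions made for illustration, and it assumes brackets are balanced on each line (an unclosed '[' followed by a word and a blank would also be rewritten).

```shell
#!/bin/sh
# underscore_brackets: hypothetical helper; reads stdin, writes stdout.
# Replaces every blank inside every [...] group on each line by
# repeating the substitution until it no longer matches.
underscore_brackets() {
    sed -e ':redo' \
        -e 's/\(\[[^] ]*\) /\1_/g' \
        -e 't redo'
}

# Made-up sample input, loosely modelled on SQL Server DDL.
echo 'CREATE TABLE [my table] ([first name] varchar(50))' | underscore_brackets
```

The three separate '-e' options are used so that the label is not followed by anything else on its line, which keeps the script usable with sed implementations that reject the one-line form.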

And, finally, reading the fine print in the question, if you need this done only in the first square-bracketed field on each line, then we need to ensure that no earlier bracketed field precedes the open square bracket that starts the match. This variant works:

sed -e ':redo' -e 's/^\([^]]*\[[^] ]*\) /\1_/' -e 't redo' data.file

(The 'g' qualifier is gone; given the loop, it probably isn't needed in the other variants either. Its presence might make the process marginally more efficient, but the difference would most likely be impossible to detect. The pattern is now anchored to the start of the line (the caret) and allows zero or more characters that are not a close square bracket before the open square bracket, so the match can never extend past the first ']' on the line.)

Sample output:

a line with [one_blank] word inside square brackets.
a line with [two_blank] or [three blank] words inside square brackets.
a line with [no-blank] word inside square brackets.
a line with [multiple_words_in_a_single_bracket] inside square brackets.
a line with [multiple_words_in_a_single_bracket] [several times on one line]
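The first-field-only variant can be wrapped the same way. As before, this is a sketch under assumptions: the name `underscore_first_bracket` and the sample line are invented for illustration, and a stray ']' before the first '[' on a line would prevent any replacement.

```shell
#!/bin/sh
# underscore_first_bracket: hypothetical helper; reads stdin, writes stdout.
# Only blanks before the first ']' on the line (and after a '[') are
# replaced, so later bracketed fields are left untouched.
underscore_first_bracket() {
    sed -e ':redo' \
        -e 's/^\([^]]*\[[^] ]*\) /\1_/' \
        -e 't redo'
}

# Made-up sample input, loosely modelled on SQL Server DDL.
echo 'CREATE TABLE [my table] ([first name] varchar(50))' | underscore_first_bracket
```

With the sample line, only [my table] is changed to [my_table]; [first name] keeps its blank.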

失去的东西太少 2024-10-15 09:10:45

This is easier in a language like perl which has "executable" substitutions:

perl -wne 's/(\[.*?])/ do { my $x = $1; $x =~ y, ,_,; $x } /ge; print'

Or to split it up more clearly:

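# Return a copy of the argument with blanks turned into underscores.
sub replace_with_underscores {
    my $s = shift;
    $s =~ y/ /_/;
    return $s;
}

# Apply it to every bracketed phrase on each input line.
while (<>) {
    s/(\[.*?])/ replace_with_underscores($1) /ge;
    print;
}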

The .*? is the non-greedy match (to avoid slurring together two adjacent bracketed phrases) and the e flag to the substitution causes it to be evaluated, so you can call a function to do the inner work.
