Sed does not replace all instances in a file when the regions overlap

Posted 2024-12-24 21:43:01

I need to replace several words with other words.

For example: replace "apple" with "FRUIT" in file, but only in these 4 situations:

  • _apple_: a blank space before and after (the _ stands for a space).
  • [apple_: an opening square bracket before and a blank space after.
  • _apple]: a blank space before and a closing square bracket after.
  • [apple]: square brackets before and after.

I do not want the replacements to occur in any other situation.

I have tried using the following code:

a="apple"
b="fruit"
sed -i "s/ $a / $b /g" ./file
sed -i "s/\[$a /\[$b /g" ./file
sed -i "s/ $a\]/ $b\]/g" ./file
sed -i "s/\[$a\]/\[$b\]/g" ./file

I thought the "g" flag at the end would make it replace all instances, but this turned out not to be a thorough solution. For example, if file contains this:

apple spider apple apple spider tree apple tree

The third occurrence of "apple" is not replaced. In this line, too, several occurrences of the word are left unchanged:

apple  spider apple apple apple apple apple spider tree apple tree

I suspect this is because of the shared space.
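A quick check on the first sample line supports this suspicion. The snippet below runs only the first of the four commands (with the literal word instead of $a/$b): each match of " apple " consumes its trailing space, so the "apple" right after it has no leading space left to match.

```shell
# Each match of " apple " consumes its trailing space, so an "apple"
# directly after a replaced one has no leading space left to match.
line='apple spider apple apple spider tree apple tree'
printf '%s\n' "$line" | sed 's/ apple / FRUIT /g'
# prints: apple spider FRUIT apple spider tree FRUIT tree
```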

How can I get this to find and replace all instances of $a with $b, regardless of any overlap?


烙印 2024-12-31 21:43:01

The quick-and-dirty solution is to perform the replacement twice.

$ echo apple apple apple apple[apple apple] | sed -e 's/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g; s/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g'
apple FRUIT FRUIT apple[FRUIT FRUIT]

This is safe because, after the first command, the resulting text won't contain any occurrences of (\[| )apple( |\]) that were not already in the original text.

The drawback is that two replacements take roughly twice as long to run.

If you split it into two sed invocations, you can see the steps more clearly:

$ echo apple apple apple apple apple apple[apple apple] | sed -e 's/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g'
apple FRUIT apple FRUIT apple apple[FRUIT apple]

$ echo apple FRUIT apple FRUIT apple apple[FRUIT apple] | sed -e 's/\(\[\| \)apple\( \|\]\)/\1FRUIT\2/g'
apple FRUIT FRUIT FRUIT FRUIT apple[FRUIT FRUIT]
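A variant of the same idea that does not depend on counting passes (a sketch, not part of this answer) loops with sed's `t` command until nothing more is replaced. It also swaps the `\|` alternation, which is a GNU extension in basic regular expressions, for the POSIX bracket expressions `[[ ]` and `[] ]`. The function name replace_delimited is hypothetical.

```shell
# Hypothetical helper: replace "apple" with "FRUIT" only when it is preceded
# by a space or "[" and followed by a space or "]".
# Without /g, each pass replaces only the leftmost match; "ta" jumps back to
# ":a" while a substitution succeeded, so delimiters shared by neighbouring
# words are reused on the next pass.
replace_delimited() {
    sed -e ':a' -e 's/\([[ ]\)apple\([] ]\)/\1FRUIT\2/' -e 'ta'
}

printf '%s\n' 'apple apple apple apple[apple apple]' | replace_delimited
# prints: apple FRUIT FRUIT apple[FRUIT FRUIT]
```

Because "FRUIT" does not contain "apple", every pass removes one candidate and the loop always terminates. The trade-off is that each pass rescans the line from the start.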
独留℉清风醉 2024-12-31 21:43:01

You can do this using backreferences. This should be fully POSIX-compatible:

sed -i 's/^badger\([] ]\)/SNAKE\1/g;
        s/\([[ ]\)badger$/\1SNAKE/g;
        s/\([[ ]\)badger\([] ]\)/\1SNAKE\2/g;
        s/ badger]/ SNAKE]/g' ./infile

Example

$ sed 's/^badger\([] ]\)/SNAKE\1/g;s/\([[ ]\)badger$/\1SNAKE/g;s/\([[ ]\)badger\([] ]\)/\1SNAKE\2/g;s/ badger]/ SNAKE]/g' <<<"badger [badger badger] [badger] badger foobadger badgering mushroom badger"
SNAKE [SNAKE SNAKE] [SNAKE] SNAKE foobadger badgering mushroom SNAKE
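One caveat worth checking (an observation, not part of the original answer): the third expression, like the question's own commands, cannot reuse a space consumed by the previous match. On the question's first sample line, with badger in place of apple, the script above prints `SNAKE spider SNAKE badger spider tree SNAKE tree`. A minimal sketch of a fix, borrowing the first answer's idea, applies that expression a second time; the function name snake is hypothetical.

```shell
# Hypothetical wrapper around the answer's script, with the third expression
# applied twice so a word whose left delimiter was consumed by the previous
# match is replaced on the second pass.
snake() {
    sed -e 's/^badger\([] ]\)/SNAKE\1/g' \
        -e 's/\([[ ]\)badger$/\1SNAKE/g' \
        -e 's/\([[ ]\)badger\([] ]\)/\1SNAKE\2/g' \
        -e 's/\([[ ]\)badger\([] ]\)/\1SNAKE\2/g' \
        -e 's/ badger]/ SNAKE]/g'
}

printf '%s\n' 'badger spider badger badger spider tree badger tree' | snake
# prints: SNAKE spider SNAKE SNAKE spider tree SNAKE tree
```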
千里故人稀 2024-12-31 21:43:01
sed -i "s/\bapple\b/FRUIT/g" file

\b matches word boundaries. It is probably not entirely portable; at least, it doesn't work on Mac OS X.

And a more interesting test:

$ cat file; sed "s/\bapple\b/FRUIT/g" file
apple apple apple spider tree apple tree applejuice pineapple apple.com etc
FRUIT FRUIT FRUIT spider tree FRUIT tree applejuice pineapple FRUIT.com etc
披肩女神 2024-12-31 21:43:01

Consider using lookahead and lookbehind assertions:

s/(?<=[\s\[])apple(?=[\s\]])/FRUIT/g

Demo: http://regexr.com?2vl8p


Okay, I have now tested the regex on my computer and found that lookaheads and lookbehinds don't work in standard sed; you would need ssed with the --regexp-perl option instead:

uname -msrv
Darwin 11.2.0 Darwin Kernel Version 11.2.0: Tue Aug  9 20:54:00 PDT 2011; root:xnu-1699.24.8~1/RELEASE_X86_64 x86_64
ssed --ver
super-sed version 3.62
based on GNU sed version 4.1

Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.
ssed -R 's/(?<=[\s\[])apple(?=[\s\]])/FRUIT/g'
apple spider apple apple spider tree apple tree
apple spider FRUIT FRUIT spider tree FRUIT tree
你没皮卡萌 2024-12-31 21:43:01

One way using sed:

sed "s/\([^ ]\)\([ ]\)\([^ ]\)/\1\2\2\3/g; s/\( \|\[\)$a\( \|\]\)/\1$b\2/g; s/\([^ ]\)\([ ]\{2\}\)\([^ ]\)/\1 \3/g" file

There are three substitution commands. Explanation:

s/\([^ ]\)\([ ]\)\([^ ]\)/\1\2\2\3/g      # Duplicate each single space that sits between
                                          # two non-space characters.
s/\( \|\[\)$a\( \|\]\)/\1$b\2/g           # Replace the content of variable '$a' when it is
                                          # preceded by a blank or '[' and followed by a blank
                                          # or ']', in any combination, with the content of
                                          # variable '$b', keeping the captured delimiters (\1 and \2).
s/\([^ ]\)\([ ]\{2\}\)\([^ ]\)/\1 \3/g    # Collapse two consecutive spaces between non-space
                                          # characters back into one.
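To see how the three commands cooperate, the sketch below applies them one at a time to the first test line and keeps each intermediate result (the step variable names are only for illustration). It needs GNU sed, because the second command uses the `\|` alternation. After the first command every word has its own space on each side, so neighbouring matches no longer compete for a shared space.

```shell
# Apply the answer's three commands one at a time (GNU sed).
a="apple"
b="fruit"
line='apple spider apple apple spider tree apple tree'

# 1: double every single space between two non-space characters.
step1=$(printf '%s\n' "$line" | sed 's/\([^ ]\)\([ ]\)\([^ ]\)/\1\2\2\3/g')
# 2: replace $a between space/[ and space/], keeping the delimiters.
step2=$(printf '%s\n' "$step1" | sed "s/\( \|\[\)$a\( \|\]\)/\1$b\2/g")
# 3: collapse the doubled spaces back to single ones.
step3=$(printf '%s\n' "$step2" | sed 's/\([^ ]\)\([ ]\{2\}\)\([^ ]\)/\1 \3/g')

printf '%s\n' "$step1" "$step2" "$step3"
# prints:
# apple  spider  apple  apple  spider  tree  apple  tree
# apple  spider  fruit  fruit  spider  tree  fruit  tree
# apple spider fruit fruit spider tree fruit tree
```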

My test:

Content of file:

apple spider apple apple spider tree apple tree
apple spider [apple apple spider tree apple] tree
apple spider apple apple spider tree appletree
apple spider apple apple spider tree [apple] tree
apple  spider apple apple apple apple apple spider tree apple tree

Set variables:

a="apple"
b="fruit"

Run the sed command:

sed "s/\([^ ]\)\([ ]\)\([^ ]\)/\1\2\2\3/g; s/\( \|\[\)$a\( \|\]\)/\1$b\2/g; s/\([^ ]\)\([ ]\{2\}\)\([^ ]\)/\1 \3/g" file

Result:

apple spider fruit fruit spider tree fruit tree
apple spider [fruit fruit spider tree fruit] tree
apple spider fruit fruit spider tree appletree
apple spider fruit fruit spider tree [fruit] tree
apple spider fruit fruit fruit fruit fruit spider tree fruit tree

It won't work if your real file has a different distribution of spaces or an unusual format. In that case sed is a limited tool; Perl or a similar tool with lookaheads and lookbehinds would be better.
