sed - How do I delete everything except a defined pattern?
I have to remove everything but the 1, 2, or 3 digits (0-9, 10-99, or 100) preceding a % (I don't want to see the % itself, though) from another command's output and pipe the result to another command. I know that
sed -n '/%/p'
will show only the line(s) containing %, but that's not what I want. How can I get rid of the rest of the unwanted text and keep only these numbers, so that I can then pipe them to another command?
If you're not completely tied to sed, this is exactly what
grep -o
does: it prints only the part of each line that matches the pattern.
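The command from this answer did not survive on this page, so the following is only a sketch of the grep -o idea, not the answerer's original command. The function name percent_numbers and the sample input are made up for illustration. It assumes a grep that supports -o and -E (GNU and BSD grep both do).

```shell
# Hypothetical helper: print every 1-3 digit number that directly precedes a '%'.
percent_numbers() {
    # -o prints only the matched text; -E enables the {1,3} interval syntax.
    # The '%' has to be part of the match, so tr strips it afterwards.
    grep -oE '[0-9]{1,3}%' | tr -d '%'
}

# Sample input (made up); prints 42, 7 and 100 on separate lines.
printf 'CPU load: 42%% (user)\nno percentage here\ndisk 7%% of 100%%\n' | percent_numbers
```

One caveat with this sketch: for a longer number such as 1234% it would still print the last three digits, because nothing stops the match from starting in the middle of a digit run.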
EDIT: I misunderstood the OP and posted an invalid answer. I have changed it to an answer that, I believe, solves the problem in the more general scenario.
For a file such as the one below:
Use
sed -n -E 's/(^|.*[^0-9])([0-9]{1,3})%.*/\2/p' input
The -n flag makes sed suppress the automatic output of lines. The -E flag lets us use extended regular expressions. (GNU sed traditionally spells this flag -r; recent versions accept -E as well.)
Now comes the s/// command. The group (^|.*[^0-9]) matches either the beginning of the line (^) or a series of zero or more characters (.*) ending in a non-digit character ([^0-9]). The second group, ([0-9]{1,3}), matches one to three digits, but only where it is preceded by (^|.*[^0-9]) and followed by %. The .* inside the first group and the trailing .* together match everything before and after the number, so the whole line is part of the match. We then replace all of it with the second group, using the backreference \2.
Since we passed -n to sed, nothing would be printed by default, but we also passed the p flag to the s/// command, so a line is printed whenever a substitution is made on it. Note that this p is a flag of s///, not the p command, because it comes right after the last /.
sed -e 's/[^0-9]*\([0-9]*\)%.*/\1/'
captures the digits in a group, and because the pattern also matches everything around them (the leading [^0-9]* and the trailing .*), the rest of the line is discarded.
(My pattern matches any number of digits. sed's default basic regular expressions do not accept the [0-9]{1,3} shorthand that you see in perlre and others (the interval has to be written [0-9]\{1,3\}), so I elected to keep it simple to illustrate the principle you care about.)
Edit: fixed the quoting and replaced the leading .* with [^0-9]* so that a greedy match does not consume the digits. Once again, this is more straightforward with perlre, where you can use the non-greedy .*?
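As written, the command above also prints lines that contain no % unchanged, since it runs without -n. The sketch below is a variation, not the answerer's command: it stays within basic regular expressions, limits the match to one to three digits with the \{1,3\} interval, anchors the pattern at the start of the line, and prints only lines where a substitution happened. The function name first_percent_bre and the sample input are made up.

```shell
# Hypothetical BRE variant: handles lines whose first number is the percentage.
first_percent_bre() {
    # ^[^0-9]* skips leading non-digits, \{1,3\} is the BRE interval syntax,
    # and -n together with the p flag drops lines that did not match.
    sed -n 's/^[^0-9]*\([0-9]\{1,3\}\)%.*/\1/p'
}

# Sample input (made up); prints 42 and 100 on separate lines.
printf 'load 42%% ok\nplain text\n100%%\n' | first_percent_bre
```

Because of the anchor, a line such as "abc 12 def 45%" is skipped entirely; this variant only covers the case where the percentage is the first number on the line.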
Here's my shot:
If the line consists of 1-3 digits followed by a %, it removes the % sign. Otherwise, it deletes the entire line. So, for input such as
it yields
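The command, input, and output of this answer are missing from the page. The following is a guess at a command with the described behaviour, not the answerer's original; the function name bare_percent_lines and the sample input are assumptions.

```shell
# Hypothetical reconstruction of the described behaviour.
bare_percent_lines() {
    # Keep a line only if it is exactly 1-3 digits followed by '%',
    # and print it without the '%'. Every other line is dropped.
    sed -n -E 's/^([0-9]{1,3})%$/\1/p'
}

# Sample input (made up); prints 42 and 7 on separate lines.
printf '42%%\nhello\n7%%\n1000%%\nabc 5%%\n' | bare_percent_lines
```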
Use awk instead of sed.
For each field, check whether there is a % sign at the end. If so, print the number ($i+0 converts the field to a number). This uses a minimal regular expression.
100% has to be matched explicitly, because otherwise numbers such as 987% (or 123%, if you filter on a 1 in the first position) are also sent to the output.
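Neither the awk program nor the command that the last remark refers to is preserved here. As a rough sketch of both ideas combined, the loop below checks each whitespace-separated field for a trailing % and adds a numeric upper bound so that values above 100 are skipped. The function name percent_fields, the exact regular expression, and the sample input are assumptions.

```shell
# Hypothetical awk sketch: print fields of the form <digits>% with a value of at most 100.
percent_fields() {
    awk '{
        for (i = 1; i <= NF; i++)
            # $i + 0 converts "42%" to the number 42 (the numeric prefix).
            if (($i ~ /^[0-9]+%$/) && ($i + 0 <= 100))
                print $i + 0
    }'
}

# Sample input (made up); prints 42, 100 and 7 on separate lines.
printf 'used 42%% of disk\ngrowth 987%% overall\nfull 100%% 7%%\n' | percent_fields
```

Checking the value numerically avoids having to spell out 100 as a special case in the pattern, at the cost of also accepting fields with leading zeros such as 007%.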