How to make the 'cut' command treat consecutive identical delimiters as one?
I'm trying to extract a certain (the fourth) field from a column-based, space-aligned text stream. I'm trying to use the cut command in the following manner:
cat text.txt | cut -d " " -f 4
Unfortunately, cut doesn't treat several spaces as one delimiter. I could have piped through awk
awk '{ print $4 }'
or sed
sed -E "s/[[:space:]]+/ /g"
to collapse the spaces, but I'd like to know if there is any way to make cut deal with several delimiters natively?
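Try squeezing the repeated spaces with tr -s before piping into cut.
From the tr man page: -s (--squeeze-repeats) replaces each sequence of a repeated character that is listed in the set with a single occurrence of that character.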
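As a minimal sketch of the tr and cut combination (the sample line below is made up for illustration, not taken from the original post), tr -s ' ' squeezes each run of spaces into one so that cut sees a single delimiter between columns:

```shell
# Hypothetical sample line with irregular runs of spaces between columns.
line='this   is    line 1 more text'

# tr -s ' ' collapses each run of spaces into a single space;
# cut then splits on single spaces and picks the fourth field.
fourth=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 4)

printf '%s\n' "$fourth"
```

Leading spaces would still produce an empty first field after squeezing, so the field number shifts by one for indented lines.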
As you comment in your question, awk is really the way to go. Using cut is possible together with tr -s to squeeze spaces, as kev's answer shows.
Let me however go through all the possible combinations for future readers. Explanations are in the Tests section.
tr | cut
awk
bash
sed
Tests
Given this file, let's test the commands:
tr | cut
awk
bash
This reads the fields sequentially. By using _ we indicate that this is a throwaway ("junk") variable used to ignore those fields. This way, $myfield holds the 4th field of each line, no matter how many spaces separate the fields.
sed
This catches three groups of non-spaces followed by spaces with ([^ ]*[ ]*){3}. Then it captures whatever comes next, up to a space, as the 4th field, which is finally printed with \2 (the repeated group itself is \1).
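The awk, bash and sed variants described above can be sketched as follows. This is an illustration under stated assumptions: the two sample lines are made up, and the sed expression is one plausible form built around the ([^ ]*[ ]*){3} group quoted in the text, with the rest of the expression assumed.

```shell
# Hypothetical sample input; the spacing between columns is irregular.
sample='this   is    line 1 more text
another line   2   is here'

# awk splits on runs of whitespace by default, so $4 is the fourth word.
with_awk=$(printf '%s\n' "$sample" | awk '{ print $4 }')

# read assigns fields in order; "_" collects the ones we do not need,
# and the final "_" swallows the rest of the line.
with_read=$(printf '%s\n' "$sample" | while read -r _ _ _ myfield _; do
    printf '%s\n' "$myfield"
done)

# sed: skip three "non-spaces then spaces" groups (\1),
# capture the next run of non-spaces (\2), and drop the remainder.
with_sed=$(printf '%s\n' "$sample" | sed -E 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/')

printf 'awk:\n%s\nread:\n%s\nsed:\n%s\n' "$with_awk" "$with_read" "$with_sed"
```

All three treat a run of spaces as one separator, which is the behaviour cut lacks on its own.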
Shortest/friendliest solution
After becoming frustrated with the many limitations of cut, I wrote my own replacement, which I called cuts, for "cut on steroids".
cuts provides what is likely the most minimalist solution to this and many other related cut/paste problems. This particular question is one example, out of many, that it addresses.
cuts supports a number of features (including some that would otherwise need paste to be run separately) and much more, none of which is provided by standard cut.
See also: https://stackoverflow.com/a/24543231/1296044
Source and documentation (free software): http://arielf.github.io/cuts/
A Perl one-liner shows how closely Perl is related to awk. However, the @F autosplit array starts at index $F[0], while awk fields start with $1.
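A sketch of what such a one-liner could look like, assuming perl's -a (autosplit into @F), -n (loop over input lines) and -l (line-ending handling) switches; the sample line is hypothetical and the awk command is included only for comparison:

```shell
# Hypothetical sample line with irregular runs of spaces between columns.
line='this   is    line 1 more text'

# perl -a splits each line on whitespace into @F, which is 0-based,
# so the fourth field is $F[3].
with_perl=$(printf '%s\n' "$line" | perl -lane 'print $F[3]')

# awk numbers fields from 1, so the same field is $4.
with_awk=$(printf '%s\n' "$line" | awk '{ print $4 }')

printf 'perl: %s\nawk:  %s\n' "$with_perl" "$with_awk"
```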
With the versions of cut I know of, no, this is not possible. cut is primarily useful for parsing files where the separator is not whitespace (for example /etc/passwd) and that have a fixed number of fields. Two separators in a row mean an empty field, and that goes for whitespace too.
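To illustrate the empty-field rule, here is a small sketch with made-up data: a passwd-style record with an empty field, and a space-aligned line where the extra spaces are counted as empty fields in the same way.

```shell
# A made-up passwd-style record: the empty field between "x" and "1000"
# is kept as a field of its own.
record='alice:x::1000:/home/alice'
third=$(printf '%s\n' "$record" | cut -d ':' -f 3)
fourth=$(printf '%s\n' "$record" | cut -d ':' -f 4)

# The same rule applies to spaces: three spaces in a row mean two empty
# fields, so field 4 here is the second word, not the fourth.
line='this   is    line 1 more text'
spaced=$(printf '%s\n' "$line" | cut -d ' ' -f 4)

printf 'third=[%s] fourth=[%s] spaced=[%s]\n' "$third" "$fourth" "$spaced"
```

This is why a plain cut -d ' ' -f 4 returns an unexpected column on space-aligned input.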
I've implemented a patch that adds a new -m command-line option to cut(1), which works in field mode and treats multiple consecutive delimiters as a single delimiter. This basically solves the OP's question in a rather efficient way, by treating several spaces as one delimiter right within cut(1).
In particular, with my patch applied, performing the desired operation is as simple as adding -m to the cut command line.
I also submitted this patch upstream, and let's hope that it will eventually be accepted and merged into the coreutils project.
There are some further thoughts about adding even more whitespace-related features to cut(1), and having some feedback on all that from different people would be great, preferably on the coreutils mailing list. I'm willing to implement more patches for cut(1) and submit them upstream, which would make this utility more versatile and more usable in various real-world scenarios.