How can I make the cut command treat consecutive identical delimiters as one?

Posted 2024-10-01 16:58:41

I'm trying to extract a certain (the fourth) field from the column-based, 'space'-adjusted text stream. I'm trying to use the cut command in the following manner:

cat text.txt | cut -d " " -f 4

Unfortunately, cut doesn't treat several spaces as one delimiter. I could have piped through awk

awk '{ printf $4; }'

or sed

sed -E "s/[[:space:]]+/ /g"

to collapse the spaces, but I'd like to know if there is any way to deal with several delimiters natively in cut?
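
For reference, the sed normalization above can be piped straight into cut; a minimal sketch of that combination, assuming text.txt is laid out as described (note that [[:space:]] also matches tabs, so it is slightly broader than a space-only squeeze):

sed -E 's/[[:space:]]+/ /g' text.txt | cut -d ' ' -f 4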

Comments (6)

岁月静好 2024-10-08 16:58:41

Try:

tr -s ' ' <text.txt | cut -d ' ' -f4

From the tr man page:

-s, --squeeze-repeats   replace each input sequence of a repeated character
                        that is listed in SET1 with a single occurrence
                        of that character
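
A quick sanity check of that pipeline on one of the sample lines used in the tests below (assuming GNU tr and cut):

$ printf 'this   is    line     1 more text\n' | tr -s ' ' | cut -d ' ' -f4
1
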
天暗了我发光 2024-10-08 16:58:41

As you note in your question, awk is really the way to go. Using cut is possible together with tr -s to squeeze spaces, as kev's answer shows.

Let me however go through all the possible combinations for future readers. Explanations are in the Tests section.

tr | cut

tr -s ' ' < file | cut -d' ' -f4

awk

awk '{print $4}' file

bash

while read -r _ _ _ myfield _
do
   echo "forth field: $myfield"
done < file

sed

sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file

Tests

Given this file, let's test the commands:

$ cat a
this   is    line     1 more text
this      is line    2     more text
this    is line 3     more text
this is   line 4            more    text

tr | cut

$ cut -d' ' -f4 a
is
                        # it does not show what we want!


$ tr -s ' ' < a | cut -d' ' -f4
1
2                       # this makes it!
3
4
$

awk

$ awk '{print $4}' a
1
2
3
4

bash

This reads the fields sequentially. By using _ we indicate that it is a throwaway ("junk") variable used to ignore those fields. This way, $myfield holds the 4th field of each line, no matter how many spaces separate the fields.

$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4

sed

This captures three groups of non-spaces followed by spaces with ([^ ]*[ ]*){3}. Then it captures everything up to the next space as the 4th field, which is finally printed with \2.

$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4
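
To generalize this sed approach to an arbitrary field number, the repetition count can be computed in the shell; a minimal sketch, where N is a hypothetical variable holding the wanted field:

N=4
sed -r "s/^([^ ]*[ ]*){$((N-1))}([^ ]*).*/\2/" a
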
救赎№ 2024-10-08 16:58:41

Shortest/friendliest solution

After becoming frustrated with the many limitations of cut, I wrote my own replacement, which I called cuts, for "cut on steroids".

cuts provides what is likely the most minimalist solution to this and many other related cut/paste problems.

One example, out of many, addressing this particular question:

$ cat text.txt
0   1        2 3
0 1          2   3 4

$ cuts 2 text.txt
2
2

cuts supports:

  • auto-detection of most common field-delimiters in files (+ ability to override defaults)
  • multi-char, mixed-char, and regex matched delimiters
  • extracting columns from multiple files with mixed delimiters
  • offsets from end of line (using negative numbers) in addition to start of line
  • automatic side-by-side pasting of columns (no need to invoke paste separately)
  • support for field reordering
  • a config file where users can change their personal preferences
  • great emphasis on user friendliness & minimalist required typing

and much more, none of which is provided by standard cut.

See also: https://stackoverflow.com/a/24543231/1296044

Source and documentation (free software): http://arielf.github.io/cuts/

金橙橙 2024-10-08 16:58:41

This Perl one-liner shows how closely Perl is related to awk:

perl -lane 'print $F[3]' text.txt

However, the @F autosplit array starts at index $F[0], while awk fields start at $1.
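
A quick illustration of that indexing difference, on a made-up input line:

$ printf 'a b c d e\n' | perl -lane 'print "$F[0] and $F[3]"'
a and d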

捂风挽笑 2024-10-08 16:58:41

With versions of cut I know of, no, this is not possible. cut is primarily useful for parsing files where the separator is not whitespace (for example /etc/passwd) and that have a fixed number of fields. Two separators in a row mean an empty field, and that goes for whitespace too.
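
That empty-field behaviour is easy to see with a non-whitespace delimiter; a small illustration, assuming GNU cut:

$ printf 'a::b\n' | cut -d ':' -f 2
                        # prints an empty line: the two adjacent colons delimit an empty 2nd field
$ printf 'a::b\n' | cut -d ':' -f 3
b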

你怎么敢 2024-10-08 16:58:41

I've implemented a patch that adds a new -m command-line option to cut(1). It works in field mode and treats multiple consecutive delimiters as a single delimiter. This basically solves the OP's question in a rather efficient way, by treating several spaces as one delimiter right within cut(1).

In particular, with my patch applied, the following command will perform the desired operation. It's as simple as that, just add -m into the command line:

cat text.txt | cut -d ' ' -m -f 4

I also submitted this patch upstream, and let's hope that it will eventually be accepted and merged into the coreutils project.

There are some further thoughts about adding even more whitespace-related features to cut(1), and feedback on all of that from different people would be great, preferably on the coreutils mailing list. I'm willing to implement more patches for cut(1) and submit them upstream, which would make this utility more versatile and usable in various real-world scenarios.
