Is there a way to use the cut command with a space as the delimiter and treat names containing spaces, such as Costa Rica, as a single word?

Posted on 2025-01-17 06:00:18

I have created this file concacaf.txt with the following input

David Canada 5
Larin Canada 5
Borges Costa Rica 2
Buchanan Canada 2
Davis Panama 2
Gray Jamaica 2
Henriquez El Salvador 2

Is there a way that I can either use the cut command and treat Costa Rica or El Salvador as a single word or modify the text so that when I use:
cut -f 1,3 -d ' ' concacaf.txt
I get 'Borges 2' instead of 'Borges Rica'. Thanks

Comments (2)

拧巴小姐 2025-01-24 06:00:18

This is not possible with cut alone, but it can be done with sed:

sed -E 's/^([^ ]*) .* ([^ ]*)$/\1 \2/' concacaf.txt

The expression matches the first word ([^ ]*, a sequence of non-space characters) at the beginning of the line and the last word at the end of the line, then replaces the entire line with those two words separated by a space.

The option -E tells sed to use extended regular expressions (by default it uses basic regular expressions, in which the grouping parentheses must be escaped as \( and \)).
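
To illustrate the difference, here is a minimal sketch that runs the same substitution in both syntaxes on one sample line from the question. It assumes a sed that accepts -E (GNU and BSD sed do); the variable names are only for illustration.

```shell
line='Borges Costa Rica 2'

# Extended regular expressions (-E): plain parentheses form groups.
ere=$(printf '%s\n' "$line" | sed -E 's/^([^ ]*) .* ([^ ]*)$/\1 \2/')

# Basic regular expressions (the default): groups are written \( ... \).
bre=$(printf '%s\n' "$line" | sed 's/^\([^ ]*\) .* \([^ ]*\)$/\1 \2/')

# Both variables should hold the same text: Borges 2
printf '%s\n' "$ere" "$bre"
```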

The sed command is s (substitute). It searches each line using a regular expression and replaces the matching substring with the provided replacement string. In the replacement string, \1 represents the substring matched by the first capturing group, \2 the second group, and so on.
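
As a small illustration of how \1 and \2 behave, the sketch below swaps two words. The input string is just an example chosen for this note, not something from the question's file.

```shell
# \1 and \2 in the replacement refer to the first and second capture groups,
# so writing them in reverse order swaps the two words.
swapped=$(printf '%s\n' 'Costa Rica' | sed -E 's/^([^ ]*) ([^ ]*)$/\2 \1/')

printf '%s\n' "$swapped"   # prints: Rica Costa
```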

The regular expression is explained below:

^             # matches the beginning of line
(             # starts a group (it is not a matcher)
  [^ ]        # matches any character that is not a space (there is a space after `^`)
  *           # the previous sub-expression, zero or more times
)             # close the group; the matched substring is captured
              # there is a space here in the expression; it matches a space
.*            # match any character, any number of times
              # match a space
([^ ]*)       # another group that matches a sequence of non-space characters
$             # match the end of the line
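
For anyone who wants to try this end to end, the sketch below recreates the sample file from the question in a temporary directory and runs the sed command on it. It also shows an awk one-liner as a possible alternative; that part is not from this answer and is only an assumption about what might suit the asker, since awk's $NF always refers to the last field however many words the country name has.

```shell
# Recreate the sample file from the question in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/concacaf.txt" <<'EOF'
David Canada 5
Larin Canada 5
Borges Costa Rica 2
Buchanan Canada 2
Davis Panama 2
Gray Jamaica 2
Henriquez El Salvador 2
EOF

# The sed command from this answer: keep the first and last word of each line.
sed_out=$(sed -E 's/^([^ ]*) .* ([^ ]*)$/\1 \2/' "$tmpdir/concacaf.txt")

# awk alternative (not part of the answer): $1 is the first field, $NF the last.
awk_out=$(awk '{ print $1, $NF }' "$tmpdir/concacaf.txt")

printf '%s\n' "$sed_out"
rm -r "$tmpdir"
```

Both commands should give the same seven lines here, including "Borges 2" and "Henriquez 2".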

暖阳 2025-01-24 06:00:18

You can use rev to cut out that last field containing the integer:

$ rev concacaf.txt | cut -d' ' -f2- | rev
David Canada
Larin Canada
Borges Costa Rica
Buchanan Canada
Davis Panama
Gray Jamaica
Henriquez El Salvador
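
The same rev trick can be turned around to keep the last field instead of dropping it, which gets closer to the 'Borges 2' output asked for. The sketch below is not part of the original answer: it assumes rev and paste are available, uses a temporary directory, and includes only three sample lines from the question.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'David Canada 5' 'Borges Costa Rica 2' 'Henriquez El Salvador 2' \
    > "$tmpdir/concacaf.txt"

# First field: plain cut works because the player name never contains a space.
cut -d' ' -f1 "$tmpdir/concacaf.txt" > "$tmpdir/first.txt"

# Last field: reverse each line, take field 1, reverse it back.
# paste then joins both columns with a space ("-" means standard input).
result=$(rev "$tmpdir/concacaf.txt" | cut -d' ' -f1 | rev |
    paste -d' ' "$tmpdir/first.txt" -)

printf '%s\n' "$result"
rm -r "$tmpdir"
```

Reversing twice keeps multi-digit numbers intact, since the digits are flipped and then flipped back.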