Is there a way to use the cut command with a space as the delimiter and treat names that contain spaces, such as Costa Rica, as a single word?

Posted on 2025-01-17 06:00:18

I created this file, concacaf.txt, with the following input:

David Canada 5
Larin Canada 5
Borges Costa Rica 2
Buchanan Canada 2
Davis Panama 2
Gray Jamaica 2
Henriquez El Salvador 2

Is there a way to use the cut command and treat Costa Rica or El Salvador as a single word, or to modify the text, so that when I run: cut -f 1,3 -d ' ' concacaf.txt I get "Borges 2" instead of "Borges Rica"? Thanks.

拧巴小姐 2025-01-24 06:00:18

cut alone cannot do this, because it treats every space as a field separator, but sed can:

sed -E 's/^([^ ]*) .* ([^ ]*)$/\1 \2/' concacaf.txt

It matches the first word ([^ ]*, a sequence of non-space characters) at the beginning of the line and the last word at the end of the line, then replaces the entire line with those two words separated by a space.

The option -E tells sed to use extended regular expressions (by default it uses basic regular expressions, in which the grouping parentheses must be escaped).
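
For comparison, here is a small sketch of the same substitution written both ways, in case the sed at hand does not accept -E. The sample line comes from the question; the variable names are only illustrative.

```shell
# Sample line taken from the question.
line='Borges Costa Rica 2'

# Extended syntax (-E): plain parentheses create capture groups.
ere=$(printf '%s\n' "$line" | sed -E 's/^([^ ]*) .* ([^ ]*)$/\1 \2/')

# Basic syntax (no -E): the grouping parentheses are written \( and \).
bre=$(printf '%s\n' "$line" | sed 's/^\([^ ]*\) .* \([^ ]*\)$/\1 \2/')

# Both variables should hold the same text: Borges 2
printf '%s\n%s\n' "$ere" "$bre"
```

Only the parentheses change between the two forms; the bracket expression, the anchors and the replacement string stay the same.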

The sed command is s (substitute). It searches each line using a regular expression and replaces the matching substring with the provided replacement string. In the replacement string, \1 represents the substring matched by the first capturing group, \2 the one matched by the second group, and so on.

The regular expression is explained below:

^             # matches the beginning of line
(             # starts a group (it is not a matcher)
  [^ ]        # matches any character that is not a space (there is a space after `^`)
  *           # the previous sub-expression, zero or more times
)             # close the group; the matched substring is captured
              # there is a space here in the expression; it matches a space
.*            # match any character, any number of times
              # match a space
([^ ]*)       # another group that matches a sequence of non-space characters
$             # match the end of the line
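
To check the expression against the data from the question, the sketch below recreates concacaf.txt and runs the command on it. Writing the file into a scratch directory created with mktemp is an assumption made here only to avoid overwriting an existing file.

```shell
# Scratch directory, so an existing concacaf.txt is left untouched.
workdir=$(mktemp -d)

# Recreate the input file from the question.
cat > "$workdir/concacaf.txt" <<'EOF'
David Canada 5
Larin Canada 5
Borges Costa Rica 2
Buchanan Canada 2
Davis Panama 2
Gray Jamaica 2
Henriquez El Salvador 2
EOF

# Keep only the first and the last word of every line.
result=$(sed -E 's/^([^ ]*) .* ([^ ]*)$/\1 \2/' "$workdir/concacaf.txt")
printf '%s\n' "$result"
# Output:
# David 5
# Larin 5
# Borges 2
# Buchanan 2
# Davis 2
# Gray 2
# Henriquez 2
```

A line with fewer than three words does not match the expression, so sed prints it unchanged.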

暖阳 2025-01-24 06:00:18

You can use rev to cut out that last field containing the integer:

$ rev concacaf.txt | cut -d' ' -f2- | rev
David Canada
Larin Canada
Borges Costa Rica
Buchanan Canada
Davis Panama
Gray Jamaica
Henriquez El Salvador
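
The pipeline is easier to follow one stage at a time. The sketch below runs each stage on a single line from the question; the step variables are illustrative names, and it assumes the rev utility is installed.

```shell
line='Borges Costa Rica 2'

# Step 1: reverse the characters, so the last field becomes the first one.
step1=$(printf '%s\n' "$line" | rev)

# Step 2: drop the first field of the reversed line (the integer).
step2=$(printf '%s\n' "$step1" | cut -d' ' -f2-)

# Step 3: reverse again to restore the original reading order.
step3=$(printf '%s\n' "$step2" | rev)

printf '%s\n' "$step1" "$step2" "$step3"
# Output:
# 2 aciR atsoC segroB
# aciR atsoC segroB
# Borges Costa Rica
```

Note that this keeps the name and the country and drops the number. To get the first and last words instead ("Borges 2"), the sed substitution in the other answer is the closer match.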
