Using a space as the delimiter for the cut command

Posted 2024-07-19 08:06:12, 64 characters, 13 views, 0 comments

I want to use a space as the delimiter with the cut command.

What syntax can I use for this?

醉殇 2024-07-26 08:06:13

cut -d ' ' -f 2

Where 2 is the field number of the space-delimited field you want.
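
A minimal sketch of that command applied to one line of input; the sample text and the variable names are made up for illustration:

```shell
# Hypothetical sample input: three fields separated by single spaces.
line='alpha beta gamma'

# -d ' ' sets the delimiter to a single space; -f 2 selects the second field.
second=$(printf '%s\n' "$line" | cut -d ' ' -f 2)
echo "$second"
```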

℡寂寞咖啡 2024-07-26 08:06:13

Usually, if you use a space as the delimiter, you want to treat multiple spaces as one, because you are parsing the output of a command that aligns its columns with spaces. (A Google search for exactly that led me here.)

In this case a single cut command is not sufficient, and you need to use:

tr -s ' ' | cut -d ' ' -f 2

or

awk '{print $2}'

This works because AWK's default input field separator is one or more whitespace characters; in regex terms, it's something like [ \t]+. The AWK solution has the added benefit of transparently handling leading/trailing spaces on the data row, whereas the tr + cut solution does not.
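
The difference between the three approaches can be sketched on a made-up, column-aligned line with leading spaces (the input and variable names are assumptions for illustration only):

```shell
# Hypothetical column-aligned input: leading spaces and runs of spaces.
line='   alpha    beta   gamma'

# Plain cut: every single space is a delimiter, so field 2 is empty here.
plain=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# tr -s squeezes each run of spaces into one, but one leading space remains,
# so field 1 is empty and field 2 is the first word.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2)

# awk skips leading blanks and splits on runs of blanks.
with_awk=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'cut: [%s]\ntr+cut: [%s]\nawk: [%s]\n' "$plain" "$squeezed" "$with_awk"
```

Only the awk variant returns the second word here, which is the leading-space caveat mentioned above.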

默嘫て 2024-07-26 08:06:13

To complement the existing, helpful answers; tip of the hat to QZ Support for encouraging me to post a separate answer:

Two distinct mechanisms come into play here:

(a) whether cut itself requires the delimiter (space, in this case) passed to the -d option to be a separate argument or whether it's acceptable to append it directly to -d.
(b) how the shell generally parses arguments before passing them to the command being invoked.

(a) is answered by a quote from the POSIX guidelines for utilities (emphasis mine):

If the SYNOPSIS of a standard utility shows an option with a mandatory option-argument [...] a conforming application shall use separate arguments for that option and its option-argument. However, a conforming implementation shall also permit applications to specify the option and option-argument in the same argument string without intervening characters.

In other words: In this case, because -d's option-argument is mandatory, you can choose whether to specify the delimiter as:

(s) EITHER: a separate argument
(d) OR: as a value directly attached to -d.

Note: The GNU implementation of cut, as found on many Linux distros by default, supports --delimiter as a more descriptive alias of -d. The same considerations apply as for -d, except that directly attaching the option-argument requires = as the separator, whereas no separator is used with -d; e.g.,
echo 'one two' | cut --delimiter=' ' -f 1 vs. echo 'one two' | cut -d' ' -f 1

Once you've chosen (s) or (d), it is the shell's string-literal parsing - (b) - that matters:

With approach (s), all of the following forms are EQUIVALENT:
- -d ' '
- -d " "
- -d \<space> (\<space> is an escaped space, used literally)

With approach (d), all of the following forms are EQUIVALENT:
- -d' '
- -d" "
- "-d "
- '-d '
- -d\<space>

The equivalence is explained by the shell's string-literal processing:

All solutions above result in the exact same string (in each group) by the time cut sees them:

(s): cut sees -d as its own argument, followed by a separate argument that contains a space character, without quotes or a \ prefix.
(d): cut sees -d plus a space character, again without quotes or a \ prefix, as part of the same argument.

The forms in each group end up identical for two reasons, both rooted in how the shell parses string literals:

First, the shell allows literals to be specified as-is through a mechanism called quoting, which can take several forms:
- single-quoted strings: the contents of '...' are taken literally and form a single argument.
- double-quoted strings: the contents of "..." also form a single argument, but are subject to interpolation: variable references such as $var, command substitutions ($(...) or `...`), and arithmetic expansions ($(( ... ))) are expanded.
- \-quoting of individual characters: a \ preceding a single character causes that character to be interpreted as a literal.

Second, quoting is complemented by quote removal: once the shell has parsed a command line, it removes the quote characters from the arguments (any enclosing '...' or "..." and any unquoted \ instances), so the command being invoked never sees them.
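
One way to see quote removal at work is to substitute a small function for cut and print the arguments it receives. This is only a sketch: show_args is a hypothetical helper, not part of cut or of any standard tool.

```shell
# show_args is a hypothetical helper: it prints each argument it receives
# in angle brackets, which is what a command such as cut would see.
show_args() {
    printf '<%s>' "$@"
    printf '\n'
}

# Approach (s): the delimiter is passed as a separate argument.
s1=$(show_args -d ' ')
s2=$(show_args -d " ")
s3=$(show_args -d \ )

# Approach (d): the delimiter is attached directly to -d.
d1=$(show_args -d' ')
d2=$(show_args "-d ")
d3=$(show_args -d\ )

echo "$s1 $s2 $s3"
echo "$d1 $d2 $d3"
```

Within each group the printed argument lists are identical, and none of them contains a quote character or a backslash.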

幼儿园老大 2024-07-26 08:06:13

You can also say:

cut -d\  -f 2

Note that there are two spaces after the backslash.

迷乱花海 2024-07-26 08:06:13

I just discovered that you can also use "-d ":

cut "-d "

Test

$ cat a
hello how are you
I am fine
$ cut "-d " -f2 a
how
am

暗地喜欢 2024-07-26 08:06:13

You can't do it easily with cut if the data contains, for example, runs of multiple spaces. I have found it useful to normalize the input for easier processing. One trick is to use sed for the normalization, as below.

echo -e "foor\t \t bar" | sed 's:\s\+:\t:g' | cut -f2  #bar
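
The \s and \+ in the sed expression above are GNU extensions and may not work in other sed implementations. A sketch of the same normalization using only tr, assuming spaces and tabs are the only separators in the input:

```shell
# Map spaces and tabs to tabs, squeeze runs of tabs into one (-s),
# then let cut split on its default delimiter, the tab.
field=$(printf 'foo\t \t bar\n' | tr -s ' \t' '\t\t' | cut -f2)
echo "$field"
```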

一枫情书 2024-07-26 08:06:13

scut, a cut-like utility I wrote (smarter but slower than cut), can use any Perl regex as a breaking token. Breaking on whitespace is the default, but you can also break on multi-character regexes, alternative regexes, etc.

scut -f='6 2 8 7' < input.file  > output.file

The command above breaks columns on whitespace and extracts the (0-based) columns 6 2 8 7, in that order.

请帮我爱他 2024-07-26 08:06:13

I have an answer (a somewhat confusing one, I admit) that involves sed, regular expressions and capture groups:

\S* - first word
\s* - delimiter
(\S*) - second word - captured
.* - rest of the line

As a sed expression, the parentheses of the capture group need to be escaped, i.e. \( and \).

The \1 returns a copy of the captured group, i.e. the second word.

$ echo "alpha beta gamma delta" | sed 's/\S*\s*\(\S*\).*/\1/'
beta

When you look at this answer, it's somewhat confusing, and you may think: why bother? Well, I'm hoping that some readers will go "Aha!" and use this pattern to solve complex text-extraction problems with a single sed expression.
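
The \S and \s shorthands are GNU sed extensions. A sketch of the same pattern written with POSIX character classes, which should behave the same way on this input:

```shell
# [^[:space:]]* plays the role of \S*, and [[:space:]]* the role of \s*.
word=$(echo "alpha beta gamma delta" |
    sed 's/[^[:space:]]*[[:space:]]*\([^[:space:]]*\).*/\1/')
echo "$word"
```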
