Extract the string between an underscore and a dot

Posted on 2025-01-10 15:25:37

I have strings like these:

/my/directory/file1_AAA_123_k.txt 
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt

So basically, the number of underscores is not fixed. I would like to extract the string between the first underscore and the dot. So the output should be something like this:

AAA_123_k
CCC
KK_45

I found this solution that works:

string='/my/directory/file1_AAA_123_k.txt'
tmp="${string%.*}"
echo "$tmp" | sed 's/^[^_:]*[_:]//'

But I am wondering whether there is a more 'elegant' solution (e.g. a one-liner).

Comments (7)

萌吟 2025-01-17 15:25:37

With bash version >= 3.0 and a regex:

[[ "$string" =~ _(.+)\. ]] && echo "${BASH_REMATCH[1]}"
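One caveat with the regex above: if a directory name contains an underscore, the match starts there instead of in the file name. A minimal sketch that strips the directory first, assuming bash; the function name `extract_tag` is hypothetical and not part of the original answer:

```shell
#!/usr/bin/env bash
# extract_tag is a hypothetical helper name used only for illustration.
extract_tag() {
  local base="${1##*/}"   # drop everything up to the last slash
  # The greedy .+ runs up to the last dot, so only the final extension is cut.
  [[ "$base" =~ _(.+)\. ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

extract_tag '/my/directory/file1_AAA_123_k.txt'   # AAA_123_k
extract_tag '/my/directory/file2_CCC.txt'         # CCC
extract_tag '/my/directory/file2_KK_45.txt'       # KK_45
```

Because `.+` is greedy, a name such as `file1_AAA.txt.2` would yield `AAA.txt` rather than `AAA`; whether that is wanted depends on how "the dot" is defined for your file names.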
∞梦里开花 2025-01-17 15:25:37

You can use a single sed command, written in either POSIX BRE or POSIX ERE syntax:

sed -n 's~^.*/[^_/]*_\([^/]*\)\.[^./]*$~\1~p' <<< "$string"
sed -nE 's~^.*/[^_/]*_([^/]*)\.[^./]*$~\1~p' <<< "$string"

See the online demo. Details:

  • ^ - start of string
  • .* - any text
  • / - a / char
  • [^_/]* - zero or more chars other than / and _
  • _ - a _ char
  • \([^/]*\) (POSIX BRE) / ([^/]*) (POSIX ERE, enabled with E option) - Group 1: any zero or more chars other than /
  • \. - a dot
  • [^./]* - zero or more chars other than . and /
  • $ - end of string.

With -n, default line output is suppressed and p only prints the result of successful substitution.
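As a quick check of the pattern breakdown, here is a sketch that runs the ERE variant over the three sample paths from the question. The wrapper name `tag_from_path` is hypothetical, and a pipe is used instead of the here-string so the snippet does not depend on bash:

```shell
# tag_from_path is a hypothetical wrapper around the ERE command above.
tag_from_path() {
  printf '%s\n' "$1" | sed -nE 's~^.*/[^_/]*_([^/]*)\.[^./]*$~\1~p'
}

for p in \
  '/my/directory/file1_AAA_123_k.txt' \
  '/my/directory/file2_CCC.txt' \
  '/my/directory/file2_KK_45.txt'
do
  tag_from_path "$p"
done
# Prints:
# AAA_123_k
# CCC
# KK_45
```

Since the pattern requires a `/` before the file name and forbids `/` afterwards, underscores in directory names are ignored, and a file name without an underscore prints nothing.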

遮云壑 2025-01-17 15:25:37

Using sed

$ sed 's/[^_]*_//;s/\..*//' input_file
AAA_123_k
CCC
KK_45
如若梦似彩虹 2025-01-17 15:25:37

With the samples you have shown, you could try the following GNU grep command.

grep -oP '.*?_\K([^.]*)' Input_file

Explanation: GNU grep's -o option prints only the matched text, and -P enables PCRE regular expressions. The regex .*?_\K([^.]*) captures the value between the first _ and the first . that follows it. It breaks down as follows:

.*?_     ##Lazily match from the start of the line up to the first occurrence of _.
\K       ##Discard everything matched so far, so that only the wanted value is printed.
([^.]*)  ##Match everything up to (but not including) the first dot.
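To see the three pieces working together, here is a small sketch that feeds one path at a time to the same command. It assumes a GNU grep built with PCRE support (`-P` is not available in every grep), and the function name `first_tag` is hypothetical:

```shell
# first_tag is a hypothetical wrapper; it requires GNU grep with -P support.
first_tag() {
  printf '%s\n' "$1" | grep -oP '.*?_\K([^.]*)'
}

first_tag '/my/directory/file1_AAA_123_k.txt'   # AAA_123_k
first_tag '/my/directory/file2_CCC.txt'         # CCC
first_tag '/my/directory/file2_KK_45.txt'       # KK_45
```

Note that `.*?_` stops at the first underscore anywhere on the line, so a directory name containing `_` or `.` would change the result.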
冷了相思 2025-01-17 15:25:37

A simpler sed solution without any capturing group:

sed -E 's/^[^_]*_|\.[^.]*$//g' file

AAA_123_k
CCC
KK_45
千里故人稀 2025-01-17 15:25:37

If you need to process the file names one at a time (e.g., within a while read loop), you can perform two parameter expansions:

$ string='/my/directory/file1_AAA_123_k.txt.2'
$ tmp="${string#*_}"
$ tmp="${tmp%%.*}"
$ echo "${tmp}"
AAA_123_k

One idea for parsing a whole list of file names at once:

$ cat file.list
/my/directory/file1_AAA_123_k.txt.2
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt

$ sed -En 's/[^_]*_([^.]+).*/\1/p' file.list
AAA_123_k
CCC
KK_45
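The `while read` loop mentioned above is not spelled out, so here is a minimal sketch of it built on the same two parameter expansions. The helper name `strip_tag` is hypothetical, and the input is supplied with `printf` instead of a real `file.list`:

```shell
# strip_tag is a hypothetical helper that applies the two expansions shown above.
strip_tag() {
  tmp="${1#*_}"                # remove the shortest prefix ending in the first _
  printf '%s\n' "${tmp%%.*}"   # remove the longest suffix starting at the first .
}

printf '%s\n' \
  '/my/directory/file1_AAA_123_k.txt.2' \
  '/my/directory/file2_CCC.txt' \
  '/my/directory/file2_KK_45.txt' |
while IFS= read -r path; do
  strip_tag "$path"
done
# Prints:
# AAA_123_k
# CCC
# KK_45
```

No external process is started per file name, which is the main attraction of this approach inside a loop.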
蓝咒 2025-01-17 15:25:37

This is easy, except that it includes the initial underscore:

ls | grep -o "_[^.]*"
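One possible way to drop that initial underscore is to cut off the first character of each match. This is a sketch rather than part of the original answer; the function name `tags_without_underscore` is hypothetical, and sample names are piped in with `printf` in place of `ls`:

```shell
# tags_without_underscore is a hypothetical filter: it reads names on stdin,
# keeps the "_..." part before the first dot, then removes the leading "_".
tags_without_underscore() {
  grep -o "_[^.]*" | cut -c2-
}

printf '%s\n' 'file1_AAA_123_k.txt' 'file2_CCC.txt' 'file2_KK_45.txt' |
  tags_without_underscore
# Prints:
# AAA_123_k
# CCC
# KK_45
```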