Extract the string between an underscore and a dot

Posted on 2025-01-10 15:25:37

I have strings like these:

/my/directory/file1_AAA_123_k.txt 
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt

So basically, the number of underscores is not fixed. I would like to extract the string between the first underscore and the dot. So the output should be something like this:

AAA_123_k
CCC
KK_45

I found this solution that works:

string='/my/directory/file1_AAA_123_k.txt'
tmp="${string%.*}"
echo "$tmp" | sed 's/^[^_:]*[_:]//'

But I am wondering if there is a more 'elegant' solution (e.g. 1 line code).

萌吟 2025-01-17 15:25:37

With bash version >= 3.0 and a regex:

[[ "$string" =~ _(.+)\. ]] && echo "${BASH_REMATCH[1]}"
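As a quick check, the sketch below runs the same match over the three sample paths from the question. The helper name `extract_middle` is only an illustrative assumption, not part of the answer. Two caveats follow from the regex itself: `.+` is greedy, so a name with several dots is captured up to the last dot, and the match starts at the first underscore anywhere in the string, including directory names.

```bash
#!/usr/bin/env bash
# extract_middle is a hypothetical helper name used only for this sketch.
extract_middle() {
  # First underscore, then a greedy capture up to the last dot.
  local re='_(.+)\.'
  [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

for path in /my/directory/file1_AAA_123_k.txt \
            /my/directory/file2_CCC.txt \
            /my/directory/file2_KK_45.txt; do
  extract_middle "$path"
done
```

Keeping the pattern in a variable avoids quoting differences between bash versions when the regex appears on the right-hand side of `=~`.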

∞梦里开花 2025-01-17 15:25:37

You can use a single sed command like

sed -n 's~^.*/[^_/]*_\([^/]*\)\.[^./]*$~\1~p' <<< "$string"
sed -nE 's~^.*/[^_/]*_([^/]*)\.[^./]*$~\1~p' <<< "$string"

Details:

^ - start of string
.* - any text
/ - a / char
[^_/]* - zero or more chars other than / and _
_ - a _ char
\([^/]*\) (POSIX BRE) / ([^/]*) (POSIX ERE, enabled with the -E option) - Group 1: zero or more chars other than /
\. - a dot
[^./]* - zero or more chars other than . and /
$ - end of string.

With -n, default line output is suppressed and p only prints the result of successful substitution.
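To see the effect of `-n` together with `p`, here is a small sketch that feeds the BRE variant the three sample paths plus one extra line. The extra path `/my_dir/plain.txt` is an assumption added for illustration: its only underscore sits in a directory name, so the pattern should not match and the line should not be printed.

```sh
# Three paths from the question plus one hypothetical non-matching path.
paths='/my/directory/file1_AAA_123_k.txt
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
/my_dir/plain.txt'

# -n suppresses default output; p prints only lines where the
# substitution succeeded, so the last path produces nothing.
result=$(printf '%s\n' "$paths" |
  sed -n 's~^.*/[^_/]*_\([^/]*\)\.[^./]*$~\1~p')

printf '%s\n' "$result"
```

Because the group `[^/]*` cannot cross a slash, an underscore in a directory name cannot start the capture, which is what distinguishes this pattern from simpler "first underscore" approaches.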

遮云壑 2025-01-17 15:25:37

Using sed

$ sed 's/[^_]*_//;s/\..*//' input_file
AAA_123_k
CCC
KK_45

如若梦似彩虹 2025-01-17 15:25:37

With the samples you have shown, you could try the following with GNU grep.

grep -oP '.*?_\K([^.]*)' Input_file

Explanation: GNU grep's -o option prints only the matched part of each line, and -P enables PCRE regex. The regex .*?_\K([^.]*) returns the value between the first _ and the first . that follows it. It breaks down as follows:

.*?_     ## Lazily match from the start of the line up to the first _
\K       ## Discard everything matched so far, so only what follows is printed
([^.]*)  ## Match everything up to (not including) the first dot
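A minimal way to try this without creating Input_file is to pipe the sample paths in directly. This sketch assumes a GNU grep built with PCRE support; a grep without -P will reject the option.

```sh
# Requires GNU grep with PCRE support (-P); other greps may lack it.
result=$(printf '%s\n' \
  '/my/directory/file1_AAA_123_k.txt' \
  '/my/directory/file2_CCC.txt' \
  '/my/directory/file2_KK_45.txt' |
  grep -oP '.*?_\K([^.]*)')

printf '%s\n' "$result"
```

Since -o reports every non-empty match on a line, a name with another underscore after the first dot would yield more than one result per line; the sample paths do not have that shape.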

冷了相思 2025-01-17 15:25:37

A simpler sed solution without any capturing group:

sed -E 's/^[^_]*_|\.[^.]*$//g' file

AAA_123_k
CCC
KK_45
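The `g` flag is doing real work in this command: the pattern is an alternation of two anchored pieces (the prefix up to the first underscore, and the final extension), and each piece is a separate match on the line. A small sketch comparing the two forms on one of the sample paths; the variable names are illustrative only:

```sh
line='/my/directory/file2_CCC.txt'

# Without g, only the first match (the prefix) is removed.
without_g=$(printf '%s\n' "$line" | sed -E 's/^[^_]*_|\.[^.]*$//')

# With g, the trailing extension is removed as a second match.
with_g=$(printf '%s\n' "$line" | sed -E 's/^[^_]*_|\.[^.]*$//g')

printf '%s\n' "$without_g"   # CCC.txt
printf '%s\n' "$with_g"      # CCC
```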

千里故人稀 2025-01-17 15:25:37

If you need to process the file names one at a time (e.g., within a while read loop), you can perform two parameter expansions:

$ string='/my/directory/file1_AAA_123_k.txt.2'
$ tmp="${string#*_}"
$ tmp="${tmp%%.*}"
$ echo "${tmp}"
AAA_123_k
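Put inside a `while read` loop, the two expansions might look like the sketch below. The function name `middle` and the inline here-document are assumptions made to keep the example self-contained; in practice the loop would read from a file or another command.

```sh
# middle is a hypothetical helper wrapping the two parameter expansions.
middle() {
  tmp=${1#*_}     # remove the shortest prefix ending at the first underscore
  tmp=${tmp%%.*}  # remove the longest suffix starting at the first dot
  printf '%s\n' "$tmp"
}

while IFS= read -r string; do
  middle "$string"
done <<'EOF'
/my/directory/file1_AAA_123_k.txt.2
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
EOF
```

No external process is started per line, which is the main reason to prefer parameter expansion inside a loop. The expansions look at the whole string, so an underscore or dot in a directory name would change the result.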

One idea to parse a list of file names at the same time:

$ cat file.list
/my/directory/file1_AAA_123_k.txt.2
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt

$ sed -En 's/[^_]*_([^.]+).*/\1/p' file.list
AAA_123_k
CCC
KK_45
