Help with regular expressions: extracting text

Posted on 2024-10-19 20:26:04

Suppose I have some text files (f1.txt, f2.txt, ...) that look something like this:

@article {paper1,
author = {some author},
title = {some {T}itle} ,
journal = {journal},
volume = {16},
number = {4},
publisher = {John Wiley & Sons, Ltd.},
issn = {some number},
url = {some url},
doi = {some number},
pages = {1},
year = {1997},
}

I want to extract the content of title and store it in a bash variable (call it $title); in the example, that is "some {T}itle". Note that the outer pair of braces may contain nested braces. Also, there may be no whitespace around "=", and there may be extra whitespace before "title".

Thanks so much. I just need a working example of how to extract this and I can extract the other stuff.

甜心 2024-10-26 20:26:05

Give this a try:

title=$(sed -n '/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*$//p}' inputfile)

Explanation:

/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ { - If a line matches this regex
- s/// - delete the matched portion (an empty pattern reuses the most recently used regex, here the one from the address)
- s/}[^}]*$//p - delete the last closing curly brace and every character that's not a closing curly brace until the end of the line and print
} - end if
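
To see the command above in action and reuse it for other fields, here is a minimal sketch. The helper name extract_field, the temporary file, and its trimmed-down contents are assumptions made for illustration; the sed program is the one from this answer with the field name parameterized and a semicolon added before the closing brace for portability.

```shell
#!/usr/bin/env bash
# Hypothetical sample file mirroring the question's example (shortened).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
@article {paper1,
author = {some author},
   title={some {T}itle} ,
journal = {journal},
year = {1997},
}
EOF

# extract_field is a hypothetical helper, not part of the original answer.
# $1 = field name, $2 = file to read.
extract_field() {
  local field=$1 file=$2
  # Address: line starts with optional blanks, the field name, '=', and '{'.
  # s/// removes that prefix; the second s removes the last '}' and what follows.
  sed -n "/^[[:blank:]]*${field}[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*\$//p;}" "$file"
}

title=$(extract_field title "$tmp")
journal=$(extract_field journal "$tmp")
printf '%s\n' "$title"
printf '%s\n' "$journal"
rm -f "$tmp"
```

Because the field name is interpolated into the regex, this sketch assumes field names contain no regex metacharacters.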

兮颜 2024-10-26 20:26:05

title=$(sed -n '/title *=/{s/^[^{]*{\([^,]*\),.*$/\1/;s/} *$//p}' ./f1.txt)

/title *=/: Only act upon lines which have the word 'title' followed by a '=' after an arbitrary number of spaces
s/^[^{]*{\([^,]*\),.*$/\1/: From the beginning of the line, look for the first '{' character. From that point, save everything you find until you hit a comma ','. Replace the entire line with everything you saved
s/} *$//p: strip off the trailing brace '}' along with any spaces and print the result.
title=$(sed -n ... ): save the result of the above 3 steps in the bash variable named title
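
Since this approach saves text only up to the first comma, it may be worth checking how it behaves when the title itself contains one. The sketch below runs the same sed program (with a semicolon added before the closing brace for portability) on two hypothetical input lines; the helper name run_cmd and the variable names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# run_cmd is a hypothetical wrapper around the sed program from this answer.
# It reads from standard input instead of ./f1.txt.
run_cmd() {
  sed -n '/title *=/{s/^[^{]*{\([^,]*\),.*$/\1/;s/} *$//p;}'
}

# A title without a comma: the capture ends at the comma after the closing brace.
plain=$(printf '%s\n' 'title = {some {T}itle} ,' | run_cmd)

# A title with a comma: the capture stops at the comma inside the braces,
# so the second substitution finds no trailing '}' and nothing is printed.
with_comma=$(printf '%s\n' 'title = {Regex, sed and awk},' | run_cmd)

printf 'plain: [%s]\n' "$plain"
printf 'with comma: [%s]\n' "$with_comma"
```

If titles in your files can contain commas, a pattern anchored on the last closing brace rather than the first comma is likely the safer choice.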

画尸师 2024-10-26 20:26:05

There are definitely more elegant ways, but at 2:40AM:

title=$(grep '^\s*title\s*=' test | sed -E 's/^\s*title\s*=\s*\{?//; s/\}?\s*,\s*$//')

Grep for the line that interests us, strip everything up to and including the opening curly brace, then strip everything from the last closing curly brace to the end of the line
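
As one possible alternative to the grep and sed pipeline, the same extraction can be sketched in pure bash with the [[ =~ ]] operator and BASH_REMATCH. The function name get_title and the temporary sample file are assumptions for illustration, and the sketch only handles a title that fits on a single line.

```shell
#!/usr/bin/env bash
# get_title is a hypothetical helper: it prints the first title value found in file $1.
get_title() {
  local line
  # Optional blanks, 'title', '=', '{', then a greedy capture up to the last '}' on the line.
  local re='^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*\{(.*)\}[^}]*$'
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
      return 0
    fi
  done < "$1"
  return 1
}

# Hypothetical sample input based on the question's example.
tmp=$(mktemp)
printf '%s\n' '@article {paper1,' 'author = {some author},' '  title= {some {T}itle} ,' '}' > "$tmp"

title=$(get_title "$tmp")
printf '%s\n' "$title"
rm -f "$tmp"
```

The regex is kept in a variable and used unquoted on the right-hand side of =~, which is how bash treats it as a pattern rather than a literal string.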
