Help with regular expressions - extracting text
Suppose I have some text files (f1.txt, f2.txt, ...) that look something like this:
@article {paper1,
author = {some author},
title = {some {T}itle} ,
journal = {journal},
volume = {16},
number = {4},
publisher = {John Wiley & Sons, Ltd.},
issn = {some number},
url = {some url},
doi = {some number},
pages = {1},
year = {1997},
}
I want to extract the content of the title field and store it in a bash variable (call it $title), that is, "some {T}itle" in the example. Note that there may be curly braces inside the outer set of braces. Also, there might not be whitespace around "=", and there may be extra whitespace before "title".
Thanks so much. I just need a working example of how to extract this, and then I can extract the other fields myself.
Give this a try:
Explanation:
/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ { - if a line matches this regex
s/// - delete the matched portion
s/}[^}]*$//p - delete the last closing curly brace and every character that's not a closing curly brace until the end of the line, and print
} - end if
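The command that "Give this a try:" introduced is missing from this page. Putting the four pieces from the explanation together gives roughly the sed call below. This is a sketch: the file names f1.txt and f2.txt, the made-up f2.txt variant (no spaces around "=", extra indentation), and the helper name extract_title are assumptions for illustration only. The ";" before the closing "}" is added because some sed implementations require a separator there.

```shell
#!/usr/bin/env bash
# Test input (assumed): f1.txt is an excerpt of the question's sample entry;
# f2.txt is a made-up variant with no spaces around "=" and extra indentation.
cat > f1.txt <<'EOF'
@article {paper1,
author = {some author},
title = {some {T}itle} ,
journal = {journal},
year = {1997},
}
EOF
cat > f2.txt <<'EOF'
@article {paper2,
    title={some {T}itle},
}
EOF

# extract_title is a hypothetical helper name, not part of the answer.
extract_title() {
  # -n: print only what the script explicitly prints.
  # /^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ { ... } : act on the title line only
  #   s///          : the empty regex reuses the address regex, deleting
  #                   everything up to and including the opening brace
  #   s/}[^}]*$//p  : delete the last "}" and everything after it, then print
  sed -n '/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*$//p;}' "$1"
}

title=$(extract_title f1.txt)
title2=$(extract_title f2.txt)
printf '%s\n' "$title" "$title2"
```

Run as is, it prints "some {T}itle" once per file. Because the second substitution anchors on the last closing brace of the line, inner braces such as {T} are kept.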
/title *=/ : Only act upon lines that contain the word 'title' followed by '=' after an arbitrary number of spaces.
s/^[^{]*{\([^,]*\),.*$/\1/ : From the beginning of the line, look for the first '{' character. From that point, save everything you find until you hit a comma ','. Replace the entire line with everything you saved.
s/} *$//p : Strip off the trailing brace '}' along with any spaces, and print the result.
title=$(sed -n ... ) : Save the result of the above 3 steps in the bash variable named title.
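This answer's full command is also missing; only its shape, title=$(sed -n ... ), is given. Joining the three listed steps into one sed script gives the sketch below. The file names and their contents are assumptions used only to exercise it. Note that the address /title *=/ is unanchored and allows only spaces before "=", so it would also match a line such as booktitle = {...} and would miss a tab before "=".

```shell
#!/usr/bin/env bash
# Test input (assumed): f1.txt is an excerpt of the question's sample entry;
# f2.txt is a made-up variant with no spaces around "=" and extra indentation.
cat > f1.txt <<'EOF'
@article {paper1,
author = {some author},
title = {some {T}itle} ,
journal = {journal},
year = {1997},
}
EOF
cat > f2.txt <<'EOF'
@article {paper2,
    title={some {T}itle},
}
EOF

# /title *=/                  : lines with "title", any number of spaces, then "="
# s/^[^{]*{\([^,]*\),.*$/\1/  : keep only the text between the first "{" and the first ","
# s/} *$//p                   : drop the trailing "}" plus any spaces, then print
script='/title *=/{s/^[^{]*{\([^,]*\),.*$/\1/;s/} *$//p;}'
title=$(sed -n "$script" f1.txt)
title2=$(sed -n "$script" f2.txt)
printf '%s\n' "$title" "$title2"
```

Because [^,]* stops at the first comma, a title that itself contains a comma would leave $title empty: the captured text no longer ends in "}", so the final s///p never prints.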
There are definitely more elegant ways, but at 2:40AM:
Grep for the line that interests us, strip everything up to and including the opening curly brace, then strip everything from the last curly brace to the end of the line.
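The commands for this answer are not shown either. The pipeline below is a sketch of the three described steps. The grep pattern (anchored at the start of the line, allowing blanks around "title") is an assumption chosen to fit the question's constraints, not necessarily what the answerer used. The file names and contents are likewise assumed test input.

```shell
#!/usr/bin/env bash
# Test input (assumed): f1.txt is an excerpt of the question's sample entry;
# f2.txt is a made-up variant with no spaces around "=" and extra indentation.
cat > f1.txt <<'EOF'
@article {paper1,
author = {some author},
title = {some {T}itle} ,
journal = {journal},
year = {1997},
}
EOF
cat > f2.txt <<'EOF'
@article {paper2,
    title={some {T}itle},
}
EOF

# 1. grep: keep only the title line (pattern is an assumption, see above)
# 2. sed 's/^[^{]*{//'   : strip everything up to and including the first "{"
# 3. sed 's/}[^}]*$//'   : strip the last "}" and everything after it
title=$(grep '^[[:blank:]]*title[[:blank:]]*=' f1.txt | sed -e 's/^[^{]*{//' -e 's/}[^}]*$//')
title2=$(grep '^[[:blank:]]*title[[:blank:]]*=' f2.txt | sed -e 's/^[^{]*{//' -e 's/}[^}]*$//')
printf '%s\n' "$title" "$title2"
```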