Help with regular expressions - extracting text

Posted on 2024-10-19 20:26:04

Suppose I have some text files (f1.txt, f2.txt, ...) that look something like

@article {paper1,
author = {some author},
title = {some {T}itle} ,
journal = {journal},
volume = {16},
number = {4},
publisher = {John Wiley & Sons, Ltd.},
issn = {some number},
url = {some url},
doi = {some number},
pages = {1},
year = {1997},
}

I want to extract the content of title and store it in a bash variable (call it $title), that is, "some {T}itle" in the example. Note that there may be curly braces inside the outer set of braces. Also, there might not be whitespace around "=", and there may be more whitespace before "title".

Thanks so much. I just need a working example of how to extract this and I can extract the other stuff.

Comments (3)

甜心 2024-10-26 20:26:05

Give this a try:

title=$(sed -n '/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*$//p}' inputfile)

Explanation:

  • /^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ { - If a line matches this regex
    • s/// - delete the matched portion
    • s/}[^}]*$//p - delete the last closing curly brace and every character that's not a closing curly brace until the end of the line and print
  • } - end if
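
To try the command above end to end, here is a minimal sketch. It assumes GNU sed (other sed implementations may want a `;` before the closing `}`), and the temporary directory and trimmed sample entry are made up for the test. The title line is deliberately indented and written without spaces around "=" to cover the cases from the question.

```shell
#!/usr/bin/env bash
# Hypothetical sample file; the temp directory and trimmed entry are assumptions.
tmpdir=$(mktemp -d)
cat > "$tmpdir/f1.txt" <<'EOF'
@article {paper1,
author = {some author},
   title={some {T}itle} ,
journal = {journal},
year = {1997},
}
EOF

# Same sed program as in the answer:
#   s///          reuses the address regex, removing everything up to the first '{'
#   s/}[^}]*$//p  drops the last '}' and whatever follows it, then prints
title=$(sed -n '/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*$//p}' "$tmpdir/f1.txt")

printf '%s\n' "$title"
rm -r "$tmpdir"
```

The inner braces survive because the second substitution can only match at the last closing brace on the line.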
兮颜 2024-10-26 20:26:05
title=$(sed -n '/title *=/{s/^[^{]*{\([^,]*\),.*$/\1/;s/} *$//p}' ./f1.txt)
  1. /title *=/: Only act upon lines which have the word 'title' followed by a '=' after an arbitrary number of spaces
  2. s/^[^{]*{\([^,]*\),.*$/\1/: From the beginning of the line look for the first '{' character. From that point save everything you find until you hit a comma ','. Replace the entire line with everything you saved
  3. s/} *$//p: strip off the trailing brace '}' along with any spaces and print the result.
  4. title=$(sed -n ... ): save the result of the above 3 steps in the bash variable named title
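
One caveat that may be worth checking before relying on step 2: because the capture stops at the first comma, a title that itself contains a comma is cut short, and step 3 then finds no trailing brace to strip, so nothing is printed. The sketch below compares the command above with a variant that captures up to the last closing brace instead. The variant, the function names, and the sample lines are assumptions for illustration, not part of the answer; GNU sed is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical one-line inputs; only the title line matters for this check.
plain='title = {some {T}itle} ,'
comma='title = {Foo, bar},'

# The command from this answer, reading from stdin instead of ./f1.txt.
by_comma() {
  sed -n '/title *=/{s/^[^{]*{\([^,]*\),.*$/\1/;s/} *$//p}'
}

# Variant (an assumption, not from the answer): capture greedily up to the
# last closing brace on the line, so commas inside the title are kept.
by_last_brace() {
  sed -n 's/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{\(.*\)}[^}]*$/\1/p'
}

plain_a=$(printf '%s\n' "$plain" | by_comma)
comma_a=$(printf '%s\n' "$comma" | by_comma)
plain_b=$(printf '%s\n' "$plain" | by_last_brace)
comma_b=$(printf '%s\n' "$comma" | by_last_brace)

printf 'by_comma:      [%s] [%s]\n' "$plain_a" "$comma_a"
printf 'by_last_brace: [%s] [%s]\n' "$plain_b" "$comma_b"
```

Both versions agree on the example from the question; they differ only on the comma-containing title, where the first bracket pair on the `by_comma` line stays filled and the second comes back empty.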
画尸师 2024-10-26 20:26:05

There are definitely more elegant ways, but at 2:40AM:

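title=$(grep '^\s*title\s*=\s*' test | sed -E 's/^\s*title\s*=\s*\{?//' | sed -E 's/\}?\s*,\s*$//')

Grep for the line that interests us, strip everything up to and including the opening curly brace, then strip everything from the last curly brace to the end of the line. Note that sed needs -E here so that ? means "optional"; in sed's default basic syntax a bare ? is a literal question mark and neither substitution would match.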
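
Since the question mentions several files and several fields, the same grep-then-strip idea can be wrapped in a small function. This is only a sketch: `bib_field` is a hypothetical helper name, the sample file is made up, and it assumes each field sits on a single line that ends with a comma.

```shell
#!/usr/bin/env bash
# bib_field is a hypothetical helper; file names and contents are made up.
# Usage: bib_field FIELD FILE  (assumes one field per line, ending in a comma)
bib_field() {
  local field=$1 file=$2
  # Pick the line for FIELD, then strip "FIELD = {" and the trailing "},".
  grep -E "^[[:space:]]*${field}[[:space:]]*=" "$file" |
    sed -E "s/^[[:space:]]*${field}[[:space:]]*=[[:space:]]*\{?//; s/\}?[[:space:]]*,[[:space:]]*\$//"
}

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cat > "$tmpdir/f1.txt" <<'EOF'
@article {paper1,
author = {some author},
  title={some {T}itle} ,
publisher = {John Wiley & Sons, Ltd.},
year = {1997},
}
EOF

# Loop over every matching file and pull out a few fields.
for f in "$tmpdir"/f*.txt; do
  title=$(bib_field title "$f")
  publisher=$(bib_field publisher "$f")
  year=$(bib_field year "$f")
  printf '%s | %s | %s\n' "$title" "$publisher" "$year"
done
```

Anchoring the field name at the start of the line keeps a field such as booktitle from being picked up when asking for title. Values that span several lines are not handled.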
