Regex search to extract the BibTeX title string in R

Posted on 2025-02-11 18:15:46

I have a data frame in R where one column, named Title, holds BibTeX entries that look like this:

={Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems},\n  
author={Goldreich, Oded and Micali, Silvio and Wigderson, Avi},\n  
journal={Journal of the ACM (JACM)},\n  
volume={38},\n  
number={3},\n  
pages={690--728},\n  
year={1991},\n  
publisher={ACM New York, NY, USA}\n}

I need to extract only the title of the BibTeX citation, which is the string after ={ and before the next }.

In this example, the output should be:

Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems

I need to do this for all rows in the data frame. Not all rows have the same number of BibTeX fields, so the regex has to ignore everything after the first }.

I'm currently trying sub(".*\\={\\}\\s*(.+?)\\s*\\|.*$", "\\1", data$Title), but it fails with the TRE pattern compilation error 'Invalid contents of {}'.

How should I do this?

蔚蓝源自深海 2025-02-18 18:15:46

A possible solution, using stringr::str_extract and lookarounds:

library(stringr)

# s is the character vector of BibTeX entries (for example, data$Title from the question)
str_extract(s, "(?<=\\{)[^}]+(?=\\})")

#> [1] "Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems"
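If adding a package is not an option, roughly the same lookaround idea can be sketched in base R with regexpr() and regmatches(). This is only an illustration under assumptions: the two-element vector s below is made up, and the NA handling for rows without braces is one possible choice, not something the answer above prescribes.

```r
# Hypothetical input: one BibTeX-like entry and one row without any braces.
s <- c("={A first title},\n  author={Doe, Jane},\n  year={2001}\n}",
       "no braces here")

# regexpr() locates the first match in each element; perl = TRUE is needed
# because lookbehind/lookahead are not supported by the default TRE engine.
m <- regexpr("(?<=\\{)[^}]+(?=\\})", s, perl = TRUE)

# regmatches() silently drops non-matching elements, so start from NA and
# fill in only the rows that matched to stay aligned with the input.
hit <- as.vector(m) != -1
titles <- rep(NA_character_, length(s))
titles[hit] <- regmatches(s, m)

titles
```

Keeping the result the same length as the input makes it safe to assign back as a new column of the data frame.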

寂寞美少年 2025-02-18 18:15:46

Mind that { is a special regex metacharacter, so it needs to be escaped.

To match any string between curly braces, use a pattern based on a negated character class (negated bracket expression), such as \{([^{}]*)}.
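To see the escaping issue in isolation, here is a small sketch. The helper compiles() and the sample string x are hypothetical names introduced only for this illustration; the first pattern is the one from the question, the second is just the escaped ={ prefix.

```r
# Hypothetical sample string, shaped like the entries in the question.
x <- "={Some title},\n  year={1991}\n}"

# Returns TRUE if the pattern compiles and runs, FALSE if R signals a
# warning or an error while compiling it.
compiles <- function(pattern) {
  tryCatch({
    grepl(pattern, x)
    TRUE
  },
  error = function(e) FALSE,
  warning = function(w) FALSE)
}

# Pattern from the question: the unescaped "{" is read as the start of a
# repetition bound (as in "a{2,3}"), so compilation fails.
compiles(".*\\={\\}\\s*(.+?)\\s*\\|.*$")

# With the brace escaped, the pattern compiles.
compiles("=\\{")
```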

You can use

sub(".*?=\\{([^{}]*)}.*", "\\1", df$Title)

See the regex demo and the R demo:

Title <- c("={Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems},\n  author={Goldreich, Oded and Micali, Silvio and Wigderson, Avi},\n  journal={Journal of the ACM (JACM)},\n  volume={38},\n  number={3},\n  pages={690--728},\n  year={1991},\n  publisher={ACM New York, NY, USA}\n}")
sub(".*?=\\{([^{}]*)}.*", "\\1", Title)

Output:

[1] "Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems"

Pattern details:

.*? - any zero or more chars, as few as possible
=\\{ - a ={ substring
([^{}]*) - Group 1 (\1): any zero or more chars other than curly braces
} - a } char (it is not special, no need to escape)
.* - the rest of the string.
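Since the question asks about every row of a data frame, the same pattern can be applied to the whole column at once. The sketch below assumes a made-up data frame df and a new column name clean_title. It also guards against rows that do not match, because sub() returns its input unchanged when the pattern is not found.

```r
# Hypothetical data frame; rows have different numbers of BibTeX fields.
df <- data.frame(
  Title = c(
    "={First paper title},\n  author={Doe, Jane},\n  year={2001}\n}",
    "={Second paper title},\n  year={1999}\n}",
    "malformed entry"
  ),
  stringsAsFactors = FALSE
)

pattern <- ".*?=\\{([^{}]*)}.*"

# sub() is vectorised over the column. Rows without a match would come back
# unchanged, so mark them as NA instead.
df$clean_title <- ifelse(grepl(pattern, df$Title),
                         sub(pattern, "\\1", df$Title),
                         NA_character_)

df$clean_title
```

This relies on the default TRE engine, where . also matches line breaks. If you switch to perl = TRUE, prefixing the pattern with (?s) should keep the trailing .* consuming the remaining lines.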
