Regex to capture the entire string in a Python pandas Series
I have a sample series:
s = pd.Series(['Complexity Level 1', 'RandomName', 'I-Invoice Submission test', 'I-test2', 'I-string with multiple words'])
I'm trying to capture only the strings that begin with "I-", using extract:
extract1 = s.str.extract(r'I-(\w+)')
Current output:
         0
0      NaN
1      NaN
2  Invoice
3    test2
4   string
It currently extracts only the first word, but I want all the words and whitespace after the identifier. This could be up to 5 words.
Does this just need a regex adjustment, or is there a better method?
What I want is:
                            0
0                         NaN
1                         NaN
2     Invoice Submission test
3                       test2
4  string with multiple words
Comments (1)
The regex that will do the job is
r'I-(.*)'
Meaning: capture any characters (up to a newline) after "I-".
EDIT (from comments):
To capture any characters up to a comma, use
I-([^,]*)
Meaning: capture any characters that are not a comma (,) after "I-".
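For reference, here is a minimal sketch of how these patterns could be applied to the sample series from the question. The leading ^ anchor, the sample string containing a comma, and the "up to 5 words" variant are assumptions added for illustration; they are not part of the answer above.

```python
import pandas as pd

s = pd.Series([
    'Complexity Level 1',
    'RandomName',
    'I-Invoice Submission test',
    'I-test2',
    'I-string with multiple words',
])

# Capture everything after "I-". The ^ anchor (an addition, not in the
# answer) restricts matches to strings that start with "I-".
rest = s.str.extract(r'^I-(.*)')

# Variant from the edit: stop at the first comma.
# The input string here is a made-up example.
with_comma = pd.Series(['I-first part, second part'])
before_comma = with_comma.str.extract(r'^I-([^,]*)')

# Hypothetical stricter variant: one word plus up to four more
# whitespace-separated words, i.e. at most 5 words.
up_to_five = s.str.extract(r'^I-(\w+(?:\s+\w+){0,4})')

print(rest)
```

Rows that do not match come back as NaN, so they can be dropped afterwards with rest.dropna() if only the matching strings are needed. Passing expand=False to str.extract returns a Series instead of a one-column DataFrame.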