Check which elements of a Series are substrings of a given text - Python, Pandas
What I am trying to do:
Given a Series of strings, find the indexes of all strings
that are substrings of another main string, in a vectorized manner.
The Input:
import pandas as pd
series = pd.Series(['ab', 'abcd', 'bcc', 'abc'], name='text')
main_text = 'abcX'
# The series:
0      ab
1    abcd
2     bcc
3     abc
Name: text, dtype: object
The desired output:
0     ab
3    abc
Name: text, dtype: object
What I tried:
df_test = pd.DataFrame(series)
df_test['text2'] = main_text
df_test['text'].isin(df_test)
# And this of course won't work, since it checks whether the main string is a
# substring of each string in the series (the opposite direction):
series.str.contains(main_text, regex=True)
Thanks!
Comments (1)
You don't need a regex; simply use Python's in operator to test whether each element of the series is a substring of main_text.
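The answer's original snippet did not survive on this page. As a minimal sketch of the in-based approach, assuming the series and main_text from the question (the names mask and result are introduced here for illustration only), it could look like this:

```python
import pandas as pd

series = pd.Series(['ab', 'abcd', 'bcc', 'abc'], name='text')
main_text = 'abcX'

# Boolean mask: True where the element is a substring of main_text.
# This is a plain Python loop over the values, not a true vectorized call.
mask = [s in main_text for s in series]

# Boolean indexing keeps the original index labels of the matching rows.
result = series[mask]
print(result)
```

With the sample data, result keeps the rows at index 0 ('ab') and 3 ('abc'), which matches the desired output in the question. The list comprehension is usually fast enough for string data, since pandas string methods also iterate over the values internally.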