Python "in" operator is not finding a substring in text
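I am trying to find out whether any substring in a list of substrings occurs in a given string. To do so, I loop over the items of the list and check whether each one exists in the string using Python's in operator. I get False even though I am sure one of the substrings exists in the string. I have tried every method I know of to normalize the text and the substrings: replacing all " " with "", using the casefold() method, strip(), and even unidecode. Still, the substring is not found.
My code: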
from unidecode import unidecode

example_string = '''available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/nanotoday
REVIEW
Synthesis, properties and applications of Janus
nanoparticles
Marco Lattuada a, T. Alan Hatton b,'''  # as extracted from PDF file using fitz's `doc.load_page(0)` and then `.get_text()`

list_of_titles = ["Synthesis, properties and applications of Janus nanoparticles", "another_title", "another_title"]

example_string = example_string.casefold()
example_string = example_string.replace(" ", "")

for title in list_of_titles:
    title = title.replace(" ", "")
    title = title.casefold()
    if unidecode(title) in unidecode(example_string):
        print("Yes")
# Outputs nothing
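Comments (1)
I think the \n is causing the conflict. The title is split across two lines in the extracted text ("Janus\nnanoparticles"), and replace(" ", "") removes spaces only, so the newline stays in the string and the title never matches. Try removing the newlines as well before comparing.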
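The suggestion above can be sketched as follows. This is a minimal illustration, not the only possible fix: the helper name `normalize` is hypothetical, and it assumes that stripping every whitespace character (spaces, tabs, newlines) is acceptable for the comparison. `unidecode` is left out because the sample text is plain ASCII; text extracted from other PDFs may need it, or further cleanup.

```python
import re


def normalize(text: str) -> str:
    """Casefold the text and drop all whitespace, including newlines.

    `normalize` is a hypothetical helper name used only for this sketch.
    """
    return re.sub(r"\s+", "", text.casefold())


example_string = '''available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/nanotoday
REVIEW
Synthesis, properties and applications of Janus
nanoparticles
Marco Lattuada a, T. Alan Hatton b,'''

list_of_titles = [
    "Synthesis, properties and applications of Janus nanoparticles",
    "another_title",
    "another_title",
]

# Normalize the page text once, then test each normalized title against it.
haystack = normalize(example_string)
found = [title for title in list_of_titles if normalize(title) in haystack]

print(found)
# ['Synthesis, properties and applications of Janus nanoparticles']
```

Using `\s+` instead of `replace(" ", "")` is the key difference: it also removes the `\n` between "Janus" and "nanoparticles", which is what kept the original `in` check from matching.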