Python "in" operator does not find a substring in text

Posted on 2025-02-04 17:32:55

I am trying to check whether any substring from a list of substrings occurs in a given string. To do so, I loop over the items of the list and test each one against the string with Python's in operator. The check returns False even though I am sure one of the substrings is present in the string. I have tried every method I know to normalize the text and the substrings: replacing all " " with "", calling casefold() and strip(), and even running both through unidecode. The substring is still not found.

My code:

from unidecode import unidecode

example_string = '''available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/nanotoday
REVIEW
Synthesis, properties and applications of Janus
nanoparticles
Marco Lattuada a, T. Alan Hatton b,''' # as extracted from PDF file using fitz's `doc.load_page(0)` and then `.get_text()` 

list_of_titles = ["Synthesis, properties and applications of Janus nanoparticles", "another_title", "another_title"]

example_string = example_string.casefold()
example_string = example_string.replace(" ", "")

for title in list_of_titles:
    title = title.replace(" ", "")
    title = title.casefold()
    if unidecode(title) in unidecode(example_string):
        print("Yes")

# Outputs nothing

Comments (1)

雪花飘飘的天空 2025-02-11 17:32:56

Try with

example_string = example_string.replace("\n", " ")
example_string = example_string.casefold()

for title in list_of_titles:
    if title.casefold() in example_string: # here casefold() again!
        print("Yes")

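I think the \n (newline) characters are the problem: the extracted text has a line break between "Janus" and "nanoparticles", and replace(" ", "") removes spaces but leaves that \n in place, so the title never matches.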
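
A slightly more general variant, in case the extracted text also contains tabs or repeated spaces: collapse every run of whitespace to a single space on both sides before comparing. This is only a sketch; normalize and find_titles are hypothetical helper names, not part of the original question or of any library.

```python
def normalize(text):
    """Casefold and collapse each run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces.
    return " ".join(text.casefold().split())


def find_titles(page_text, titles):
    """Return the titles that occur in page_text once both sides are normalized."""
    haystack = normalize(page_text)
    return [title for title in titles if normalize(title) in haystack]


example_string = '''available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/nanotoday
REVIEW
Synthesis, properties and applications of Janus
nanoparticles
Marco Lattuada a, T. Alan Hatton b,'''

list_of_titles = [
    "Synthesis, properties and applications of Janus nanoparticles",
    "another_title",
]

matches = find_titles(example_string, list_of_titles)
print(matches)
```

Keeping a single space between words, rather than deleting all whitespace, avoids accidental matches across word boundaries.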
