How do I check whether an exact string occurs within another string?

Posted 2024-09-28 06:20:08, 676 words, 4 views, 0 comments

I'm currently running into a bit of a problem. I'm trying to write a program that highlights occurrences of a word or phrase inside another string, but only if the text it matches is exactly the same. The part I'm having trouble with is determining whether the subphrase I've matched is contained within another, larger subphrase.

A quick example that shows the problem:

>>> indicators = ["therefore", "for", "since"]
>>> phrase = "... therefore, I conclude I am awesome."
>>> indicators_in_phrase = [indicator for indicator in indicators
...                         if indicator in phrase.lower()]
>>> print(indicators_in_phrase)
['therefore', 'for']

I do not want 'for' included in that list. I know why it is being included, but I can't think of any expression that could filter out substrings like that.

I've noticed other similar questions on the site, but each involves a regex solution, which is something I'm not comfortable with yet, especially in Python. Is there a reasonably easy way to solve this problem without using a regex? If not, the corresponding regex and how it might be applied to the above example would be very much appreciated.

Comments (8)

亽野灬性zι浪 2024-10-05 06:20:08

There are ways to do it without a regex, but most of those ways are so convoluted that you'll wish you had spent the time learning the simple regex sequence that you need for it.
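
To give a sense of what a regex-free version involves, here is a minimal sketch that checks the characters on either side of each occurrence by hand. The helper name `contains_whole_word` is made up for illustration, and it assumes a "word character" means a letter, digit, or underscore (the same notion `\b` uses in a regex).

```python
def contains_whole_word(text, word):
    """Return True if `word` occurs in `text` with no word character touching it."""
    def is_word_char(ch):
        # Assumption: letters, digits and "_" count as part of a word
        return ch.isalnum() or ch == "_"

    start = text.find(word)
    while start != -1:
        end = start + len(word)
        before_ok = start == 0 or not is_word_char(text[start - 1])
        after_ok = end == len(text) or not is_word_char(text[end])
        if before_ok and after_ok:
            return True
        # Keep looking: a later occurrence may stand on its own
        start = text.find(word, start + 1)
    return False


indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."
found = [w for w in indicators if contains_whole_word(phrase.lower(), w)]
print(found)  # ['therefore']
```

It works, but the loop and the two boundary checks are exactly what a single `\b...\b` pattern expresses.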

两仪 2024-10-05 06:20:08

It's a one-liner with a regex (re.escape guards against indicators that contain regex metacharacters):

import re

indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."

indicators_in_phrase = set(re.findall(r'\b(%s)\b' % '|'.join(map(re.escape, indicators)), phrase.lower()))
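
Since the question is ultimately about highlighting, the same pattern can be passed to `re.sub` to mark the matches in place. This is only a sketch: the `highlight` function and its `left`/`right` marker parameters are hypothetical names, and the bracket markers stand in for whatever highlighting the real program uses.

```python
import re


def highlight(phrase, indicators, left="[", right="]"):
    """Wrap every whole-word occurrence of an indicator in left/right markers."""
    if not indicators:
        return phrase
    # Longest first, so a longer indicator is tried before a shorter one it starts with
    alternatives = sorted(indicators, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:%s)\b" % "|".join(re.escape(a) for a in alternatives),
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: left + m.group(0) + right, phrase)


indicators = ["therefore", "for", "since"]
print(highlight("... therefore, I conclude I am awesome.", indicators))
# ... [therefore], I conclude I am awesome.
```

Using `re.IGNORECASE` instead of `phrase.lower()` keeps the original capitalisation in the highlighted output.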

骄傲 2024-10-05 06:20:08

A regex is the simplest way!
Hint:

re.compile(r'\btherefore\b')

The \b on each side matches only at a word boundary. You can then swap in any other word in the middle!

EDIT: I wrote this for you:

import re

indicators = ["therefore", "for", "since"]

phrase = "... therefore, I conclude I am awesome. "

def find(phrase, indicators):
    def _match(i):
        # \b matches only at word boundaries; re.escape protects special characters
        return re.search(r'\b%s\b' % re.escape(i), phrase)
    return [ind for ind in indicators if _match(ind)]

>>> find(phrase, indicators)
['therefore']

少女的英雄梦 2024-10-05 06:20:08

I think what you are trying to do is something more like this:

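# str.split() with no arguments splits on any whitespace
# (the old string.split() function no longer exists in Python 3)
words_in_phrase = phrase.split()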

Now you'll have the words in a list like this:

['...', 'therefore,', 'I', 'conclude', 'I', 'am', 'awesome.']

Then compare the lists like so:

indicators_in_phrase = []
for word in words_in_phrase:
  if word in indicators:
    indicators_in_phrase.append(word)

There are probably several ways to make this less verbose, but I prefer clarity. Also, you will need to remove the punctuation in tokens such as "awesome." and "therefore,", because 'therefore,' is not equal to 'therefore'.

For that, use rstrip as in the other answer.
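
Putting the split and the punctuation clean-up together might look like the sketch below. It uses `strip(string.punctuation)` rather than `rstrip` so that leading punctuation is trimmed too; that choice, and lowercasing the phrase first, are assumptions on top of the answer.

```python
import string

indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."

# Lowercase, split on whitespace, then trim punctuation from both ends of each token
words = {token.strip(string.punctuation) for token in phrase.lower().split()}

indicators_in_phrase = [ind for ind in indicators if ind in words]
print(indicators_in_phrase)  # ['therefore']
```

This only handles single-word indicators, because each token is compared on its own.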

﹉夏雨初晴づ 2024-10-05 06:20:08

  1. Create a set of the indicators
  2. Create a set of the words in the phrase
  3. Find the intersection

Code:

indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."
print(list(set(indicators).intersection(set([each.strip('.,') for each in phrase.split(' ')]))))

Cheers:)
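
The set intersection only works for single-word indicators. Because the question also mentions phrases, one possible extension is to compare token sequences with a sliding window. The helper names `phrase_tokens` and `contains_phrase` are invented for this sketch, and it assumes punctuation and letter case should be ignored.

```python
import string


def phrase_tokens(text):
    """Lowercased tokens with surrounding punctuation removed; empty tokens dropped."""
    tokens = (t.strip(string.punctuation) for t in text.lower().split())
    return [t for t in tokens if t]


def contains_phrase(text, indicator):
    """True if the indicator's tokens appear consecutively, as whole words, in text."""
    haystack = phrase_tokens(text)
    needle = phrase_tokens(indicator)
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


print(contains_phrase("As a result, I win.", "as a result"))             # True
print(contains_phrase("... therefore, I conclude I am awesome.", "for"))  # False
```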

南烟 2024-10-05 06:20:08

A little lengthy, but it gives the idea. Of course, a regex would make it simpler.

>>> indicators = ["therefore", "for", "since"]
>>> phrase = "... therefore, I conclude I am awesome."
>>> phrase_list = phrase.split()
>>> phrase_list
['...', 'therefore,', 'I', 'conclude', 'I', 'am', 'awesome.']
>>> phrase_list = [ k.rstrip(',') for k in phrase_list]
>>> indicators_in_phrase = [indicator for indicator in indicators if indicator in phrase_list]
>>> indicators_in_phrase 
['therefore']

时光无声 2024-10-05 06:20:08

Is the problem with "for" that it's inside "therefore" or that it's not a word? For example, if one of your indicators was "awe", would you want it to be included in indicators_in_phrase?

How would you want the following situation to be handled?
indicators = ["abc", "cde"]
phrase = "One abcde two"
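
For reference, this is how the word-boundary regex suggested in the other answers treats those cases. It is a small sketch with a hypothetical helper name, `whole_word_matches`; whether this behaviour is what the asker wants is exactly the open question.

```python
import re


def whole_word_matches(phrase, indicators):
    """Indicators that occur in phrase as whole words (case-insensitive)."""
    return [
        ind for ind in indicators
        if re.search(r"\b%s\b" % re.escape(ind), phrase, re.IGNORECASE)
    ]


# Neither indicator stands alone inside "abcde", so nothing matches
print(whole_word_matches("One abcde two", ["abc", "cde"]))     # []
# "awe" is only a prefix of "awesome", so it is rejected as well
print(whole_word_matches("I am awesome.", ["awe", "awesome"]))  # ['awesome']
# Separated by a space, both match
print(whole_word_matches("One abc cde two", ["abc", "cde"]))   # ['abc', 'cde']
```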

ゝ杯具 2024-10-05 06:20:08

You can strip the punctuation from your phrase, then split it so that every word is a separate item. Then you can do your string comparison:

>>> import string
>>> indicators = ["therefore", "for", "since"]
>>> phrase = "... therefore, I conclude I am awesome."
>>> ''.join([ i for i in phrase.lower() if i not in string.punctuation]).strip().split()
['therefore', 'i', 'conclude', 'i', 'am', 'awesome']
>>> p = ''.join([ i for i in phrase.lower() if i not in string.punctuation]).strip().split()
>>> indicators_in_phrase = [indicator for indicator in indicators if indicator in p ]
>>> indicators_in_phrase
['therefore']