Compare strings and find the parts present in every string

Posted on 2025-02-11 06:17:27

How do I compare several rows and find the words or combinations of words that are present in every row? Using pure Python, NLTK, or anything else.

few_strings = ('this is foo bar', 'this is not a foo bar', 'some other foo bar here')
# some magic
result = 'foo bar'


Comments (4)

我只土不豪 2025-02-18 06:17:27

Split each string on whitespace and save the resulting words into sets. Then compute the intersection of the three sets:

few_strings = ('this is foo bar', 'this is not a foo bar', 'some other foo bar here')
sets = [set(s.split()) for s in few_strings]
common_words = sets[0].intersection(*sets[1:])
print(common_words)

Output (the order of elements in a set may vary):

{'bar', 'foo'}

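
A set does not remember word order, so if the common words should come out in the order they appear in the text, one option is to filter the words of the first string against the intersection. This is only a minimal sketch, assuming the order of the first string is the one wanted; `ordered_common` is a name introduced here for illustration.

```python
few_strings = ('this is foo bar', 'this is not a foo bar', 'some other foo bar here')

# Words present in every string (unordered).
sets = [set(s.split()) for s in few_strings]
common_words = sets[0].intersection(*sets[1:])

# Walk the first string so the common words keep their original order.
ordered_common = [w for w in few_strings[0].split() if w in common_words]
print(' '.join(ordered_common))  # foo bar
```

Note that this only restores order. It does not check that the words are adjacent in every string, and a word repeated in the first string would be repeated in the result.
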
拒绝两难 2025-02-18 06:17:27

You might want to use the standard library module difflib for sequence comparisons, including finding common substrings. Calling find_longest_match() without arguments requires Python 3.9 or later.

from difflib import SequenceMatcher

list_of_str = ['this is foo bar', 'this is not a foo bar', 'some other foo bar here']

# Reduce pairwise: keep only the longest block shared with each next string.
result = list_of_str[0]
for next_string in list_of_str[1:]:
    match = SequenceMatcher(None, result, next_string).find_longest_match()
    result = result[match.a:match.a + match.size]

# Intended result: 'foo bar'. Caveat: the reduction is greedy. For this input,
# 'this is ' and ' foo bar' tie at 8 characters in the first comparison and
# find_longest_match() returns the earlier block, so the loop does not end at
# 'foo bar'.

The basic usage with two strings:

from difflib import SequenceMatcher

string1 = "apple pie available"
string2 = "come have some apple pies"

match = SequenceMatcher(None, string1, string2).find_longest_match()

print(match)  # -> Match(a=0, b=15, size=9)
print(string1[match.a:match.a + match.size])  # -> apple pie
print(string2[match.b:match.b + match.size])  # -> apple pie

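
Reducing the strings pairwise commits to one block at each step, so a tie or a longer block shared by only two of the strings can steer it away from what all strings have in common. As an alternative, here is a brute-force sketch that works on whole words and checks every candidate against every string. It does not use difflib, and the helper names `contains_run` and `longest_common_word_run` are made up for this example.

```python
def contains_run(words, run):
    """Return True if `run` occurs as a contiguous slice of `words`."""
    n = len(run)
    return any(words[i:i + n] == run for i in range(len(words) - n + 1))


def longest_common_word_run(strings):
    """Longest contiguous word sequence shared by all strings (brute force)."""
    word_lists = [s.split() for s in strings]
    if not word_lists:
        return ''
    # A common run cannot be longer than the shortest word list.
    shortest = min(word_lists, key=len)
    # Try the longest candidates first, so the first hit is the answer.
    for size in range(len(shortest), 0, -1):
        for start in range(len(shortest) - size + 1):
            run = shortest[start:start + size]
            if all(contains_run(words, run) for words in word_lists):
                return ' '.join(run)
    return ''


list_of_str = ['this is foo bar', 'this is not a foo bar', 'some other foo bar here']
print(longest_common_word_run(list_of_str))  # foo bar
```

The cost grows quickly with the number of words per string, so this is meant for short rows like the ones in the question rather than long documents.
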
何以畏孤独 2025-02-18 06:17:27

  1. Create a set of words for each sentence by splitting on spaces (" ")
  2. Initialize result with the first set
  3. Loop over the remaining sets and update result with the intersection of the current result and each set

few_strings = ('this is foo bar', 'this is not a foo bar', 'some other foo bar here')
# 1.
sets = [set(s.split(" ")) for s in few_strings]
# 2.
result = sets[0]
# 3.
for words in sets[1:]:
    result = result.intersection(words)

Now you have a Python set of the words that occur in all sentences.
You can convert the set to a list with:

result = list(result)

or to a string with:

result = " ".join(result)

Sets are unordered, so the word order in the list or the joined string is not guaranteed.

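
One detail worth knowing when splitting on a literal space: `split(" ")` and `split()` behave differently on input with repeated spaces, tabs, or trailing whitespace. The short sketch below uses a made-up sample string to show the difference.

```python
text = 'this  is\tfoo bar '  # double space, a tab, and a trailing space

# Splits on every single space: empty strings appear and the tab is kept.
by_space = text.split(' ')
print(by_space)  # ['this', '', 'is\tfoo', 'bar', '']

# Splits on runs of any whitespace and drops leading/trailing whitespace.
by_whitespace = text.split()
print(by_whitespace)  # ['this', 'is', 'foo', 'bar']
```

For clean, single-spaced input like the strings in the question, both calls give the same words.
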
零崎曲识 2025-02-18 06:17:27

You can do it without using libraries too:

few_strings = ('this is foo bar', 'some other foo bar here', 'this is not a foo bar')
# Split into word lists and put the shortest list first, since every
# common word must appear in it.
strings = [s.split() for s in few_strings]
strings.sort(key=len)
print(strings)
result = ''

for word in strings[0]:
    # Keep the word only if every word list contains it.
    if all(word in words for words in strings):
        result += word + ' '

print(result.strip())