Python: Intersection of lists/sets

Posted 2024-09-19 01:55:29

def boolean_search_and(self, text):

    and_tokens = self.tokenize(text)

    term1 = and_tokens[0]
    print ' term 1:', term1

    term2 = and_tokens[1]
    print ' term 2:', term2

    # Look up the DocIDs for each term. A term missing from the index
    # matches no documents, so default to an empty list (otherwise
    # resultlist1/resultlist2 would be undefined below).
    resultlist1 = self._inverted_index.get(term1, [])
    print resultlist1
    resultlist2 = self._inverted_index.get(term2, [])
    print resultlist2

    # intersection of the two sets, cast back into a list
    results = list(set(resultlist1) & set(resultlist2))
    print 'results:', results

    return str(results)

This code works for two tokens, e.g. text = "Hello World", so tokens = ['hello', 'world']. I want to generalize it to any number of tokens, so that the text can be a sentence or an entire text file.

self._inverted_index is a dictionary whose keys are the tokens and whose values are lists of the DocIDs of the documents in which each token occurs. For example:

hello -> [1,2,5,6]
world -> [1,3,5,7,8]
result:
hello AND world -> [1,5]
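Because each posting list in the example is already sorted by DocID, the two-term AND can also be computed without building sets, using the classic merge-style walk over two sorted lists. This is a minimal sketch of that alternative technique, assuming every posting list is sorted in ascending order with no duplicates; `intersect_sorted` is a hypothetical helper name, not part of the original code.

```python
def intersect_sorted(p1, p2):
    """Intersect two ascending, duplicate-free lists of DocIDs.

    Walks both lists once with two indices, so it runs in
    O(len(p1) + len(p2)) time.
    """
    i, j = 0, 0
    answer = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # DocID present in both lists: keep it and advance both.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # p1 is behind, so advance it.
            i += 1
        else:
            # p2 is behind, so advance it.
            j += 1
    return answer


# Example data from the question.
inverted_index = {
    'hello': [1, 2, 5, 6],
    'world': [1, 3, 5, 7, 8],
}

print(intersect_sorted(inverted_index['hello'], inverted_index['world']))  # [1, 5]
```

Unlike the set-based version, the result stays sorted, so it can be fed straight into another `intersect_sorted` call for the next term.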

I want to get the result for a query with more terms, say:

(((hello AND computer) AND science) AND world)
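A query like (((hello AND computer) AND science) AND world) is a left fold: intersect the first two posting lists, then intersect that result with the next one, and so on. `functools.reduce` expresses exactly that shape. In this sketch `and_query` is a hypothetical helper, the posting lists for `computer` and `science` are made-up sample data, and unknown terms are assumed to match no documents.

```python
import functools
import operator


def and_query(index, terms):
    """Return the sorted DocIDs that contain every term in `terms`.

    Evaluates ((t1 AND t2) AND t3) AND ... by folding set intersection
    from left to right. Terms missing from the index match nothing.
    """
    if not terms:
        return []
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(functools.reduce(operator.and_, postings))


# 'hello' and 'world' come from the question; the other two lists are
# hypothetical sample data.
index = {
    'hello': [1, 2, 5, 6],
    'computer': [1, 4, 5, 6],
    'science': [1, 5, 9],
    'world': [1, 3, 5, 7, 8],
}

print(and_query(index, ['hello', 'computer', 'science', 'world']))  # [1, 5]
```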

I'm trying to make this work for any number of words, not just two. I only started using Python this morning, so I'm not yet aware of many of the features it offers.

Any ideas?


Answers (2)

时光磨忆 2024-09-26 01:55:29

You wrote: "I want to generalize it for multiple tokens."

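def boolean_search_and_multi(self, text):
    and_tokens = self.tokenize(text)
    if not and_tokens:
        return []  # an empty query matches nothing
    # Start from the first token's DocIDs; a token missing from the
    # index matches no documents, so default to an empty list.
    results = set(self._inverted_index.get(and_tokens[0], []))
    for tok in and_tokens[1:]:
        # Keep only the DocIDs that also contain this token.
        results.intersection_update(self._inverted_index.get(tok, []))
    return list(results)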
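To try this answer's approach outside the asker's program, here is a sketch that places it in a minimal host class. `SearchIndex` and its `tokenize` (lowercase, then split on whitespace) are hypothetical stand-ins for the asker's own class and tokenizer; the lookups use `.get()` so that an unknown token yields an empty result instead of a KeyError.

```python
class SearchIndex(object):
    """Hypothetical minimal host class for boolean_search_and_multi."""

    def __init__(self, inverted_index):
        # token -> list of DocIDs in which the token occurs
        self._inverted_index = inverted_index

    def tokenize(self, text):
        # Assumed tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def boolean_search_and_multi(self, text):
        and_tokens = self.tokenize(text)
        if not and_tokens:
            return []
        # Unknown tokens match no documents, hence the empty default.
        results = set(self._inverted_index.get(and_tokens[0], []))
        for tok in and_tokens[1:]:
            results.intersection_update(self._inverted_index.get(tok, []))
        return list(results)


searcher = SearchIndex({
    'hello': [1, 2, 5, 6],
    'world': [1, 3, 5, 7, 8],
})

print(sorted(searcher.boolean_search_and_multi("Hello World")))  # [1, 5]
print(searcher.boolean_search_and_multi("hello unknownword"))    # []
```

Because `results` is a set, the returned list is not guaranteed to be in DocID order; sort it if order matters.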
家住魔仙堡 2024-09-26 01:55:29

Would the built-in set type work for you?

$ python
Python 2.6.5 (r265:79063, Jun 12 2010, 17:07:01)
[GCC 4.3.4 20090804 (release) 1] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> hello = set([1,2,5,6])
>>> world = set([1,3,5,7,8])
>>> hello & world
set([1, 5])
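The session above is Python 2.6; in Python 3 the same intersection is displayed as `{1, 5}`. For more than two terms, `set.intersection` accepts several iterables at once (plain lists work too), so a whole AND query can be a single call. A sketch, where the `computer` and `science` lists are hypothetical sample data:

```python
hello = set([1, 2, 5, 6])
world = set([1, 3, 5, 7, 8])
print(hello & world)  # {1, 5} in Python 3

# set.intersection takes any number of iterables, which covers
# (((hello AND computer) AND science) AND world) in one call.
computer = [1, 4, 5, 6]  # hypothetical sample data
science = [1, 5, 9]      # hypothetical sample data
print(sorted(hello.intersection(computer, science, world)))  # [1, 5]
```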