用于只读字符串列表（大约 100,000）的最有效的数据结构，具有快速前缀搜索

发布于 2024-07-26 14:44:29 字数 283 浏览 6 评论 0原文

我正在编写一个应用程序，需要从文件中读取字符串列表，将它们保存在数据结构中，然后通过前缀查找这些字符串。字符串列表只是给定语言中的单词列表。例如，如果搜索函数获取“stup”作为参数，则它应该返回[“stupid”，“stupidity”，“stupor”...]。它应该在 O(log(n)*m) 时间内完成，其中 n 是数据结构的大小，m 是结果的数量，并且应该尽可能快。内存消耗现在不是一个大问题。我是用 python 写的，所以如果你能向我指出一个用 python 包装器在 c 中实现的合适的数据结构（最好），那就太好了。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

救赎№ 2024-08-02 14:44:58

字符串数组。

然后通过它进行二分搜索以搜索第一个匹配项
然后一步一步地完成所有后续匹配

（我最初也在这里有链接列表...但是当然这没有随机访问，所以这是“bs”（这可能解释了为什么我被否决）。我的二进制文件搜索算法仍然是最快的方法

回复收藏 0 原文

紫﹏色ふ单纯 2024-08-02 14:44:54

trie（或前缀树）听起来很适合你。我相信它可以在 O(m) 中对长度为 m 的前缀字符串进行搜索。

回复收藏 0 原文

淑女气质 2024-08-02 14:44:50

tries 的一些 Python 实现：

回复收藏 0 原文

三岁铭 2024-08-02 14:44:46

你想要尝试一下。

http://en.wikipedia.org/wiki/Trie

我在 Scrabble 中使用过它们令人难以置信的程序。它们非常适合您描述的用例（快速前缀查找）。

下面是一些用 Python 构建 trie 的示例代码。这是我几个月前编写的一个 Boggle 程序。剩下的就留给读者作为练习。但是对于前缀检查，您基本上需要一个从根节点（变量 words）开始的方法，跟随前缀的字母到连续的子节点，如果找到这样的路径则返回 True，否则返回 False否则。

class Node(object):
    def __init__(self, letter='', final=False):
        self.letter = letter
        self.final = final
        self.children = {}
    def __contains__(self, letter):
        return letter in self.children
    def get(self, letter):
        return self.children[letter]
    def add(self, letters, n=-1, index=0):
        if n < 0: n = len(letters)
        if index >= n: return
        letter = letters[index]
        if letter in self.children:
            child = self.children[letter]
        else:
            child = Node(letter, index==n-1)
            self.children[letter] = child
        child.add(letters, n, index+1)

def load_dictionary(path):
    result = Node()
    for line in open(path, 'r'):
        word = line.strip().lower()
        result.add(word)
    return result

words = load_dictionary('dictionary.txt')

You want a trie.

http://en.wikipedia.org/wiki/Trie

I've used them in Scrabble and Boggle programs. They're perfect for the use case you described (fast prefix lookup).

Here's some sample code for building up a trie in Python. This is from a Boggle program I whipped together a few months ago. The rest is left as an exercise to the reader. But for prefix checking you basically need a method that starts at the root node (the variable words), follows the letters of the prefix to successive child nodes, and returns True if such a path is found and False otherwise.

class Node(object):
    def __init__(self, letter='', final=False):
        self.letter = letter
        self.final = final
        self.children = {}
    def __contains__(self, letter):
        return letter in self.children
    def get(self, letter):
        return self.children[letter]
    def add(self, letters, n=-1, index=0):
        if n < 0: n = len(letters)
        if index >= n: return
        letter = letters[index]
        if letter in self.children:
            child = self.children[letter]
        else:
            child = Node(letter, index==n-1)
            self.children[letter] = child
        child.add(letters, n, index+1)

def load_dictionary(path):
    result = Node()
    for line in open(path, 'r'):
        word = line.strip().lower()
        result.add(word)
    return result

words = load_dictionary('dictionary.txt')

回复收藏 0 原文

~没有更多了~