Auto-completion library in C++
I need an auto-completion routine or library in C++ for 1 million words. I guess I could find a routine on the net, something like Rabin–Karp. Do you know of a library that does this? I don't see one in Boost.
Also, is it a crazy idea to use a MySQL LIKE query to do that?
Thank you
EDIT: It is true that what I need is closer to suggestions than to auto-completion (propose ten words once the user has typed the first two letters). I actually also have multi-word expressions such as "Nikon digital camera". But for a first version, I only need suggestions on the "Ni" of "Nikon", not on "digital camera".
Comments (5)
You don't have to use any crazy algorithm if you begin by preparing an index.
A simple trie or binary search tree structure that keeps the words ordered alphabetically would allow efficient prefix searches.
In C++, for example, the std::map class has the lower_bound member, which finds in O(log N) the first element that could possibly extend your word.
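The lower_bound idea above can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: it uses std::set rather than std::map because no value is attached to each word (both containers offer the same lower_bound member), and the function name suggest and its limit parameter are made up for the example.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns up to `limit` words from `words` that start
// with `prefix`, in alphabetical order.
std::vector<std::string> suggest(const std::set<std::string>& words,
                                 const std::string& prefix,
                                 std::size_t limit) {
    std::vector<std::string> out;
    // lower_bound finds the first word that is >= prefix in O(log N).
    for (auto it = words.lower_bound(prefix);
         it != words.end() && out.size() < limit; ++it) {
        // Words sharing the prefix are contiguous in sorted order,
        // so the scan can stop at the first word that does not match.
        if (it->compare(0, prefix.size(), prefix) != 0) break;
        out.push_back(*it);
    }
    return out;
}
```

With the words "canon", "nice", "night", "nikkor", "nikon" and "north" in the set, suggest(words, "ni", 10) returns "nice", "night", "nikkor" and "nikon". The comparison is case-sensitive, so for input such as "Ni" you would probably normalise case on both sides first.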
Hmm, if you're thinking about using LIKE, it most probably means you want classical auto-completion (matching on the beginning of the word).
What about organising your data (nicely) into a 26-ary tree (one entry per letter, or, if you support more than letters, a well-chosen x-ary tree)? That way, you organise your data once and then get results quickly by walking the tree. If you want to limit the number of results proposed by the auto-completion, you can adapt your tree-walking algorithm. It seems simple and efficient (a LIKE query in SQL that cannot use an index has to compare every item in the table each time, whereas my solution is much quicker once the data is correctly set up).
Another solution: you can peek at Qt's implementation of QCompleter (it might be overkill to depend on Qt in your code, I don't know).
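A 26-ary tree along these lines might look like the sketch below. It is one possible reading of the idea, not a reference implementation: it assumes words contain only lowercase ASCII letters 'a' to 'z', needs C++14 for std::make_unique, and the names Trie, insert and suggest are invented for the example.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Sketch of a 26-ary trie; assumes words use only lowercase 'a'..'z'.
class Trie {
public:
    // Inserts a word; returns false (and stores nothing) if the word
    // contains a character outside 'a'..'z'.
    bool insert(const std::string& word) {
        for (char c : word)
            if (c < 'a' || c > 'z') return false;
        Node* node = &root_;
        for (char c : word) {
            auto& child = node->children[c - 'a'];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->is_word = true;
        return true;
    }

    // Returns up to `limit` stored words starting with `prefix`,
    // in alphabetical order.
    std::vector<std::string> suggest(const std::string& prefix,
                                     std::size_t limit) const {
        std::vector<std::string> out;
        const Node* node = &root_;
        // Walk down one level per prefix letter.
        for (char c : prefix) {
            if (c < 'a' || c > 'z') return out;
            node = node->children[c - 'a'].get();
            if (!node) return out;  // no stored word has this prefix
        }
        std::string current = prefix;
        collect(node, current, limit, out);
        return out;
    }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 26> children;
        bool is_word = false;
    };

    // Depth-first walk of the subtree, stopping once `limit` words are found.
    static void collect(const Node* node, std::string& current,
                        std::size_t limit, std::vector<std::string>& out) {
        if (out.size() >= limit) return;
        if (node->is_word) out.push_back(current);
        for (int i = 0; i < 26; ++i) {
            if (!node->children[i]) continue;
            current.push_back(static_cast<char>('a' + i));
            collect(node->children[i].get(), current, limit, out);
            current.pop_back();
        }
    }

    Node root_;
};
```

The tree is built once; each call to suggest then costs one step per prefix letter plus a walk that stops as soon as the requested number of results is reached. Words such as "Nikon" would need to be lowercased before insertion, or the child array widened to cover a larger alphabet.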
I once worked on a project that did something like this using CLucene. It worked fine.
You can use a trie (prefix tree) to store your words.
Then you can easily get prefix matches by iterating over subtrees.
You could write your own simple auto-completion function using the Damerau-Levenshtein distance.
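As a rough sketch of what that could look like, the code below computes the optimal string alignment distance, which is the restricted variant of Damerau-Levenshtein (each substring is edited at most once), and uses it to filter a word list. The names osa_distance and fuzzy_matches and the max_distance parameter are hypothetical, chosen only for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Optimal string alignment distance (restricted Damerau-Levenshtein):
// counts insertions, deletions, substitutions and adjacent transpositions.
std::size_t osa_distance(const std::string& a, const std::string& b) {
    const std::size_t n = a.size(), m = b.size();
    // d[i][j] = distance between the first i chars of a and first j chars of b.
    std::vector<std::vector<std::size_t>> d(n + 1,
                                            std::vector<std::size_t>(m + 1));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = i;
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = j;
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,          // deletion
                                d[i][j - 1] + 1,          // insertion
                                d[i - 1][j - 1] + cost}); // substitution
            // Adjacent transposition, e.g. "ko" <-> "ok".
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
        }
    }
    return d[n][m];
}

// Hypothetical helper: all words within `max_distance` edits of `input`.
// This is a linear scan over the whole list.
std::vector<std::string> fuzzy_matches(const std::vector<std::string>& words,
                                       const std::string& input,
                                       std::size_t max_distance) {
    std::vector<std::string> out;
    for (const auto& w : words)
        if (osa_distance(input, w) <= max_distance) out.push_back(w);
    return out;
}
```

Because this compares whole words and scans every entry, it is better suited to tolerating typos in a nearly complete word than to completing a two-letter prefix, and with 1 million words the linear scan would likely need some pre-filtering.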