给定一个单词列表 - 在 java 中完成单词的好的算法是什么? 权衡:速度/效率/内存占用
我正在探索潜在的免费/付费应用程序的硬件/软件要求(最终目标是移动 Java 应用程序)。
该应用程序将从这个简单的目标开始:给定数据库中相关单词的列表,能够对单个字符串输入进行单词补全。
换句话说,我已经知道数据库的内容 - 但算法的内存占用/速度/搜索效率将决定支持的数据量。
我一开始就从基于后缀的树搜索开始,但我想知道是否有人有过这种简单方法与会议中讨论的更复杂方法的速度/内存大小权衡的经验。
老实说,初始应用程序的上下文可能只有不到 500 个单词,因此这可能并不重要,但最终应用程序可能会扩展到数万或数十万条记录 - 因此存在速度与内存占用的问题。
我想我可以从简单的事情开始,然后再切换,但我希望早点理解权衡!
I'm exploring the hardware/software requirements (ultimate goal is mobile Java app) for a potential free/paid application.
The application will start with this simple goal: Given a list of relevant words in a database, to be able to do word completion on a single string input.
In other words I already know the contents of the database - but the memory footprint/speed/search efficiency of the algorithm will determine the amount of data supported.
I have been starting at the beginning with suffix-based tree searches, but am wondering if anyone has experience with the speed/memory size tradeoffs of this simple approach vs. the more complex ones being talked about in the conferences.
Honestly the initial application only has probably less than 500 words in context so it might not matter, but ultimately the application could expand to tens of thousands or hundreds of thousands of record - thus the question about speed vs. memory footprint.
I suppose I could start with something simple and switch over later, but I hope to understand the tradeoff earlier!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
单词完成表明您想要查找以给定前缀开头的所有单词。
Tries 对此很有用,如果您要添加或删除元素 - 其他节点,则特别有用不需要重新分配。
如果字典相当静态,并且检索很重要,请考虑一种更简单的数据结构:将单词放入有序向量中! 您可以执行 binary-search 来发现以正确前缀开头的候选者,并进行线性搜索它的每一面都可以发现所有其他候选人。
Word completion suggests that you want to find all the words that start with a given prefix.
Tries are good for this, and particularly good if you're adding or removing elements - other nodes do not need to be reallocated.
If the dictionary is fairly static, and retrieval is important, consider a far simpler data structure: put your words in an ordered vector! You can do binary-search to discover a candidate starting with the correct prefix, and a linear search each side of it to discover all other candidates.