T9类型字典背后的数据结构
T9 词典如何工作?其背后的数据结构是怎样的。如果我们输入“4663”,当我们按下按钮时,我们会得到“gone”,然后是“home”等...
编辑:如果用户输入46,那么它应该显示“go”,当按下向下箭头时应该显示“go”显示“消失”等...
How does a T9 dictionary work? What is the data structure behind it. If we type '4663' we get 'good' when we press down button we get 'gone' then 'home' etc...
EDIT: If the user types in 46 then it should show 'go' and when pressed down arrow should show 'gone' etc...
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
它可以通过多种方式实现,其中之一是 Trie。路径由数字表示,节点指向单词的集合。
它也可以使用嵌套哈希表来实现,哈希表的键是一个字母和对于每个数字,算法都会计算所有可能的路线(O(3^n) 路线)。
It can be implemented in several ways, one of them is Trie. The route is represented by the digits and the nodes point to collection of words.
It can be implemented using nested hash tables as well, the key of the hash table is a letter and on every digit the algorithm calculates all possible routes (O(3^n) routes).
翻译为
T9 的工作原理是从第一个可能的字母开始按顺序过滤可能性。因此,示例中的第一步是将字典列表过滤为以 G、H 或 I 开头的所有单词。下一步,获取该列表并按 M、N、O 过滤第二个字母。依此类推...
translates to
T9 works by filtering the possibilities down sequentially starting with the first possible letters. So the first step in your example will be to filter the dictionary list to all words beginning with G, H, or I. Next step, take that list and filter the second letters by M, N, O. And so on...
我猜想,就像之前的 T9 使用 trie 一样,其中链接由位图表示(每个字母 1 位)。这被称为简洁的数据结构,正如 Steve Hanov 很好地解释的那样:
http://stevehanov。 ca/blog/index.php?id=120
I guess, as those before that T9 uses a trie, where the links are represented by a bitmap (1 bit per letter). This is called a succinct data structure, as Steve Hanov explains very nicely:
http://stevehanov.ca/blog/index.php?id=120
我认为 T9 使用的是基于位图的 Trie。第一层有一个 32 位(我假设它们扩展为 32)字,其中每个位代表一个字母。所有有效作为单词起始字母的字母都已设置其位。
在下一个级别中,对于上一个级别中设置的每个位都有一个 32 位映射。在这些映射中,如果相应的字母作为单词中的第二个字母有效,则设置每个位,而起始字母由第一级决定。
然后该结构继续。使用每个字母一位可以实现非常高效的存储。然而,寻址方案必定具有挑战性。还可能存在对短单词使用上述模式而对较长单词使用其他模式的某种优化。
I think that T9 is using a bit-map-based Trie. On the first level there is a 32-bit (I assume they expanded to 32) word where each bit represents a letter. All letters that are valid as start letters for a word have their bit set.
In the next level there is one 32-bit map for each of those bits that were set in the previous level. In these maps each bit is set if the corresponding letter is valid as a second letter in a word, with the starting letter decided by the first level.
The structure then continues. Using one-bit-per-letter makes a very efficient storage. The addressing scheme however must be challenging. There might also be some kind of optimization using the above schema for short words while using something else for longer words.
使用 Trie 的 C# 实现
C# implementation using Trie
我可能会从一本字典开始,然后(对于每个单词)将单词推到一个列表中,该列表由可以形成该单词的数字键入。
然后,您可能需要以某种方式对结果列表进行排序,以便更可能的单词出现在不太可能的单词之前(如果您有空间,我还会为计数器添加一个小区域,这样我们就可以逐步重新排序)对这些进行排序或简单地将最后使用的内容移至建议列表的前面,因此随着时间的推移,我们倾向于为用户提供更好的答案)。
这样,当您有 4663 作为输入时,您可以通过大约 O(1) 的哈希表查找来非常简单地检索相关链表。
I'd probably do it by starting with a dictionary, then (for each word) push the word onto a list keyed by the numbers that could form the word.
Then, you'll probably need to sort the resulting lists in some fashion, so more likely words appear before less likely words (if you have the space, I'd also include a small area for a counter, so we can incrementally re-sort these OR simply mov ethe last used to the front of the suggestion list, so we, over time, tend towards giving the user a better answer).
That way, when you have 4663 as an input, you can quite simply retrieve the relevant linked list wit ha roughly O(1) hash table lookup.