要构建此数据结构，您需要按如下方式预处理原始单词/频率列表。对于每个单词，确定与其字母对应的一系列数字。例如，单词“monsoon”将为 6667666，而单词“apple”将为 27753。然后，将该序列插入到数字特里树中，根据需要创建新节点。当到达与该数字串对应的最后一个节点时，将该单词插入到该节点对应的优先级队列中。这个操作的总时间实际上相当不错；给定一个总共有 n 个字母的单词列表，这可以在 O(n T_insert) 时间内完成，其中 T_insert 是插入所需的时间一个词进入优先队列。这是因为我们最多需要访问每个字母三次 - 一次将其转换为数字，一次跟踪数字树中的相应边缘，一次将其插入优先级队列。

这种方法还可以轻松地将新单词插入预测器中。每当用户离开数字特里树或选择一个不是第一个单词的单词时，您可以通过在特里树中创建适当的一系列节点来将该单词插入到数字特里树中，然后将该单词插入到正确的优先级中队列。

如果您使用存储指向其最小元素的指针的平衡二叉搜索树制作优先级队列，则可以在 O(n) 时间内找到一系列 n 数字的最快单词，然后列出所有换句话说，每个带有该前缀的单词均摊为 O(1) 时间。您还可以通过这种方式轻松插入或删除新单词。

由于单词存储在 trie 中，因此只需从当前 trie 节点向下走到当前编号给出的子节点，即可在 O(1) 内更新对当前单词的猜测。当用户按下空格键时，您只需重置回数字特里树的根。最后，当用户点击星号时，您可以使用上述技巧来查看给定 trie 节点的下一个最流行的单词。

希望这有帮助！

One option that I think would be particularly efficient would be to preprocess the trie into a modified structure specifically optimized for predicting words based on keystrokes.

Intuitively, the new structure is a trie created out of the possible digits that could be pressed at any point. Each trie node then stores a priority queue of the words that could potentially be spelled out using those digits. You could then predict which words to use via the following algorithm:

Start at the root of the trie.
For each digit, follow the pointer corresponding to the digit.
If you walk off the trie, then there aren't any suggestions to make.
Otherwise, look at the priority queue for the words that can be formed from precisely these digits, then suggest the element in that priority queue with the highest use count.

This algorithm takes time O(n + T_max), where n is the length of the digit string and T_max is the time required to find the most popular word for the given prefix. This could be O(1) for something like a binary heap, but could be slower for more complex heaps.

To build this data structure, you would preprocess the original list of words/frequencies as follows. For each word, determine the series of digits corresponding to its letters. For example, the word "monsoon" would be 6667666, while the word "apple" would be 27753. Then, insert that sequence into the digit trie, creating new nodes as necessary. When you arrive at the final node corresponding to this digit string, insert the word into the corresponding priority queue for that node. The total time for this operation is actually quite good; given a list of words with a total of n letters in it, this can be done in O(n T_insert) time, where T_insert is the time required to insert a word into the priority queue. This is because we need to visit each letter at most three times - once to convert it to a digit, once to follow the appropriate edge in the digit trie, and once to insert it into a priority queue.

This approach also makes it easy to insert new words into the predictor quite easily. Whenever the user walks off the digit trie or chooses a word that isn't the number one word, you could just insert that word into the digit trie by creating the appropriate series of nodes in the trie, then inserting the word into the correct priority queue.

If you make your priority queues out of a balanced binary search tree that stores a pointer to its smallest element, you can implement finding the fastest word for a series of n digits in O(n) time, and can then list off all of the other words with that prefix in amortized O(1) time apiece. You could also insert or delete new words very easily this way.

Because the words are stored in a trie, you can in O(1) update your guess of the current word by just walking from the current trie node down into the child node given by the current number. When the user hits the space key, you would just reset back up to the root of the digit trie. Finally, when the user hits star, you could use the above trick to just look at the next most-popular word for the given trie node.

Hope this helps!

回复收藏 0 原文

~没有更多了~