How do I find words from arrays of characters?
What is the best way to solve this:
I have a group of arrays with 3-4 characters inside each like so:
{p, q, r, s}
{a, b, c}
{t, u, v}
{m, n, o}
I also have an array of dictionary words.
What is the best/fastest way to find if the array of characters can combine to form one of the dictionary words? For example, the above arrays could make the words:
"pat", "rat", "at", "to", "bum" (lol)
but not "nub" or "mat"
Should I loop through the dictionary to see whether each word can be made, or generate all the combinations from the letters and then compare those to the dictionary?
I had some Scrabble code lying around, so I was able to throw this together. The dictionary I used is sowpods (267751 words). The code below reads the dictionary as a text file with one uppercase word on each line.
The code is C#:
Here is the output when using your test data:
And the output when using random data (does not print each word):
EDIT: I made it much faster with two changes: storing the word at each terminal node of the trie, so that it doesn't have to be rebuilt, and storing the input letters as an array of hash sets instead of an array of arrays, so that the Contains() call is fast.
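For illustration, here is a minimal sketch of the idea described in the EDIT, written in Python rather than C#. It is not the code from this answer. The names TrieNode, build_trie and find_words are made up for the example, and it assumes a word takes one letter from each of several consecutive arrays, starting at any array (which is what the question's examples suggest).

```python
from typing import Dict, Iterable, List, Optional, Set


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # The full word is stored on terminal nodes, so it never has to be
        # rebuilt from the path while searching.
        self.word: Optional[str] = None


def build_trie(words: Iterable[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root


def find_words(root: TrieNode, slots: List[Set[str]]) -> Set[str]:
    """Collect dictionary words that use one letter from each of several
    consecutive slots (assumption: a word may start at any slot)."""
    found: Set[str] = set()

    def walk(node: TrieNode, i: int) -> None:
        if node.word is not None:
            found.add(node.word)
        if i == len(slots):
            return
        for ch, child in node.children.items():
            if ch in slots[i]:  # set membership, the analogue of Contains()
                walk(child, i + 1)

    for start in range(len(slots)):
        walk(root, start)
    return found


# Letters stored as a list of sets instead of a list of lists.
slots = [set("pqrs"), set("abc"), set("tuv"), set("mno")]
dictionary = ["pat", "rat", "at", "to", "bum", "nub", "mat"]
print(sorted(find_words(build_trie(dictionary), slots)))
# ['at', 'bum', 'pat', 'rat', 'to']
```

The search only follows trie branches whose letter is allowed at the current position, so most of the dictionary is never visited.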
There are probably many ways of solving this.
What you are interested in is the number of each character you have available to form a word, and how many of each character are required for each dictionary word. The trick is how to efficiently look up this information in the dictionary.
Perhaps you can use a prefix tree (a trie), some kind of smart hash table, or similar.
Anyway, you will probably have to try out all your possibilities and check them against the dictionary. For example, if you have three arrays of three values each, there will be 3^3 + 3^2 + 3^1 = 39 combinations to check. If this process is too slow, you could put a Bloom filter in front of the dictionary to quickly check whether a word is definitely not in the dictionary.
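A rough sketch of that count in Python, assuming a candidate takes one letter from each of the first k arrays for k = 1, 2, 3 (that reading gives the 39 above). The function name prefix_combinations and the tiny word set are made up for the example, and a plain set stands in for the dictionary lookup.

```python
from itertools import product
from typing import Iterator, List


def prefix_combinations(arrays: List[List[str]]) -> Iterator[str]:
    """Yield every string built from one letter of each of the first k
    arrays, for k = 1 .. len(arrays)."""
    for k in range(1, len(arrays) + 1):
        for letters in product(*arrays[:k]):
            yield "".join(letters)


arrays = [["p", "q", "r"], ["a", "b", "c"], ["t", "u", "v"]]
candidates = list(prefix_combinations(arrays))
print(len(candidates))  # 3 + 9 + 27 = 39

# Hypothetical word list; a real dictionary would be loaded from a file.
dictionary = {"pat", "rat", "at", "to"}
matches = [w for w in candidates if w in dictionary]
print(matches)  # ['pat', 'rat']
```

If words may also start at a later array, the number of candidates grows, but the check against the dictionary stays the same.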
EDIT: Anyway, isn't this essentially the same as Scrabble? Perhaps Googling for "scrabble algorithm" will give you some good clues.
The reformulated question can be answered just by generating and testing. Since you have 4 letters and 10 arrays, you've only got about 1 million possible combinations (about 10 million if you allow a blank character). You'll need an efficient way to look them up, such as a BDB or some sort of disk-based hash.
The trie solution previously posted should work as well; you are just more restricted in which characters you can choose at each step of the search. It should be faster as well.
I just made a very large nested for loop like this:
Then I do a binary search on each combination to see if it is in the dictionary, and add it to an array if it is.
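The loop itself is not shown here, so the following is only a sketch of the same generate-then-binary-search approach in Python. itertools.product stands in for the nested for loops, bisect_left does the binary search on a sorted word list, and the names in_sorted and words_from are made up for the example. It assumes a word uses consecutive arrays and may start at any of them.

```python
from bisect import bisect_left
from itertools import product
from typing import List


def in_sorted(sorted_words: List[str], word: str) -> bool:
    """Binary search: True if word occurs in the sorted list."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


def words_from(arrays: List[List[str]], sorted_words: List[str]) -> List[str]:
    found = []
    # Every window of consecutive arrays: arrays[start:end].
    for start in range(len(arrays)):
        for end in range(start + 1, len(arrays) + 1):
            # product() plays the role of one nested loop per array.
            for letters in product(*arrays[start:end]):
                candidate = "".join(letters)
                if in_sorted(sorted_words, candidate):
                    found.append(candidate)
    return found


arrays = [list("pqrs"), list("abc"), list("tuv"), list("mno")]
sorted_words = sorted(["pat", "rat", "at", "to", "bum", "nub", "mat"])
print(sorted(words_from(arrays, sorted_words)))
# ['at', 'bum', 'pat', 'rat', 'to']
```

If the same letter appears in more than one array, a word could be found twice, so wrapping the result in a set may be worthwhile.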