生成给定字符串的所有可能的字母组合(最多 2 个字母)的算法
生成给定字符串的所有可能字母组合(最多 2 个字母)的算法
尝试在 AS3 中创建 Anagram 求解器,例如此处找到的这个:
http://homepage.ntlworld.com/adam.bozon/anagramsolver.htm
我在为各种长度生成所有可能的字母组合时遇到问题字符串。如果我只是生成固定长度的排列,这对我来说不会是一个问题......但我希望减少字符串的长度并从原始字母集中获取所有可能的排列最大长度小于原始字符串的字符串。例如,假设我想要一个长度为 2 的字符串,但我有一个 3 个字母的字符串“abc”,则输出将为:ab ac ba bc ca cb。
理想情况下,该算法会生成一个完整的可能组合列表,从原始字符串长度开始,一直到最小字符串长度 2。我有一种感觉,可能有一个小型递归算法可以做到这一点,但我无法理解它。我在 AS3 中工作。
谢谢!
Algorithm to generate all possible letter combinations of given string down to 2 letters
Trying to create an Anagram solver in AS3, such as this one found here:
http://homepage.ntlworld.com/adam.bozon/anagramsolver.htm
I'm having a problem wrapping my brain around generating all possible letter combinations for the various lengths of strings. If I was only generating permutations for a fixed length, it wouldn't be such a problem for me... but I'm looking to reduce the length of the string and obtain all the possible permutations from the original set of letters for a string with a max length smaller than the original string. For example, say I want a string length of 2, yet I have a 3 letter string of “abc”, the output would be: ab ac ba bc ca cb.
Ideally the algorithm would produce a complete list of possible combinations starting with the original string length, down to the smallest string length of 2. I have a feeling there is probably a small recursive algorithm to do this, but can't wrap my brain around it. I'm working in AS3.
Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
为了编写您链接的类型的字谜解算器,您所请求的算法不是必需的。它也非常昂贵。
例如,让我们看一下像
MONKEY
这样的 6 个字母的单词。该单词的所有 6 个字母都不同,因此您将创建:现在,大概您不想吐出所有 1950 个单词(例如“OEYKMN”)作为字谜(确实如此,但其中大多数也是胡言乱语)。我猜你有一本合法英语单词的字典,你只想检查这些单词是否是查询单词的字谜词,可以选择不使用所有字母。
如果真是这样,那么问题就简单了。
要确定 2 个单词是否是彼此的字谜词,您需要做的就是计算每个字母的使用次数,然后比较这些数字!
让我们将自己限制为仅 26 个字母 AZ,不区分大小写。您需要做的是编写一个函数
countLetters
,它接受一个单词并返回一个包含 26 个数字的数组。数组中的第一个数字对应于单词中字母A
的计数,第二个数字对应于B
的计数,依此类推。然后,两个单词
W1
和W2
是精确的字谜,如果对于每个i
都有countLetters(W1)[i] == countLetters(W2)[i]
!也就是说,每个单词使用每个字母的次数完全相同!对于我所说的子字谜(
MONEY
是MONKEY
的子字谜),W1
是的子字谜>W2
如果对于每个i
都有countLetters(W1)[i] <= countLetters(W2)[i]
!也就是说,子字谜可以使用较少的某些字母,但不能使用更多!(注意:
MONKEY
也是MONKEY
的子字谜)。这应该为您提供一个足够快的算法,其中给定查询字符串,您所需要做的就是通读字典一次,将每个单词的字母计数数组与查询单词的字母计数数组进行比较。您可以做一些小的优化,但这应该足够好了。
或者,如果您想要最佳性能,您可以预处理字典(预先知道)并创建子字谜关系的有向非循环图。
下面是这样一个图的一部分用于说明:
基本上每个节点都是具有相同字母计数数组的所有单词的存储桶(即它们是精确的字谜词)。如果
N2
的数组是<=
(如上面定义的),则有一个从N1
到N2
的节点N1
的数组(您可以执行传递归约以存储最少量的边)。然后,要列出一个单词的所有子字谜,您所要做的就是找到与其字母计数数组对应的节点,并递归地探索从该节点可到达的所有节点。他们所有的桶都将包含子字谜。
For the purpose of writing an anagram solver the kind of which you linked, the algorithm that you are requesting is not necessary. It is also VERY expensive.
Let's look at a 6-letter word like
MONKEY
, for example. All 6 letters of the word are different, so you would create:Now, presumably you're not trying to spit out all 1950 words (e.g. 'OEYKMN') as anagrams (which they are, but most of them are also gibberish). I'm guessing you have a dictionary of legal English words, and you just want to check if any of those words are anagrams of the query word, with the option of not using all letters.
If that is the case, then the problem is simple.
To determine if 2 words are anagrams of each other, all you need to do is count how many times each letters are used, and compare these numbers!
Let's restrict ourself to only 26 letters A-Z, case insensitive. What you need to do is write a function
countLetters
that takes a word and returns an array of 26 numbers. The first number in the array corresponds to the count of the letterA
in the word, second number corresponds to count ofB
, etc.Then, two words
W1
andW2
are exact anagram ifcountLetters(W1)[i] == countLetters(W2)[i]
for everyi
! That is, each word uses each letter the exact same number of times!For what I'd call sub-anagrams (
MONEY
is a sub-anagram ofMONKEY
),W1
is a sub-anagram ofW2
ifcountLetters(W1)[i] <= countLetters(W2)[i]
for everyi
! That is, the sub-anagram may use less of certain letters, but not more!(note:
MONKEY
is also a sub-anagram ofMONKEY
).This should give you a fast enough algorithm, where given a query string, all you need to do is read through the dictionary once, comparing the letter count array of each word against the letter count array of the query word. You can do some minor optimizations, but this should be good enough.
Alternatively, if you want utmost performance, you can preprocess the dictionary (which is known in advance) and create a directed acyclic graph of sub-anagram relationship.
Here's a portion of such a graph for illustration:
Basically each node is a bucket for all words that have the same letter count array (i.e. they're exact anagrams). Then there's a node from
N1
toN2
ifN2
's array is<=
(as defined above)N1
's array (you can perform transitive reduction to store the least amount of edges).Then to list all sub-anagrams of a word, all you have to do is find the node corresponding to its letter count array, and recursively explore all nodes reachable from that node. All their buckets would contain the sub-anagrams.
下面的js代码将找到n个字母的单词中所有可能的“单词”。当然,这并不意味着它们是真实的单词,但确实为您提供了所有组合。在我的机器上,一个 7 个字母的单词大约需要 0.4 秒,一个 9 个字母的单词需要 15 秒(如果没有重复的字母,多达近一百万种可能性)。然而,这些时间包括查字典并查找哪些是真实的单词。
完整代码位于
用于在单词中查找单词的代码
The following js code will find all possible "words" in an n letter word. Of course this doesn't mean that they are real words but does give you all the combinations. On my machine it takes about 0.4 seconds for a 7 letter word and 15 secs for a 9 letter word (up to almost a million possibilities if no repeated letters). However those times include looking in a dictionary and finding which are real words.
Complete code at
Code for finding words in words
你想要某种安排。如果您熟悉排列算法,那么您就会知道您需要检查以查看何时生成了足够的数字。只需更改该限制即可:
我不知道 AS3,但这里有一个伪代码:
for "abc" and Arrangements(3, 2, 1);这将打印:
如果您想要先有三个,然后有两个,请考虑以下内容:
然后对于“abc”,调用
Arrangements(3, 3, 1);
,然后Arrangements( 3, 2, 1);
You want a sort of arrangements. If you're familiar with the permutation algorithm then you know you have a check to see when you've generated enough numbers. Just change that limit:
I don't know AS3, but here's a pseudocode:
for "abc" and Arrangements(3, 2, 1); this will print:
If you want those with three first, and then those with two, consider this:
Then for "abc" call
Arrangements(3, 3, 1);
and thenArrangements(3, 2, 1);
您可以通过查找完整字母图中的所有路径来生成字母表中的所有单词。您可以通过从每个字母进行深度优先搜索并返回每个点的当前路径来找到该图中的所有路径。
You can generate all words in an alphabet by finding all paths in a complete graph of the letters. You can find all paths in that graph by doing a depth first search from each letter and returning the current path at each point.
有简单的 O(N),其中 n 是词汇量的大小。
只需对词汇表中每个单词中的字母进行排序或更好,创建它们的二进制掩码,然后比较您拥有的字母。
There is simple O(N) where n is size of vocabulary.
Just sort letters in each word in vocabulary or better, create binary mask of them and then compare whit letters you have.