生成给定字符串的所有可能的字母组合(最多 2 个字母)的算法

发布于 2024-08-25 09:29:09 字数 530 浏览 9 评论 0原文

生成给定字符串的所有可能字母组合(最多 2 个字母)的算法

尝试在 AS3 中创建 Anagram 求解器,例如此处找到的这个:

http://homepage.ntlworld.com/adam.bozon/anagramsolver.htm

我在为各种长度生成所有可能的字母组合时遇到问题字符串。如果我只是生成固定长度的排列,这对我来说不会是一个问题......但我希望减少字符串的长度并从原始字母集中获取所有可能的排列最大长度小于原始字符串的字符串。例如,假设我想要一个长度为 2 的字符串,但我有一个 3 个字母的字符串“abc”,则输出将为:ab ac ba bc ca cb。

理想情况下,该算法会生成一个完整的可能组合列表,从原始字符串长度开始,一直到最小字符串长度 2。我有一种感觉,可能有一个小型递归算法可以做到这一点,但我无法理解它。我在 AS3 中工作。

谢谢!

Algorithm to generate all possible letter combinations of given string down to 2 letters

Trying to create an Anagram solver in AS3, such as this one found here:

http://homepage.ntlworld.com/adam.bozon/anagramsolver.htm

I'm having a problem wrapping my brain around generating all possible letter combinations for the various lengths of strings. If I was only generating permutations for a fixed length, it wouldn't be such a problem for me... but I'm looking to reduce the length of the string and obtain all the possible permutations from the original set of letters for a string with a max length smaller than the original string. For example, say I want a string length of 2, yet I have a 3 letter string of “abc”, the output would be: ab ac ba bc ca cb.

Ideally the algorithm would produce a complete list of possible combinations starting with the original string length, down to the smallest string length of 2. I have a feeling there is probably a small recursive algorithm to do this, but can't wrap my brain around it. I'm working in AS3.

Thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

打小就很酷 2024-09-01 09:29:09

为了编写您链接的类型的字谜解算器,您所请求的算法不是必需的。它也非常昂贵。

例如,让我们看一下像 MONKEY 这样的 6 个字母的单词。该单词的所有 6 个字母都不同,因此您将创建:

  • 6*5*4*3*2*1 不同的 6 字母单词
  • 6*5*4*3*2 不同的 5 字母单词
  • 6*5*4* 3 个不同的 4 字母单词
  • 6*5*4 不同的 3 字母单词
  • 6*5 不同的 2 字母单词
  • 总共 1950 个单词

现在,大概您不想吐出所有 1950 个单词(例如“OEYKMN”)作为字谜(确实如此,但其中大多数也是胡言乱语)。我猜你有一本合法英语单词的字典,你只想检查这些单词是否是查询单词的字谜词,可以选择不使用所有字母。

如果真是这样,那么问题就简单了。

要确定 2 个单词是否是彼此的字谜词,您需要做的就是计算每个字母的使用次数,然后比较这些数字!

让我们将自己限制为仅 26 个字母 AZ,不区分大小写。您需要做的是编写一个函数 countLetters,它接受一个单词并返回一个包含 26 个数字的数组。数组中的第一个数字对应于单词中字母 A 的计数,第二个数字对应于 B 的计数,依此类推。

然后,两个单词 W1W2 是精确的字谜,如果对于每个 i 都有 countLetters(W1)[i] == countLetters(W2)[i] !也就是说,每个单词使用每个字母的次数完全相同!

对于我所说的子字谜(MONEYMONKEY 的子字谜),W1的子字谜>W2 如果对于每个 i 都有 countLetters(W1)[i] <= countLetters(W2)[i]!也就是说,子字谜可以使用较少的某些字母,但不能使用更多!

(注意:MONKEY 也是 MONKEY 的子字谜)。


这应该为您提供一个足够快的算法,其中给定查询字符串,您所需要做的就是通读字典一次,将每个单词的字母计数数组与查询单词的字母计数数组进行比较。您可以做一些小的优化,但这应该足够好了。

或者,如果您想要最佳性能,您可以预处理字典(预先知道)并创建子字谜关系的有向非循环图。

下面是这样一个图的一部分用于说明:

 D=1,G=1,O=1  ----------> D=1,O=1
  {dog,god}   \            {do,od}
               \
                \-------> G=1,O=1
                           {go}

基本上每个节点都是具有相同字母计数数组的所有单词的存储桶(即它们是精确的字谜词)。如果 N2 的数组是 <= (如上面定义的),则有一个从 N1N2 的节点N1 的数组(您可以执行传递归约以存储最少量的边)。

然后,要列出一个单词的所有子字谜,您所要做的就是找到与其字母计数数组对应的节点,并递归地探索从该节点可到达的所有节点。他们所有的桶都将包含子字谜。

For the purpose of writing an anagram solver the kind of which you linked, the algorithm that you are requesting is not necessary. It is also VERY expensive.

Let's look at a 6-letter word like MONKEY, for example. All 6 letters of the word are different, so you would create:

  • 6*5*4*3*2*1 different 6-letter words
  • 6*5*4*3*2 different 5-letter words
  • 6*5*4*3 different 4-letter words
  • 6*5*4 different 3-letter words
  • 6*5 different 2-letter words
  • For a total of 1950 words

Now, presumably you're not trying to spit out all 1950 words (e.g. 'OEYKMN') as anagrams (which they are, but most of them are also gibberish). I'm guessing you have a dictionary of legal English words, and you just want to check if any of those words are anagrams of the query word, with the option of not using all letters.

If that is the case, then the problem is simple.

To determine if 2 words are anagrams of each other, all you need to do is count how many times each letters are used, and compare these numbers!

Let's restrict ourself to only 26 letters A-Z, case insensitive. What you need to do is write a function countLetters that takes a word and returns an array of 26 numbers. The first number in the array corresponds to the count of the letter A in the word, second number corresponds to count of B, etc.

Then, two words W1 and W2 are exact anagram if countLetters(W1)[i] == countLetters(W2)[i] for every i! That is, each word uses each letter the exact same number of times!

For what I'd call sub-anagrams (MONEY is a sub-anagram of MONKEY), W1 is a sub-anagram of W2 if countLetters(W1)[i] <= countLetters(W2)[i] for every i! That is, the sub-anagram may use less of certain letters, but not more!

(note: MONKEY is also a sub-anagram of MONKEY).


This should give you a fast enough algorithm, where given a query string, all you need to do is read through the dictionary once, comparing the letter count array of each word against the letter count array of the query word. You can do some minor optimizations, but this should be good enough.

Alternatively, if you want utmost performance, you can preprocess the dictionary (which is known in advance) and create a directed acyclic graph of sub-anagram relationship.

Here's a portion of such a graph for illustration:

 D=1,G=1,O=1  ----------> D=1,O=1
  {dog,god}   \            {do,od}
               \
                \-------> G=1,O=1
                           {go}

Basically each node is a bucket for all words that have the same letter count array (i.e. they're exact anagrams). Then there's a node from N1 to N2 if N2's array is <= (as defined above) N1's array (you can perform transitive reduction to store the least amount of edges).

Then to list all sub-anagrams of a word, all you have to do is find the node corresponding to its letter count array, and recursively explore all nodes reachable from that node. All their buckets would contain the sub-anagrams.

倾听心声的旋律 2024-09-01 09:29:09

下面的js代码将找到n个字母的单词中所有可能的“单词”。当然,这并不意味着它们是真实的单词,但确实为您提供了所有组合。在我的机器上,一个 7 个字母的单词大约需要 0.4 秒,一个 9 个字母的单词需要 15 秒(如果没有重复的字母,多达近一百万种可能性)。然而,这些时间包括查字典并查找哪些是真实的单词。

var getWordsNew=function(masterword){
var result={}
 var a,i,l;
function nextLetter(a,l,key,used){
     var i;
    var j;
    if(key.length==l){
        return;
    }
    for(i=0;i<l;i++){
        if(used.indexOf(""+i)<0){
            result[key+a[i]]="";
            nextLetter(a,l,key+a[i],used+i);
        }
    }
 }
a=masterword.split("");
  l=a.length;
for (i = 0; i < a.length; i++) {
    result[a[i]] = "";
    nextLetter(a, l, a[i], "" + i)
}
return result;
}

完整代码位于

用于在单词中查找单词的代码

The following js code will find all possible "words" in an n letter word. Of course this doesn't mean that they are real words but does give you all the combinations. On my machine it takes about 0.4 seconds for a 7 letter word and 15 secs for a 9 letter word (up to almost a million possibilities if no repeated letters). However those times include looking in a dictionary and finding which are real words.

var getWordsNew=function(masterword){
var result={}
 var a,i,l;
function nextLetter(a,l,key,used){
     var i;
    var j;
    if(key.length==l){
        return;
    }
    for(i=0;i<l;i++){
        if(used.indexOf(""+i)<0){
            result[key+a[i]]="";
            nextLetter(a,l,key+a[i],used+i);
        }
    }
 }
a=masterword.split("");
  l=a.length;
for (i = 0; i < a.length; i++) {
    result[a[i]] = "";
    nextLetter(a, l, a[i], "" + i)
}
return result;
}

Complete code at

Code for finding words in words

烏雲後面有陽光 2024-09-01 09:29:09

你想要某种安排。如果您熟悉排列算法,那么您就会知道您需要检查以查看何时生成了足够的数字。只需更改该限制即可:

我不知道 AS3,但这里有一个伪代码:

st = an array
Arrangements(LettersInYourWord, MinimumLettersInArrangement, k = 1)
  if ( k > MinimumLettersInArrangements )
  {
    print st;
  }

  if ( k > LettersInYourWord )
    return;      

  for ( each position i in your word that hasn't been used before )
    st[k] = YourWord[i];
    Arrangements(<same>, <same>, k + 1);

for "abc" and Arrangements(3, 2, 1);这将打印:

ab
abc
ac
acb
...

如果您想要先有三个,然后有两个,请考虑以下内容:

st = an array
Arrangements(LettersInYourWord, DesiredLettersInArrangement, k = 1)
  if ( k > DesiredLettersInArrangements )
  {
    print st;
    return
  }

  for ( each position i in your word that hasn't been used before )
    st[k] = YourWord[i];
    Arrangements(<same>, <same>, k + 1);

然后对于“abc”,调用 Arrangements(3, 3, 1); ,然后 Arrangements( 3, 2, 1);

You want a sort of arrangements. If you're familiar with the permutation algorithm then you know you have a check to see when you've generated enough numbers. Just change that limit:

I don't know AS3, but here's a pseudocode:

st = an array
Arrangements(LettersInYourWord, MinimumLettersInArrangement, k = 1)
  if ( k > MinimumLettersInArrangements )
  {
    print st;
  }

  if ( k > LettersInYourWord )
    return;      

  for ( each position i in your word that hasn't been used before )
    st[k] = YourWord[i];
    Arrangements(<same>, <same>, k + 1);

for "abc" and Arrangements(3, 2, 1); this will print:

ab
abc
ac
acb
...

If you want those with three first, and then those with two, consider this:

st = an array
Arrangements(LettersInYourWord, DesiredLettersInArrangement, k = 1)
  if ( k > DesiredLettersInArrangements )
  {
    print st;
    return
  }

  for ( each position i in your word that hasn't been used before )
    st[k] = YourWord[i];
    Arrangements(<same>, <same>, k + 1);

Then for "abc" call Arrangements(3, 3, 1); and then Arrangements(3, 2, 1);

云雾 2024-09-01 09:29:09

您可以通过查找完整字母图中的所有路径来生成字母表中的所有单词。您可以通过从每个字母进行深度优先搜索并返回每个点的当前路径来找到该图中的所有路径。

You can generate all words in an alphabet by finding all paths in a complete graph of the letters. You can find all paths in that graph by doing a depth first search from each letter and returning the current path at each point.

软的没边 2024-09-01 09:29:09

有简单的 O(N),其中 n 是词汇量的大小。
只需对词汇表中每个单词中的字母进行排序或更好,创建它们的二进制掩码,然后比较您拥有的字母。

There is simple O(N) where n is size of vocabulary.
Just sort letters in each word in vocabulary or better, create binary mask of them and then compare whit letters you have.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文