我见过很多获取给定字母集的所有排列的例子。递归似乎可以很好地获得一组字母的所有可能组合(尽管它似乎没有考虑其中两个字母是否相同)。
我想知道的是,您是否可以使用 linq(或不使用)将所有可能的字母组合减少到 3 个字母组合。
例如,给出字母:PIGGY
我想要这些字母的所有可能组合的数组,这样我就可以对照单词列表(拼字游戏?)进行检查,并最终获得您可以使用这些字母组成的所有可能单词的列表(从 3 个字母到总数,在此大小写 5 个字母)。
I have seen many examples of getting all permutations of a given set of letters. Recursion seems to work well to get all possible combinations of a set of letters (though it doesn't seem to take into account if 2 of the letters are the same).
What I would like to figure out is, could you use linq (or not) to get all possible combinations of letters down to 3 letter combinations.
For example, given the letters: P I G G Y
I want an array of all possible cominations of these letters so I can check against a word list (scrabble?) and eventually get a list of all possible words that you can make using those letters (from 3 letters up to the total, in this case 5 letters).
发布评论
评论(3)
我建议不要生成所有可能的排列(每个所需长度),而是采取稍微不同的方法,这将减少您必须完成的工作总量。
首先,找到一些单词列表(你说你要对照单词列表进行检查)。
这是单词列表的一个很好的来源:
http://www.poslarchive.com /math/scrabble/lists/index.html
接下来,对于每个单词列表(例如3个字母的单词、4个字母的单词等),构建一个字典,其键是按字母顺序排列的单词的字母,并且其值是单词。例如,给定以下单词列表:
你的字典看起来像这样(概念上)(你可能想要制作一个列表字典):
你可以将所有长度的所有单词放在同一个字典中,这实际上取决于你。
接下来,要查找给定的 N 个字母集的候选单词,请为您感兴趣的长度生成长度 K 的所有可能组合。对于拼字游戏,这将是所有组合(顺序并不重要,因此 CAT == ACT,因此不需要所有排列)2(7 选择 2)、3(7 选择 3)、4(7 选择 4)、5(7 选择 5)、6(7 选择 6)、7 个字母(7 选择 7)。可以通过首先按字母顺序对 N 个字母进行排序,然后找到长度为 K 的组合来改进这一点。
对于长度为 K 的每个组合,请检查字典以查看是否有具有此键的单词。如果是这样,他们就是比赛的候选人。
因此,对于 CAKE,对字母进行排序:
获取 2、3 和 4 个字母组合:
现在,在字典中使用这些键。您会发现 ACE 和 CAKE 是候选者。
这种方法比生成所有排列然后检查每个排列以查看它是否是单词要高效得多。使用组合方法,您不必对具有相同字母的相同长度的字母组进行单独查找。
例如,给定:
有 6 种排列(长度为 3),但只有 1 种组合(长度为 3)。因此,只需要使用密钥 AET 进行一次查找。
抱歉没有输入任何代码,但是有了这些想法,实现你想要的东西应该相对简单。
当我第一次学习 C# 和 .NET 时,我编写了一个程序,做了很多这样的事情。我将尝试发布一些片段(根据我从那时起学到的知识进行改进)。
此字符串扩展将返回一个新字符串,表示按字母顺序重新组装的输入字符串的字符:
基于此 @Adam Hughes 的回答,这是一个扩展方法,它将返回所有长度(1 到 string.Length)的所有组合(n 选择 k,而不是所有排列)输入字符串:
使用“InAlphabeticOrder”方法,您可以构建输入单词列表(拼字词典),并按其“键”索引(类似于字典,但许多单词可以具有相同的键)。
给定一个 WordEntry 列表,您可以使用 linq 查询该列表以查找可以由给定字母集组成的所有单词:
您可以找到可以由给定字母集的任意组合(任意长度)组成的所有单词像这样的字母:
[编辑]
这里有一些更多的示例代码:
从这里给出 2、3 和 4 个字母单词的列表:http://www.poslarchive.com/math/scrabble/lists/common-234.html
这是一些代码将读取这些单词(我将它们剪切并粘贴到 txt 文件中)并构造 WordEntry 对象的列表:
我已将 InAlphateticalOrder 扩展方法重命名为 ToWordKey。
这里没什么特别的,只是读取文件,将其分割成单词,并为每个单词创建一个新的 WordEntry。一次读取一行可能会更有效。当您考虑 5、6 和 7 个字母的单词时,列表也会变得相当长。这可能是一个问题,也可能不是。对于一个玩具或游戏来说,这可能没什么大不了的。如果您想变得更奇特,您可以考虑使用单词和键构建一个小型数据库。
给定一组字母,找到与键长度相同的所有可能的单词:
给定一组字母,找到从长度 2 到 length(letters) 的所有可能的单词
这段代码可能会更好,但它完成了工作。它不做的一件事是处理重复的字母。如果你有“egg”,这两个字母的组合将是“eg”、“eg”和“gg”。通过在 foreach 循环中添加对 Distinct 的调用可以很容易地解决这个问题:
这是最有效的方法吗?也许,也许不是。必须有人来做这项工作,为什么不使用 LINQ?
我认为通过这种方法,您可能不需要列表字典(
Dictionary>
)。有了这个代码和一组合适的单词,您应该能够采用任意字母组合并找到可以由它们组成的所有单词。你可以
通过查找特定长度的所有单词或任意长度的所有单词来控制单词。
这应该能让你上路。
[更多说明]
就您原来的问题而言,您将“piggy”作为输入,并且您希望找到可以由这些字母组成的所有可能的单词。使用“piggy”上的组合扩展方法,您将得到一个如下所示的列表:
等等。请注意,其中有重复。没关系,我发布的最后一段示例代码展示了如何通过应用 Distinct 运算符来查找所有唯一的组合。
因此,我们可以从给定的字母集中获取所有字母组合的列表。我的算法取决于可根据 Key 属性搜索的 WordEntry 对象列表。 Key 属性只是按字母顺序重新排列的单词字母。因此,如果您读取包含如下单词的单词文件:
WordEntry 对象的列表将如下所示:
因此,很容易构建我们要测试的单词和键列表(或有效拼字游戏单词的字典) )。
例如,(假设上面的几个单词构成了您的整个字典),如果您的拼字游戏托盘上有字母“o”“g”“d”,则可以组成单词“DOG”和“ >GOD,因为两者都有密钥
DGO
。给定一组字母,如果我们想找到由这些字母组成的所有可能的单词,我们必须能够生成所有可能的字母组合。我们可以针对“字典”(引号是因为它不是 .NET 意义上的真正字典,而是 WordEntry 对象的列表(或序列))来测试其中的每一个。为了确保键(来自我们在拼字游戏中“绘制”的字母序列)与 WordEntry 对象中的 Key 字段兼容,我们必须首先对字母进行排序。
假设我们的拼字游戏托盘上有 PIGGY。为了使用我建议的算法,我们希望从 PIGGY 获取所有可能的“Key”值。在 WordEntry 对象列表中,我们通过按字母顺序对 Word 的字母进行排序来创建 Key 字段。我们必须对托盘上的字母做同样的事情。
于是,PIGGY就变成了GGIPY。 (这就是 ToWordKey 的作用)。现在,给定托盘中按字母顺序排列的字母,我们可以使用组合来生成所有可能的组合(而不是排列)。我们可以根据键在列表中查找每个组合。如果 GGIPY 的组合与 Key 值匹配,则可以从我们的字母构造相应的 Word(WordEntry 类)。
比 PIGGY 更好的例子
首先使用 ToWordKey:
现在,进行所有长度的所有组合:
当我们查看 WordEntry 对象列表(通过读取 2、3、4 个字母单词列表组成)时,我们可能会发现找到以下组合:
这些键值对应于以下单词:
上面的最终代码示例将采用字母('s' 'e' 'a' 't'),转换为键格式(ToWordKey)生成组合(Combinations ),仅保留唯一的可能键值(Distict - 此处不是问题,因为没有重复的字母),然后查询所有 WordEntry 对象的列表,以查找其 Key 与组合之一相同的 WordEntry 对象。
本质上,我们所做的是构建一个包含 Word 和 Key 列的表(基于读取单词文件并计算每个单词的 Key),然后我们执行一个查询,将 Key 与我们生成的一系列 Key 值(从我们托盘上的字母)。
尝试逐步使用我的代码。
首先,使用 Combinations 扩展方法:
打印结果(piggy ... pi pg pg ... pig pig piy ... pigg pigy iggy ... 等)
接下来,在应用 ToWordKey 扩展方法后获取所有组合:
打印结果 (iggpy ig ig ip iy igg igp igy ... 等)
使用 Distinct() 方法消除重复项:
打印结果 (igpy ig ip iy igg igp igy ... 等)
使用其他字母集,例如“ate”和“座位”。
请注意,与使用排列算法相比,您得到的候选数要少得多。
现在,假设我们刚刚创建的组合是键值,我们将使用它们在 WordEntry 对象列表中查找,并将每个组合与 WordEntry 的键进行比较。
使用上面的
GetWords
函数以及 2、3、4 个字母单词的链接来构建 WordEntry 对象的列表。更好的是,制作一个只有几个单词的非常精简的单词列表并将其打印出来(或在调试器中查看它)。看看它是什么样子的。查看每个单词和每个键。现在,想象一下,如果您想找到所有可以用“AET”组成的单词。使用所有字母更容易想象,所以从这里开始。有 6 种排列,但只有 1 种组合!没错,您只需在单词列表中进行一次搜索即可找到所有可以用这些字母组成的 3 个字母的单词!如果有 4 个字母,则会有 24 种排列,但同样只有 1 种组合。这就是算法的本质。 ToWordKey() 函数本质上是一个哈希函数。所有具有相同数量的字母和相同的字母集的字符串将散列到相同的值。因此,构建一个单词列表及其哈希值 (Key - ToWordKey),然后,给定一组用于组成单词的字母,对字母进行哈希值 (ToWordKey) 并查找列表中具有相同哈希值的所有条目。要扩展到查找任意长度的所有单词(给定一组字母),您只需对输入进行哈希处理(通过 ToWordKey 发送整个字符串),然后查找任意长度的所有组合。由于组合是从散列字母集 AND 生成的,并且由于组合扩展方法维护每个组合中字母的原始顺序,因此每个组合都保留已散列的属性!太酷了!
希望这有帮助。
I would suggest that rather than generating all possible permutations (of each desired length), take slightly different approach that will reduce the overall amount of work that you have to do.
First, find some word lists (you say that you are going to check against a word list).
Here is a good source of word lists:
http://www.poslarchive.com/math/scrabble/lists/index.html
Next, for each word list (e.g. for 3 letter words, 4 letter words, etc), build a dictionary whose key is the letters of the word in alphabetical order, and whose value is the word. For example, given the following word list:
Your dictionary would look something like this (conceptually) (You might want to make a dictionary of List):
You could probably put all words of all lengths in the same dictionary, it is really up to you.
Next, to find candidate words for a given set of N letters, generate all possible combinations of length K for the lengths that you are interested in. For scrabble, that would all combinations (order is not important, so CAT == ACT, so all permutations is not required) of 2 (7 choose 2), 3 (7 choose 3), 4 (7 choose 4), 5 (7 choose 5), 6 (7 choose 6), 7 letters (7 choose 7). This can be improved by first ordering the N letters alphabetically and then finding the combinations of length K.
For each combination of length K, check the dictionary to see if there are any words with this key. If so, they are candidates to be played.
So, for CAKE, order the letters:
Get the 2, 3, and 4 letter combinations:
Now, use these keys into the dictionary. You will find ACE and CAKE are candidates.
This approach allows you to be much more efficient than generating all permutations and then checking each to see if it is a word. Using the combination approach, you do not have to do separate lookups for groups of letters of the same length with the same letters.
For example, given:
There are 6 permutations (of length 3), but only 1 combination (of length 3). So, only one lookup is required, using the key AET.
Sorry for not putting in any code, but with these ideas, it should be relatively straightforward to achieve what you want.
I wrote a program that does a lot of this back when I was first learning C# and .NET. I will try to post some snippets (improved based on what I have learned since then).
This string extension will return a new string that represents the input string's characters reassembled in alphabetical order:
Based on this answer by @Adam Hughes, here is an extension method that will return all combinations (n choose k, not all permutations) for all lengths (1 to string.Length) of the input string:
Using the "InAlphabeticOrder" method, you can build a list of your input words (scrabble dictionary), indexed by their "key" (similar to dictionary, but many words could have the same key).
Given a list of WordEntry, you can query the list using linq to find all words that can be made from a given set of letters:
You could find all words that could be made from any combination (of any length) of a given set of letters like this:
[EDIT]
Here is some more sample code:
Given a list of 2, 3, and 4 letter words from here: http://www.poslarchive.com/math/scrabble/lists/common-234.html
Here is some code that will read those words (I cut and pasted them into a txt file) and construct a list of WordEntry objects:
I have renamed the InAlphateticalOrder extension method to ToWordKey.
Nothing fancy here, just read the file, split it into words, and create a new WordEntry for each word. Possibly could be more efficient here by reading one line at a time. The list will also get pretty long when you consider 5, 6, and 7 letter words. That might be an issue and it might not. For a toy or a game, it is probably no big deal. If you wanted to get fancy, you might consider building a small database with the words and keys.
Given a set of letters, find all possible words of the same length as the key:
Given a set of letters, find all possible words from length 2 to length(letters)
This code could probably be better, but it gets the job done. One thing that it does not do is handle duplicate letters. If you have "egg", the two letter combinations will be "eg", "eg", and "gg". That can be fixed easily enough by adding a call to Distinct to the foreach loop:
Is that the most efficient way to do it? Maybe, maybe not. Somebody has to do the work, why not LINQ?
I think that with this approach you probably don't need a Dictionary of Lists (
Dictionary<string,List<string>>
).With this code and with a suitable set of words, you should be able to take any combination of letters and find all words that can be made from them. You can
control the words by finding all words of a particular length, or all words of any length.
This should get you on your way.
[More Clarification]
In terms of your original question, you take as input "piggy" and you want to find all possible words that can be made from these letters. Using the Combinations extension method on "piggy", you will come up with a list like this:
etc. Note that there are repetitions. That is ok, the last bit of sample code that I posted showed how to find all unique Combinations by applying the Distinct operator.
So, we can get a list of all combinations of letters from a given set of letters. My algorithm depends on the list of WordEntry objects being searchable based on the Key property. The Key property is simply the letters of the word rearranged into alphabetical order. So, if you read a word file containing words like this:
The list of WordEntry objects will look like this:
So, it's easy enough to build up the list of words and keys that we want to test against (or dictionary of valid scrabble words).
For example, (assume the few words above form your entire dictionary), if you had the letters 'o' 'g' 'd' on your scrabble tray, you could form the words
DOG
andGOD
, because both have the keyDGO
.Given a set of letters, if we want to find all possible words that can be made from those letters, we must be able to generate all possible combinations of letters. We can test each of these against the "dictionary" (quotes because it is not REALLY a Dictionary in the .NET sense, it is a list (or sequence) of WordEntry objects). To make sure that the keys (from the sequence of letters that we have "drawn" in scrabble) is compatible with the Key field in the WordEntry object, we must first order the letters.
Say we have PIGGY on our scrabble tray. To use the algorithm that I suggested, we want to get all possible "Key" values from PIGGY. In our list of WordEntry objects, we created the Key field by ordering the Word's letters in alphabetic order. We must do the same with the letters on our tray.
So, PIGGY becomes GGIPY. (That is what ToWordKey does). Now, given the letters from our tray in alphabetical order, we can use Combinations to generate all possible combinations (NOT permumations). Each combination we can look up in our list, based on Key. If a combination from GGIPY matches a Key value, then the corresponding Word (of the WordEntry class) can be constructed from our letters.
A better example than PIGGY
First use ToWordKey:
Now, make all Combinations of all lengths:
When we look in our list of WordEntry objects (made from reading in the list of 2, 3, 4 letter words), we will probably find that the following combinations are found:
These Key values correspond to the following words:
The final code example above will take the letters ('s' 'e' 'a' 't'), convert to Key format (ToWordKey) generate the combinations (Combinations), keep only the unique possible key values (Distict - not an issue here since no repeated letters), and then query the list of all WordEntry objects for those WordEntry objects whose Key is the same as one of the combinations.
Essentially, what we have done is constructed a table with columns Word and Key (based on reading the file of words and computing the Key for each) and then we do a query joining Key with a sequence of Key values that we generated (from the letters on our tray).
Try using my code in steps.
First, use the Combinations extension method:
Print the result (p i g g y ... pi pg pg ... pig pig piy ... pigg pigy iggy ... etc)
Next, get all combinations after applying the ToWordKey extension method:
Print the result (i g g p y ig ig ip iy igg igp igy ... etc)
Eliminate duplicates with the Distinct() method:
Print the result (i g p y ig ip iy igg igp igy ... etc)
Use other sets of letters like "ate" and "seat".
Notice that you get significantly fewer candidates than if you use a permutation algorithm.
Now, imagine that the combinations that we just made are the key values that we will use to look in our list of WordEntry objects, comparing each combination to the Key of a WordEntry.
Use the
GetWords
function above and the link to the 2, 3, 4 letter words to build the list of WordEntry objects. Better yet, make a very stripped down word list with only a few words and print it out (or look at it in the debugger). See what it looks like. Look at each Word and each Key. Now, imagine if you wanted to find ALL words that you could make with "AET". It is easier to imagine using all letters, so start there. There are 6 permutations, but only 1 combination! That's right, you only have to make one search of the word list to find all 3 letter words that can be made with those letters! If you had 4 letters there would be 24 permutations, but again, only 1 combination.That is the essence of the algorithm. The ToWordKey() function is essentially a hash function. All strings with the same number of letters and the same set of letters will hash to the same value. So, build a list of Words and their hashes (Key - ToWordKey) and then, given a set of letters to use to make words, hash the letters (ToWordKey) and find all entries in the list with the same hash value. To extend to finding all words of any length (given a set of letters), you just have to hash the input (send the whole string through ToWordKey), then find all Combinations of ANY length. Since the combinations are being generated from the hashed set of letters AND since the Combinations extension method maintains the original ordering of the letters in each combination, then each combination retains the property of having been hashed! That's pretty cool!
Hope this helps.
这个方法似乎有效。它同时使用 Linq 和过程代码。
This method seems to work. It's using both Linq and procedural code.
搜索一个单词的所有排列的问题是计算绝对乱码所花费的工作量。生成所有排列的时间复杂度为 O(n!),其中大部分绝对会被浪费。这就是为什么我推荐wageoghe的答案。
这是一个返回所有排列的递归 linq 函数:
您可以像这样找到您想要的单词:
但是:
警告:我不喜欢这个非常< /strong> 低效的答案
现在 linq 最大的好处(对我来说)是它的可读性。然而,话虽如此,我认为上述代码的意图并不明确(而且是我写的!)。因此,linq(对我而言)的最大优势并未在上面体现,并且它不如非 linq 解决方案高效。我通常会原谅 linq 缺乏执行效率,因为它提高了编码时间、可读性和易于维护的效率,但我只是不认为 linq 解决方案最适合这里......方钉、圆孔排序如果你愿意的话。
另外,还有我上面提到的复杂性问题。当然,它可以在 0.2 秒内找到“piggy”的 153 个三个字母或更多排列,但给它一个像“bookkeeper”这样的单词,你将等待一个坚实的1 分 39 秒找到所有 435,574 个三个字母或更多字母的排列。那么我为什么要发布这么糟糕的功能呢?为了表明 Waoghe 的做法是正确的。生成所有排列并不是解决此问题的足够有效的方法。
The problem with searching for all permutations of a word is the amount of work that will be spent on calculating absolute gibberish. generating all permutations is O(n!) and sooo much of that will be absolutely wasted. This is why I recommend wageoghe's answer.
Here is a recursive linq function that returns all permutations:
You can find the words you want like so:
However:
WARNING: I am not fond of this very inefficient answer
Now linq's biggest benefit (to me) is how readable it is. Having said that, however, I do not think the intent of the above code is clear (and I wrote it!). Thus the biggest advantage of linq (to me) is not present above, and it is not as efficient as a non-linq solution. I usually forgive linq's lack of execution efficiency because of the efficiency it adds for coding time, readability, and ease of maintenance, but I just don't think a linq solution is the best fit here...a square peg, round hole sort of thing if you will.
Also, there's the matter of the complexity I mentioned above. Sure it can find the 153 three letters or more permutations of 'piggy' in .2 seconds, but give it a word like 'bookkeeper' and you will be waiting a solid 1 minute 39 seconds for it to find all 435,574 three letters or more permutations. So why did I post such a terrible function? To make the point that wageoghe has the right approach. Generating all permutations just isn't an efficient enough approach to this problem.