我该如何优化这个递归方法
我正在尝试制作一个单词益智游戏,为此我使用递归方法来查找给定字母中所有可能的单词。 这些字母位于 4x4 板上。
像这样:
ABCD
EFGH
HIJK
LMNO
在这个循环内调用递归方法:
for (int y = 0; y < width; y++)
{
for (int x = 0; x < height; x++)
{
myScabble.Search(letters, y, x, width, height, "", covered, t);
}
}
letters 是一个 2D 字符数组。
y & x 是整数,显示板
宽度和宽度的位置。 height 也是 int,它告诉板的尺寸
“”是我们试图使(单词)
覆盖的字符串是一个布尔数组,以检查我们是否已经使用了该正方形。
t 是一个列表(其中包含要检查的所有单词)。
需要优化的递归方法:
public void Search(char[,] letters, int y, int x, int width, int height, string build, bool[,] covered, List<aWord> tt)
{
// Dont get outside the bounds
if (y >= width || y < 0 || x >= height || x < 0)
{
return;
}
// Dont deal with allrady covered squares
if (covered[x, y])
{
return;
}
// Get Letter
char letter = letters[x, y];
// Append
string pass = build + letter;
// check if its a possible word
//List<aWord> t = myWords.aWord.Where(w => w.word.StartsWith(pass)).ToList();
List<aWord> t = tt.Where(w => w.word.StartsWith(pass)).ToList();
// check if the list is emphty
if (t.Count < 10 && t.Count != 0)
{
//stop point
}
if (t.Count == 0)
{
return;
}
// Check if its a complete word.
if (t[0].word == pass)
{
//check if its allrdy present in the _found dictinary
if (!_found.ContainsKey(pass))
{
//if not add the word to the dictionary
_found.Add(pass, true);
}
}
// Check to see if there is more than 1 more that matches string pass
// ie. are there more words to find.
if (t.Count > 1)
{
// make a copy of the covered array
bool[,] cov = new bool[height, width];
for (int i = 0; i < width; i++)
{
for (int a = 0; a < height; a++)
{
cov[a, i] = covered[a, i];
}
}
// Set the current square as covered.
cov[x, y] = true;
// Continue in all 8 directions.
Search(letters, y + 1, x, width, height, pass, cov, t);
Search(letters, y, x + 1, width, height, pass, cov, t);
Search(letters, y + 1, x + 1, width, height, pass, cov, t);
Search(letters, y - 1, x, width, height, pass, cov, t);
Search(letters, y, x - 1, width, height, pass, cov, t);
Search(letters, y - 1, x - 1, width, height, pass, cov, t);
Search(letters, y - 1, x + 1, width, height, pass, cov, t);
Search(letters, y + 1, x - 1, width, height, pass, cov, t);
}
}
代码按照我的预期工作,但是速度很慢..大约需要 2 分钟才能找到单词。
编辑:我澄清了这些字母 数组是二维的
我想展示我如何让算法运行得非常快。 我使用 @Enigmativity 的 Trie 实现,以及 @EricLippert 描述的搜索模式
public void SearchWord(char[,] letters, Trie parentTrie, char[] build, int x, int y, bool[,] covered )
{
char[] pass = new char[build.Length + 1];
build.CopyTo(pass, 0);
// iterate through all squares in the board.
for (var r = 0; r < letters.GetLength(0); r++ )
{
for (var c = 0; c < letters.GetLength(1); c++)
{
//check if this square is naighbor to the last square
if ((IsNeighbor(x, y, r, c)|| x == -1) && !(covered[r, c]))
{
// check if the current Trie contains the letter
if (parentTrie.ContainsKey(letters[r,c]))
{
pass[build.Length] = letters[r, c];
covered[r, c] = true;
SearchWord(letters, parentTrie[letters[r, c]], pass, r, c, covered);
covered[r, c] = false;
}
if (parentTrie.ContainsKey('$') && (!myStrings.Contains(new string(build).ToLower())))
myStrings.Add(new string(build).ToLower());
}
}
}
}
它最初是这样调用的:
SearchWord(letters, trie, new char[0], -1, -1, new bool[letters.GetLength(0), letters.GetLength(1)]);
我意识到我可以添加字母作为属性,但由于它是一个引用类型,所以时间并不昂贵
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
其他答案是正确的:您应该完全放弃这个算法并重新开始。
在文字游戏中处理这些问题的方法是将字典转换成适合您想要执行的搜索类型的形式。在你的例子中,你想要构建的数据结构称为trie,这是一个双关语,因为它是一个快速“re-trie-val”的“树”,哈哈哈,那些计算机科学的家伙非常机智!
trie 的工作方式是有一棵树,其中每个节点最多有 27 个子节点。假设您有字典 { AB, ACE, ACT, CAT }。 trie 看起来像这样:
其中 $ 表示“单词结束”。从根到 $ 的每条路径都是一个合法的单词。
现在要找到所有单词,首先从字典中构建特里树。然后,对于网格中的每个起点,尝试特里树中的每个可能的单词。如果您从包含 A 的网格方块开始,那么您当然不会向下遍历 trie 的 C 分支。然后,对于特里树中 A 的每个子节点,查看网格是否有一个相邻的正方形可以让您更深入地进入特里树。
这是有趣的递归算法,您会发现它非常快,因为您可以非常聪明地放弃从给定方块开始不可能存在的大量单词。
这一切有意义吗?
The other answers are right: you should abandon this algorithm entirely and start over.
The way to deal with these problems in word games is to transform the dictionary into a form that is amenable to the kinds of searches you want to do. In your case, the data structure you want to build is called a trie, which is a pun because it is a "tree" that does fast "re-trie-val", ha ha ha, those computer science guys are very witty!
The way a trie works is you have a tree where every node has up to 27 children. Suppose you have the dictionary { AB, ACE, ACT, CAT }. The trie looks like this:
where $ means "the word is ended". Every path from the root to a $ is a legal word.
Now to find all the words, first build the trie out of the dictionary. Then for each starting point in the grid, try every possible word in the trie. If you start with a grid square that contains an A then of course you do not traverse down the C branch of the trie. Then for each child of the A in the trie, see if the grid has a neighbouring square that can drive you deeper into the trie.
That's the interesting recursive algorithm, and you'll find that it is very fast because you can be very smart about abandoning huge numbers of words that can't possibly exist starting from a given square.
Does that all make sense?
设置不变参数(字母、宽度、高度和 tt)字段。不要使用 Linq,它在多次迭代中会很慢。另外,将单词列表更改为更有效的词典或某种类型的排序列表。
Make the invariant arguments (letter, width, height and tt) fields. Do not use Linq, it is slow over many iterations. Also, change your list of words to a more effecient Dictionary or some type of sorted list.
从相反的方向攻击可能更有效。迭代可能的单词列表,在网格中查找该单词的第一个字母,如果找到它,则在相邻的方块中查找第二个字母,如果发现它,则继续尝试沿着该行进一步匹配。
通过这种方式,您可以快速完全消除单词,而不是重复检查每个可能位置的每个单词。
仅查看您在将重复数千次 LINQ 的代码中使用的代码可能是一个禁忌。您正在使用字符串操作,因此您在内存中生成数千个字符串,并且可能导致垃圾收集器在递归期间运行,您应该切换到字符串生成器或某种字符数组。
此时您可以进行大量优化,例如,对列表进行快速的第一次检查,检查每个单词的每个字母是否位于网格中的某个位置(只需将网格中的所有字母放入一个字符串中即可轻松完成此操作) ,那么您可以快速消除并非所有字母都在网格中的单词 - 甚至无需担心它们的顺序是否正确。
您可以将网格转换为一系列字符串,每个字符串代表网格的某个方向,然后仅使用纯字符串搜索。如果您找到匹配项,您只需要确保匹配项没有跨越网格边界,这可以通过快速检查匹配项的起点和终点来完成。
It is probably more efficient to attack this in the opposite direction. Iterate over you list of possible words, look for the first letter of the word in the grid, if you find it look for the second in adjacent square, if you find it keep trying to match further along that line.
This way you can quickly eliminate words entirely rather than repeatedly check for every word in every possible position.
Just looking at the code you are using in something that is going to repeat thousands of times LINQ is probably a no-no. You are using string manipulation, so you are generating thousands of strings in memory, and possibly causing the garbage collector to run during the recursion, you should switch to string builder or some sort of character array.
There are loads of optimisations you can do at this point, for example do a quick first pass on your list checking that every letter of each word is somewhere in the grid (just put all letters in the grid into a string to make this easy), then you can quickly eliminate words where not all their letters are in the grid - before even worrying about if they are in the right order.
You could turn your grid into a series of strings, each representing a certain orientation of the grid and then just use plain string searches. If you find a match you would just need to make sure the match wasn't across the boundary of the grid which can be done by checking the start and end point of the match quite quickly.
我想我可以根据埃里克·利珀特的回答提出代码。埃里克(Eric)做到了,但硬代码总是更好。 ;-)
首先,这是一个 trie 的简单实现:
此代码允许您将字符串传递到
Add
&包含
方法,字符串为IEnumerable
。它可以像这样使用:
现在,给定一个字母网格和一个加载了可能单词的特里树,我可以运行以下查询来获取匹配项:
GetPossibleWords
方法的实现如下:这个解决方案的性能看起来相当不错。
OP 有一个 4x4 网格,大约需要 120 秒(2 分钟)。我的代码可以在 20.3 秒内匹配 40x40 网格中 208,560 个单词的列表。
不错吧?
再次感谢 Eric 提出使用 trie 的想法。
I thought I might put forward code based on Eric Lippert's answer. Eric nailed it, but hard code is always better. ;-)
To start with, here is a simple implementation of a trie:
This code allows you to pass strings to the
Add
&Contains
methods as strings areIEnumerable<char>
.It can be used like so:
Now, given a grid of letters and a trie loaded up with the possible words I can run the following query to get the matches:
The
GetPossibleWords
method is implemented as such:The performance of this solution seems quite good.
The OP had a 4x4 grid taking roughly 120 seconds (2 mins). My code can match a list of 208,560 words in a 40x40 grid in 20.3 seconds.
Not bad, huh?
Thanks again to Eric for the idea of using a trie.
List
来存储所有单词,而是使用一些树状结构。或者至少是排序列表+二分搜索。这一点应该会显着提高性能。我建议使用有序集合并通过递归方法传递范围索引。即,您将查找第一个字母并确定以该字母开头的单词有 123-456 个索引。下次您附加新字母时,您将使该范围变得更加狭窄,并且您无需遍历整个集合!ToList
方法)。您可以利用它的惰性性质仅进行迭代,直到您需要为止,因为如果该数字大于 2,您并不关心有多少单词以此子词开头。List
to store all the words use some tree-like structure. Or at least a sorted list + binary search. This point should improve performance dramatically. I would recommend using ordered collection and passing range indices through recursice methods. I.e. you will look for first letter and determine that words starting with that letter have 123-456 indices. Next time you append new letter you will make this range even more narrow and you won't need to go through entire collection!ToList
method). You can use it's lazy nature to iterate only until you need since you do not care how many words start with this subword if this number is more than 2.cov[x, y] = true;
and after nested search tries put backfalse
value:cov[x, y] = false;
一种优化是定向搜索。您只需要对角线向下、向左和左下搜索,而对角向上、向右和右上搜索将得到与简单反转字符串相同的结果。
编辑:
感兴趣的可能还有 Project Euler Problem #11 以及使用 C# 和 LINQ 的解决方案的撰写 此处。
这是我试图描述的一个完全未经测试的例子。
One optimization would be in your directional search. You only need to search down, left, and down-left diagonally as searching up, right, and up-right diagonally will give you the same result as simply reversing the string.
EDIT:
Of interest also might be Project Euler Problem #11 and a writeup of a solution using C# and LINQ here.
This is a completely untested example of what I was trying to describe.
另一个优化是数据结构:
对于所有有效单词来说,这是 O(n),您可以通过使用 ie trie 来提高性能。
Another optimization would be data structures:
This is O(n) over all valid words, you could improve the performance by using i.e. a trie.
我不太了解 C#,但这是我在 C++ 中的做法(有些内容是伪代码,以便于阅读);它应该很容易翻译:
在 C# 中,
set
将是SortedSet
,而vector 将是
类型将是二维数组。不过,我不确定与SortedSet
。 >lower_bound
的等价物;SortedSet
似乎没有类似的东西。I don't know C# too well, but here's how I would do it in C++ (some things are pseudocode to make it easier to read); it should be easy to translate:
In C#,
set<string>
would beSortedSet
, and thevector<vector<...> >
types would be 2-D arrays. I am not sure of the equivalent tolower_bound
, though;SortedSet
doesn't seem to have anything like it.