C# (.Net 2.0) 微优化第 2 部分:查找网格内的连续组
我有一个非常简单的函数,它接受一个匹配的位字段、一个网格和一个正方形。 它曾经使用委托,但我做了很多重新编码,并最终使用位字段 &
操作来避免委托,同时仍然能够在合理范围内执行匹配。 基本上,挑战是从特定的“前导”方块开始,找到网格内与 match
位字段匹配的所有连续元素。 Square 是一个有点小(但不是很小)的类。 关于如何加快速度有什么建议吗? 请注意,网格本身非常小(本测试中有 500 个元素)。
编辑:值得注意的是,该函数每秒被调用超过 200,000 次。 事实上,从长远来看,我的目标是减少调用它的频率,但这确实很困难,因为我的最终目标是让分组系统通过脚本来处理而不是硬编码。 也就是说,这个函数总是比任何其他函数被调用得更多。
编辑:澄清一下,根据设计,该函数不会检查leader
是否与位字段匹配。 目的是领导者不需要匹配位字段(尽管在某些情况下会匹配)。
尝试失败:
- 初始化字典并使用容量进行堆栈。
- 将 int 强制转换为枚举以避免强制转换。
- 将字典和堆栈移到函数之外,并在每次需要时清除它们。 这会让事情变得更慢!
事情尝试成功:
- 编写哈希码函数而不是使用默认值:哈希码是预先计算的并且等于
x + y *parent.Width
。 谢谢你的提醒,吉姆·米歇尔。 - mquander 的技术:请参阅下面的
GetGroupMquander
。 进一步优化:切换到 HashSets 后,我删除了
Contains
测试,并用Add
测试替换了它。Contains
和Add
都被迫寻找键,因此仅检查添加是否成功比在Contains
失败时添加检查失败更有效。 即if (RetVal.Add(s)) curStack.Push(s);
公共静态列表
GetGroup(int match, 模型网格, 方形领导者) { 堆栈<正方形> curStack = new Stack (); 字典 Retval = new Dictionary (); curStack.Push(领导者); while (curStack.Count != 0) { 方形 curItem = curStack.Pop(); if (Retval.ContainsKey(curItem)) 继续; Retval.Add(curItem, true); foreach(curItem.Neighbors 中的 Square) { if (0 != ((int)(s.RoomType) & match)) { curStack.Push(s); } } } 返回新列表 (Retval.Keys); }
=====
public static List<Square> GetGroupMquander(int match, Model grid, Square leader)
{
Stack<Square> curStack = new Stack<Square>();
Dictionary<Square, bool> Retval = new Dictionary<Square, bool>();
Retval.Add(leader, true);
curStack.Push(leader);
while (curStack.Count != 0)
{
Square curItem = curStack.Pop();
foreach (Square s in curItem.Neighbors)
{
if (0 != ((int)(s.RoomType) & match))
{
if (!Retval.ContainsKey(s))
{
curStack.Push(s);
Retval.Add(curItem, true);
}
}
}
}
return new List<Square>(Retval.Keys);
}
I have a very simple function which takes in a matching bitfield, a grid, and a square. It used to use a delegate but I did a lot of recoding and ended up with a bitfield &
operation to avoid the delegate while still being able to perform matching within reason. Basically, the challenge is to find all contiguous elements within a grid which match the match
bitfield, starting from a specific "leader" square.
Square is somewhat small (but not tiny) class. Any tips on how to push this to be even faster? Note that the grid itself is pretty small (500 elements in this test).
Edit: It's worth noting that this function is called over 200,000 times per second. In truth, in the long run my goal will be to call it less often, but that's really tough, considering that my end goal is to make the grouping system be handled with scripts rather than being hardcoded. That said, this function is always going to be called more than any other function.
Edit: To clarify, the function does not check if leader
matches the bitfield, by design. The intention is that the leader is not required to match the bitfield (though in some cases it will).
Things tried unsuccessfully:
- Initializing the dictionary and stack with a capacity.
- Casting the int to an enum to avoid a cast.
- Moving the dictionary and stack outside the function and clearing them each time they are needed. This makes things slower!
Things tried successfully:
- Writing a hashcode function instead of using the default: Hashcodes are precomputed and are equal to
x + y * parent.Width
. Thanks for the reminder, Jim Mischel. - mquander's Technique: See
GetGroupMquander
below. Further Optimization: Once I switched to HashSets, I got rid of the
Contains
test and replaced it with anAdd
test. BothContains
andAdd
are forced to seek a key, so just checking if an add succeeds is more efficient than adding if aContains
fails check fails. That is,if (RetVal.Add(s)) curStack.Push(s);
public static List<Square> GetGroup(int match, Model grid, Square leader) { Stack<Square> curStack = new Stack<Square>(); Dictionary<Square, bool> Retval = new Dictionary<Square, bool>(); curStack.Push(leader); while (curStack.Count != 0) { Square curItem = curStack.Pop(); if (Retval.ContainsKey(curItem)) continue; Retval.Add(curItem, true); foreach (Square s in curItem.Neighbors) { if (0 != ((int)(s.RoomType) & match)) { curStack.Push(s); } } } return new List<Square>(Retval.Keys); }
=====
public static List<Square> GetGroupMquander(int match, Model grid, Square leader)
{
Stack<Square> curStack = new Stack<Square>();
Dictionary<Square, bool> Retval = new Dictionary<Square, bool>();
Retval.Add(leader, true);
curStack.Push(leader);
while (curStack.Count != 0)
{
Square curItem = curStack.Pop();
foreach (Square s in curItem.Neighbors)
{
if (0 != ((int)(s.RoomType) & match))
{
if (!Retval.ContainsKey(s))
{
curStack.Push(s);
Retval.Add(curItem, true);
}
}
}
}
return new List<Square>(Retval.Keys);
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您发布的代码假定
leader
方块与位字段匹配。 这是设计使然吗?我假设您的 Square 类已经实现了 GetHashCode 方法,该方法快速且提供了良好的分布。
你确实说过微优化。 。 。
如果您很清楚需要多少个项目,那么通过预先分配字典可以节省一些时间。 也就是说,如果您知道匹配的项目不会超过 100 个,您可以这样写:
这将避免增加字典并重新散列所有内容。 您还可以对堆栈执行相同的操作:将其预先分配到某个合理的最大大小,以避免以后调整大小。
既然你说网格非常小,那么将堆栈和字典分配给网格大小似乎是合理的,如果这很容易确定的话。 您只是谈论每个
grid_size
引用,因此除非您的网格变得非常大,否则内存不是问题。在执行推送之前添加检查以查看某个项目是否在字典中可能会加快速度。 它取决于字典查找的相对速度,而不是堆栈中重复项的开销。 可能值得尝试一下,尽管如果它能产生很大的不同,我会感到惊讶。
我真的在为这最后一项做准备。 你的内循环中有这样的演员。 我知道 C# 编译器有时会为看似简单的转换生成数量惊人的代码,并且我不知道 JIT 编译器是否对此进行了优化。 您可以通过创建枚举类型的局部变量并向其分配
match
的值来从内部循环中删除该强制转换:然后您的内部循环比较将变为:
无强制转换,这可能会减少一些周期。
编辑:除了微观优化之外,通过稍微修改算法以避免多次处理任何项目,您可能会获得更好的性能。 就目前情况而言,匹配的项目可能会多次出现在堆栈中,而不匹配的项目可能会被多次处理。 由于您已经在使用字典来跟踪匹配的项目,因此您可以通过为不匹配的项目指定值
false
来跟踪它们。 最后,您只需创建一个包含具有true
值的项目的List
即可。The code you posted assumes that the
leader
square matches the bitfield. Is that by design?I assume your
Square
class has implemented aGetHashCode
method that's quick and provides a good distribution.You did say micro-optimization . . .
If you have a good idea how many items you're expecting, you'll save a little bit of time by pre-allocating the dictionary. That is, if you know you won't have more than 100 items that match, you can write:
That will avoid having to grow the dictionary and re-hash everything. You can also do the same thing with your stack: pre-allocate it to some reasonable maximum size to avoid resizing later.
Since you say that the grid is pretty small it seems reasonable to just allocate the stack and the dictionary to the grid size, if that's easy to determine. You're only talking
grid_size
references each, so memory isn't a concern unless your grid becomes very large.Adding a check to see if an item is in the dictionary before you do the push might speed it up a little. It depends on the relative speed of a dictionary lookup as opposed to the overhead of having a duplicate item in the stack. Might be worth it to give this a try, although I'd be surprised if it made a big difference.
I'm really stretching on this last one. You have that cast in your inner loop. I know that the C# compiler sometimes generates a surprising amount of code for a seemingly simple cast, and I don't know if that gets optimized away by the JIT compiler. You could remove that cast from your inner loop by creating a local variable of the enum type and assigning it the value of
match
:Then your inner loop comparison becomes:
No cast, which might shave some cycles.
Edit: Micro-optimization aside, you'll probably get better performance by modifying your algorithm slightly to avoid processing any item more than once. As it stands, items that do match can end up in the stack multiple times, and items that don't match can be processed multiple times. Since you're already using a dictionary to keep track of items that do match, you can keep track of the non-matching items by giving them a value of
false
. Then at the end you simply create aList
of those items that have atrue
value.这里有一些建议 -
如果您使用 .NET 3.5,您可以将 RetVal 更改为 HashSet而不是 Dictionary,因为您从不使用字典中的值(仅使用键)。 这将是一个小小的改进。
另外,如果将返回更改为 IEnumerable,则可以直接返回 HashSet 的枚举器。 根据结果的使用情况,在某些区域可能会更快(如果您确实需要列表,您始终可以在结果上使用 ToList() )。
然而,这里可以添加一个很大的优化 -
现在,您总是添加每个邻居,即使该邻居已经被处理过。 例如,当处理leader时,它会添加leader+1y,然后当处理leader+1y时,它会将BACK放入leader中(即使你已经处理了那个Square),下次领导者从堆栈中弹出,你继续。 这是很多额外的处理。
尝试添加:
这样,如果您已经处理了邻居的平方,它不会被重新添加到堆栈中,只是在稍后弹出时被跳过。
Here are a couple of suggestions -
If you're using .NET 3.5, you could change RetVal to a HashSet<Square> instead of a Dictionary<Square,bool>, since you're never using the values (only the keys) in the Dictionary. This would be a small improvement.
Also, if you changed the return to IEnumerable, you could just return the HashSet's enumerator directly. Depending on the usage of the results, it could potentially be faster in certain areas (and you can always use ToList() on the results if you really need a list).
However, there is a BIG optimization that could be added here -
Right now, you're always adding in every neighbor, even if that neighbor has already been processed. For example, when leader is processed, it adds in leader+1y, then when leader+1y is processed, it puts BACK in leader (even though you've already handled that Square), and next time leader is popped off the stack, you continue. This is a lot of extra processing.
Try adding:
This way, if you've already processed the square of your neighbor, it doesn't get re-added to the stack, just to be skipped when it's popped later.