Using a Hashtable inside Parallel.ForEach?
I have a Parallel.ForEach loop running an intensive operation inside the body.
The operation uses a Hashtable to store values so they can be reused for later loop items. I add to the Hashtable after the intensive operation completes; the next loop item can then look the object up in the Hashtable and reuse it instead of running the intensive operation again.
However, because I am using Parallel.ForEach there is a thread-safety issue: the Hashtable.Add and ContainsKey(key) calls are not synchronized, since they may run in parallel. Introducing locks may cause performance issues.
Here's the sample code:
Hashtable myTable = new Hashtable();

Parallel.ForEach(items, (item, loopState) =>
{
    object myObj;
    // If the key exists in myTable use it, otherwise add it to the hashtable
    if (myTable.ContainsKey(item.Key))
    {
        myObj = myTable[item.Key];
    }
    else
    {
        myObj = SomeIntensiveOperation();
        myTable.Add(item.Key, myObj); // Issue is here: breaks with an exception at runtime
    }
    // Do something with myObj
    // some code here
});
There must be some API or property setting in the TPL library that can handle this scenario. Is there?
You're looking for System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue>. The new concurrent collections use significantly improved locking mechanisms and should perform excellently in parallel algorithms.

Edit: The result might look like this:

Word of warning: if the elements in items do not all have a unique item.Key, then SomeIntensiveOperation could get called twice for the same key. In the example the key isn't passed to SomeIntensiveOperation, but it means that the "Do something with value" code could execute for both a key/valueA and a key/valueB pair, while only one result gets stored in the cache (and not necessarily the first one computed by SomeIntensiveOperation, either). You'd need a parallel lazy factory to handle this if it's a problem. Also, for obvious reasons, SomeIntensiveOperation should be thread safe.
Check the System.Collections.Concurrent namespace; I think you need ConcurrentDictionary.
Use a ReaderWriterLock; this has good performance for work with many reads and few short-duration writes. Your problem seems to fit this specification.
All read operations will run quickly and lock-free; the only time anyone will be blocked is when a write is happening, and that write lasts only as long as it takes to shove something into a Hashtable.
ReaderWriterLockSlim on MSDN
I guess I'll throw down some code...
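The code this answer promised was lost in extraction; here is a sketch of the read-mostly pattern it describes, built on ReaderWriterLockSlim. The string keys and SomeIntensiveOperation stand-in are assumptions, not the question's actual types:

```csharp
using System;
using System.Collections;
using System.Threading;
using System.Threading.Tasks;

public class RwDemo
{
    static readonly Hashtable table = new Hashtable();
    static readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();

    // Hypothetical stand-in for the expensive work from the question.
    static object SomeIntensiveOperation(string key) => $"computed:{key}";

    public static object GetOrCompute(string key)
    {
        // Fast path: any number of threads may hold the read lock at once.
        rwLock.EnterReadLock();
        try
        {
            if (table.ContainsKey(key)) return table[key];
        }
        finally { rwLock.ExitReadLock(); }

        // Compute outside any lock, then take the (short) write lock to publish.
        object value = SomeIntensiveOperation(key);
        rwLock.EnterWriteLock();
        try
        {
            if (!table.ContainsKey(key)) table[key] = value; // another thread may have won
            return table[key];
        }
        finally { rwLock.ExitWriteLock(); }
    }

    public static void Main()
    {
        Parallel.ForEach(new[] { "a", "b", "a" }, k => GetOrCompute(k));
        Console.WriteLine(table.Count); // prints 2
    }
}
```

As with GetOrAdd, two threads can both miss and both compute a value for the same key; the write lock only guarantees that one result wins and the table stays structurally intact.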
I see no correct choice other than using (more or less explicit) locks (a synchronized Hashtable just wraps every method in a lock).
Another option could be to let the dictionary go out of sync. The race condition will not corrupt the dictionary; it will just require the code to do some superfluous computation. Profile the code to check whether the lock or the missing memoization has the worse effect.
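The "allow superfluous work" idea can be sketched as follows. I substitute a ConcurrentDictionary for the raw Hashtable here, since a plain Hashtable would still need a lock to stay structurally safe under concurrent Adds; the counter and names are my illustration, not from the answer:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class RaceDemo
{
    static int computations; // counts possibly-redundant intensive runs

    // Hypothetical stand-in for the expensive work from the question.
    static object SomeIntensiveOperation(string key)
    {
        Interlocked.Increment(ref computations);
        return $"computed:{key}";
    }

    public static ConcurrentDictionary<string, object> Run(string[] keys)
    {
        var cache = new ConcurrentDictionary<string, object>();
        Parallel.ForEach(keys, key =>
        {
            if (!cache.TryGetValue(key, out var myObj))
            {
                // Two threads may both miss and both compute; TryAdd keeps
                // only one result, and the loser's work is simply wasted.
                myObj = SomeIntensiveOperation(key);
                cache.TryAdd(key, myObj);
                myObj = cache[key]; // read back the winning value
            }
            // Do something with myObj
        });
        return cache;
    }

    public static void Main()
    {
        var cache = Run(new[] { "a", "b", "a", "b" });
        Console.WriteLine(cache.Count); // prints 2
    }
}
```

Whether the wasted computations cost more than a lock would is exactly what the answer suggests profiling.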