Do you use Parallel Extensions?
I hope this is not a misuse of stackoverflow; recently I've seen some great questions here on Parallel Extensions, and it got my interest piqued.
My question:
Are you using Parallel Extensions, and if so, how?
My name is Stephen Toub and I'm on the Parallel Computing Platform team at Microsoft. We're the group responsible for Parallel Extensions. I'm always interested in hearing about how developers are utilizing Parallel Extensions (e.g. Parallel.For, PLINQ, ConcurrentDictionary, etc.), positive experiences you've had, negative experiences you've had, feature requests for the future, and so on.
If you'd be willing to share such information, please do, either here as a response to this question or to me privately through email at stoub at microsoft dot com.
I'm very much looking forward to hearing from you.
Thanks in advance!
Comments (5)
I'm using the TPL for doing nested Parallel.ForEach calls. Because I access dictionaries from these calls I have to use ConcurrentDictionary. Although it's nice, I have a few issues:

- The delegates inside of ForEach don't do much work, so I don't get much parallelism. The system seems to spend most of its time joining threads. It would be nice if there were a way to figure out why it's not getting better concurrency, and to improve it.
- The inner ForEach iterations are over ConcurrentDictionary instances, which would cause the system to spend much of its time in the dictionary's enumerators if I didn't add an enumerator cache.
- Many of my ConcurrentDictionary instances are actually sets, but there is no ConcurrentSet, so I had to implement my own with a ConcurrentDictionary.
- ConcurrentDictionary does not support object-initializer syntax, so I can't say var dict = new ConcurrentDictionary<char, int> { { 'A', 65 } };, which also means I can't assign ConcurrentDictionary literals to class members.
- There are some places where I have to look up a key in a ConcurrentDictionary and call an expensive function to create a value if it doesn't exist. It would be nice if there were an overload of GetOrAdd that takes an addValueFactory, so that the value is computed only if the key doesn't exist. This can be simulated with .AddOrUpdate(key, addValueFactory, (k, v) => v), but that adds the overhead of an extra delegate call to every lookup (a sketch of this workaround follows below).
I haven't used it extensively yet, but I've definitely kept my ear to its uses and look for opportunities in our code base to put it to use (unfortunately, we're .NET-2.0 bound on many of our projects still for the time being). One little gem I came up with myself was a unique word counter. I think this is the fastest and most concise implementation I can come up with - if someone can make it better, that would be awesomeness:
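The snippet itself did not survive the scrape of this page; below is a hypothetical reconstruction of a PLINQ word counter along the lines described, where the delimiter set and the case-insensitive comparison are assumptions:

using System;
using System.Linq;

class UniqueWordCounter
{
    static void Main()
    {
        string text = "the quick brown fox jumps over the lazy dog the fox";

        // Split into words, then group and count them in parallel with PLINQ.
        var counts = text
            .Split(new[] { ' ', '\t', '\r', '\n', '.', ',' },
                   StringSplitOptions.RemoveEmptyEntries)
            .AsParallel()
            .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count());

        foreach (var pair in counts)
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}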
I have been using it on my project MetaSharp. I have an MSBuild-based compile pipeline for DSLs, and one stage type is a many-to-many stage. The M:M stage uses .AsParallel().ForAll(...).
Here's the snippet:
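The snippet was also lost in the scrape; here is an illustrative many-to-many stage in the same shape, with hypothetical input and transform names:

using System;
using System.Collections.Concurrent;
using System.Linq;

class ManyToManyStage
{
    static void Main()
    {
        string[] inputs = { "alpha", "beta", "gamma" };
        var outputs = new ConcurrentBag<string>();

        // ForAll runs the action on each element in parallel with no
        // ordering guarantees -- a fit for a stage where every input is
        // transformed independently into zero or more outputs.
        inputs.AsParallel().ForAll(input =>
        {
            foreach (var result in Transform(input))
                outputs.Add(result);
        });

        Console.WriteLine("{0} outputs", outputs.Count);
    }

    // Placeholder transform; in a DSL compile pipeline this would emit
    // one or more artifacts per input node.
    static string[] Transform(string input)
    {
        return new[] { input.ToUpperInvariant(), input.Length.ToString() };
    }
}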
We don't use it extensively, but it has certainly come in handy.
I was able to reduce the running time of a few of our longer-running unit tests to about 1/3 of their original time just by wrapping some of the more time-intensive steps in a Parallel.Invoke() call.

I also love using the parallel libraries for testing thread-safety. I've caught and reported a couple of threading issues with Ninject with code something like this:
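The original snippet was lost in the scrape; this is the general shape of such a stress test, with IFoo/Foo as illustrative stand-ins:

using System;
using System.Threading.Tasks;
using Ninject;

class ThreadSafetyStressTest
{
    interface IFoo { }
    class Foo : IFoo { }

    static void Main()
    {
        var kernel = new StandardKernel();
        kernel.Bind<IFoo>().To<Foo>().InSingletonScope();

        // Resolve from many threads at once; a thread-safety bug tends
        // to surface as an exception or an inconsistent singleton.
        Parallel.For(0, 100000, i =>
        {
            var foo = kernel.Get<IFoo>();
            if (foo == null)
                throw new InvalidOperationException("resolution failed");
        });

        Console.WriteLine("no threading issues observed in this run");
    }
}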
In our actual production code, we use some parallel extensions to run some integration actions that are supposed to run every few minutes, and which consist mostly of pulling data from web services. This takes special advantage of parallelism because of the high latency inherent in web connections, and allows our jobs to all finish running before they're supposed to fire again.
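A sketch of that pattern, assuming a list of hypothetical service URLs and era-appropriate WebClient calls; the win comes from overlapping the network waits:

using System;
using System.Collections.Concurrent;
using System.Net;
using System.Threading.Tasks;

class IntegrationJob
{
    static void Main()
    {
        // Hypothetical endpoints standing in for the web services that
        // are polled every few minutes.
        string[] urls =
        {
            "http://example.com/service/a",
            "http://example.com/service/b",
            "http://example.com/service/c",
        };

        var results = new ConcurrentBag<string>();

        // Each call spends most of its time waiting on the network, so
        // running the calls in parallel lets the whole batch finish well
        // before the next scheduled run.
        Parallel.ForEach(urls, url =>
        {
            using (var client = new WebClient())
            {
                results.Add(client.DownloadString(url));
            }
        });

        Console.WriteLine("fetched {0} responses", results.Count);
    }
}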
I am using a ConcurrentDictionary that stores 100 million+ items. My application uses around 8 GB of memory at that point. The ConcurrentDictionary then decides it wants to grow on another Add, and apparently it wants to grow a LOT (some internal prime-sizing algorithm, presumably), at which point it runs out of memory. This is on x64 with 32 GB of RAM.

Therefore I would like a boolean that blocks the automatic regrow/rehash of a (concurrent) dictionary. I would then initialize the dictionary at creation with a fixed set of buckets (which is not the same as a fixed capacity!). It would become a little slower over time as more and more items share a bucket, but it would prevent rehashing and running out of memory too quickly and unnecessarily.
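As far as the public API goes, there is no such flag; the closest existing knob is the (concurrencyLevel, capacity) constructor, which pre-sizes the table so early rehashes are avoided, though it does not freeze the bucket count the way the requested boolean would:

using System;
using System.Collections.Concurrent;

class PresizedDictionary
{
    static void Main()
    {
        // Pre-size the table up front; it can still grow once the
        // capacity is exceeded, so this is only a partial workaround.
        int concurrencyLevel = Environment.ProcessorCount * 2;
        int initialCapacity = 1000000; // sized for the expected number of items

        var dict = new ConcurrentDictionary<long, byte>(concurrencyLevel, initialCapacity);

        dict.TryAdd(42L, 1);
        Console.WriteLine(dict.Count);
    }
}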