Do you use Parallel Extensions?

I hope this is not a misuse of stackoverflow; recently I've seen some great questions here on Parallel Extensions, and they piqued my interest.

My question:
Are you using Parallel Extensions, and if so, how?

My name is Stephen Toub and I'm on the Parallel Computing Platform team at Microsoft. We're the group responsible for Parallel Extensions. I'm always interested in hearing about how developers are utilizing Parallel Extensions (e.g. Parallel.For, PLINQ, ConcurrentDictionary, etc.), positive experiences you've had, negative experiences you've had, feature requests for the future, and so on.
If you'd be willing to share such information, please do, either here as a response to this question or privately via email at stoub at microsoft dot com.

I'm very much looking forward to hearing from you.

Thanks in advance!

绝不放开 2024-10-04 23:08:39

I'm using the TPL for nested Parallel.ForEach calls. Because I access dictionaries from these calls, I have to use ConcurrentDictionary. Although it's nice, I have a few issues:

  • The delegates inside of ForEach don't do much work, so I don't get much parallelism; the system seems to spend most of its time joining threads. It would be nice if there were a way to figure out why it isn't getting better concurrency and to improve it.

  • The inner ForEach iterations are over ConcurrentDictionary instances, which would cause the system to spend much of its time in the dictionary's enumerators if I hadn't added an enumerator cache.

  • Many of my ConcurrentDictionary instances are actually sets, but there is no ConcurrentSet, so I had to implement my own with a ConcurrentDictionary (see the first sketch after this list).

  • ConcurrentDictionary does not support collection initializer syntax, so I can't say var dict = new ConcurrentDictionary<char, int> { { 'A', 65 } };, which also means I can't assign ConcurrentDictionary literals to class members.

  • There are some places where I have to look up a key in a ConcurrentDictionary and call an expensive function to create a value if it doesn't exist. It would be nice if there were an overload of GetOrAdd that takes an addValueFactory, so that the value is computed only if the key doesn't exist. This can be simulated with .AddOrUpdate(key, addValueFactory, (k, v) => v), but that adds the overhead of an extra delegate call to every lookup (see the second sketch after this list).
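
On the ConcurrentSet point, here is a minimal sketch of the kind of wrapper described above. This is an illustration, not the author's actual code; the byte values are ignored placeholders.

using System.Collections.Concurrent;

public class ConcurrentSet<T>
{
    // Wrap a ConcurrentDictionary and ignore the values; only the keys matter.
    private readonly ConcurrentDictionary<T, byte> items =
        new ConcurrentDictionary<T, byte>();

    public bool Add(T item)
    {
        return items.TryAdd(item, 0); // false if the item was already present
    }

    public bool Contains(T item)
    {
        return items.ContainsKey(item);
    }

    public bool Remove(T item)
    {
        byte ignored;
        return items.TryRemove(item, out ignored);
    }

    public int Count
    {
        get { return items.Count; }
    }
}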
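
And a sketch of the AddOrUpdate simulation from the last point. The cache and factory names here are invented for illustration; for what it's worth, the released .NET 4 API did end up shipping a GetOrAdd overload that takes a value factory.

using System;
using System.Collections.Concurrent;

static class LazyGetOrAddDemo
{
    private static readonly ConcurrentDictionary<string, int> cache =
        new ConcurrentDictionary<string, int>();

    private static int GetOrAddLazily(string key, Func<string, int> addValueFactory)
    {
        // The update delegate returns the existing value unchanged, so the
        // expensive factory only runs when the key is missing.
        return cache.AddOrUpdate(key, addValueFactory, (k, existing) => existing);
    }

    private static void Main()
    {
        int first = GetOrAddLazily("alpha", k => Expensive(k));  // factory runs
        int second = GetOrAddLazily("alpha", k => Expensive(k)); // factory skipped
        Console.WriteLine(first + " " + second);
    }

    private static int Expensive(string key)
    {
        return key.Length; // stand-in for a costly computation
    }
}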

傾旎 2024-10-04 23:08:39

I haven't used it extensively yet, but I've definitely kept an ear out for its uses and looked for opportunities in our code base to put it to use (unfortunately, many of our projects are still bound to .NET 2.0 for the time being). One little gem I came up with myself was a unique word counter. I think it's the fastest and most concise implementation I can come up with - if someone can make it better, that would be awesomeness:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

internal static class Program
{
    private static readonly char[] delimiters = { ' ', '.', ',', ';', '\'', '-', ':', '!', '?', '(', ')', '<', '>', '=', '*', '/', '[', ']', '{', '}', '\\', '"', '\r', '\n' };
    private static readonly Func<string, string> theWord = Word;
    private static readonly Func<IGrouping<string, string>, KeyValuePair<string, int>> theNewWordCount = NewWordCount;
    private static readonly Func<KeyValuePair<string, int>, int> theCount = Count;

    private static void Main(string[] args)
    {
        // Split the file into words, group them case-insensitively in
        // parallel, then print each distinct word with its count.
        foreach (var wordCount in File.ReadAllText(args.Length > 0 ? args[0] : @"C:\DEV\CountUniqueWords\CountUniqueWords\Program.cs")
            .Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
            .AsParallel()
            .GroupBy(theWord, StringComparer.OrdinalIgnoreCase)
            .Select(theNewWordCount)
            .OrderByDescending(theCount))
        {
            Console.WriteLine(
                "Word: \""
                + wordCount.Key
                + "\" Count: "
                + wordCount.Value);
        }

        Console.ReadLine();
    }

    // Identity key selector: group by the word itself.
    private static string Word(string word)
    {
        return word;
    }

    private static KeyValuePair<string, int> NewWordCount(IGrouping<string, string> wordCount)
    {
        return new KeyValuePair<string, int>(
            wordCount.Key,
            wordCount.Count());
    }

    private static int Count(KeyValuePair<string, int> wordCount)
    {
        return wordCount.Value;
    }
}

如果没结果 2024-10-04 23:08:39

I have been using it on my project MetaSharp. I have an MSBuild-based compile pipeline for DSLs, and one of the stage types is a many-to-many stage. The M:M stage uses .AsParallel().ForAll(...).

Here's the snippet:

protected sealed override IEnumerable<IContext> Process()
{
    if (this.Input.Count() > 1)
    {
        // Multiple inputs: process each context in parallel.
        this.Input
            .AsParallel<IContext>()
            .ForAll(this.Process);
    }
    else if (this.Input.Any())
    {
        // A single input: skip the parallel machinery.
        this.Process(this.Input.Single());
    }

    return this.Input.ToArray();
}

失眠症患者 2024-10-04 23:08:39

We don't use it extensively, but it has certainly come in handy.

I was able to reduce the running time of a few of our longer-running unit tests to about 1/3 their original time just by wrapping some of the more time-intensive steps in a Parallel.Invoke() call.
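
For illustration, a minimal sketch of that pattern; the step names are invented, since the answer doesn't show the actual tests:

using System;
using System.Threading.Tasks;

static class TestSetup
{
    public static void RunExpensiveSteps()
    {
        // Each delegate runs concurrently; Invoke returns once all complete.
        Parallel.Invoke(
            () => SeedDatabase(),
            () => BuildSearchIndex(),
            () => WarmCaches());
    }

    private static void SeedDatabase() { Console.WriteLine("seeding..."); }
    private static void BuildSearchIndex() { Console.WriteLine("indexing..."); }
    private static void WarmCaches() { Console.WriteLine("warming..."); }
}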

I also love using the parallel libraries for testing thread-safety. I've caught and reported a couple of threading issues with Ninject with code something like this:

var repositoryTypes = from a in CoreAssemblies
                      from t in a.GetTypes()
                      where t.Name.EndsWith("Repository")
                      select t;
repositoryTypes.ToList().AsParallel().ForAll(
    repositoryType => _kernel.Get(repositoryType));

In our actual production code, we use some parallel extensions to run some integration actions that are supposed to run every few minutes, and which consist mostly of pulling data from web services. This takes special advantage of parallelism because of the high latency inherent in web connections, and allows our jobs to all finish running before they're supposed to fire again.
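
As a hypothetical sketch of that scenario (the endpoints are invented), running latency-bound pulls concurrently means the job's wall-clock time approaches the slowest call rather than the sum of all calls:

using System;
using System.Net;
using System.Threading.Tasks;

static class IntegrationJob
{
    private static readonly string[] endpoints =
    {
        "http://example.com/serviceA",
        "http://example.com/serviceB",
        "http://example.com/serviceC",
    };

    public static void PullAll()
    {
        Parallel.ForEach(endpoints, endpoint =>
        {
            using (var client = new WebClient())
            {
                // Each thread blocks on its response, but the pulls proceed
                // in parallel, overlapping the network waits.
                string payload = client.DownloadString(endpoint);
                Console.WriteLine(endpoint + ": " + payload.Length + " chars");
            }
        });
    }
}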

小女人ら 2024-10-04 23:08:39

I am using a ConcurrentDictionary that stores 100 million+ items. My application uses around 8 GB of memory at that point. The ConcurrentDictionary then decides it wants to grow on another Add, and apparently it wants to grow a LOT (some internal prime-number algorithm), because it runs out of memory. This is on x64 with 32 GB of memory.

Therefore I would like a boolean to block automatic regrowing/rehashing of a (concurrent) dictionary. I would then initialize the dictionary at creation with a fixed set of buckets (which is not the same as a fixed capacity!). It would become a little slower over time as more and more items pile up in each bucket, but this would prevent rehashing and running out of memory too quickly and unnecessarily.
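
As a partial mitigation (a sketch, not the fixed-bucket behavior requested above), the (concurrencyLevel, capacity) constructor can pre-size the table; it doesn't pin the bucket count, but it avoids the intermediate grow-and-rehash steps on the way to a known large size:

using System;
using System.Collections.Concurrent;

static class PresizeDemo
{
    private static void Main()
    {
        const int expectedItems = 100000000; // ~100M items, as in the scenario above

        // concurrencyLevel controls the number of internal locks; capacity
        // allocates the buckets up front. The table can still grow past this.
        var map = new ConcurrentDictionary<long, long>(
            Environment.ProcessorCount,
            expectedItems);

        map.TryAdd(42L, 1L);
        Console.WriteLine(map.Count);
    }
}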
