How to determine the appropriate value for MaxDegreeOfParallelism when using Parallel.ForEachAsync
The example Scott Hanselman gives on his blog for using Parallel.ForEachAsync in .NET 6 specifies the value of MaxDegreeOfParallelism as 3.
However, if it is unspecified, Parallel.ForEachAsync uses Environment.ProcessorCount as the degree of parallelism. This makes sense for CPU-bound work, but for asynchronous I/O-bound work it seems like a poor choice for a default value.
If I'm doing something like Scott's example below, but I want it to run as fast as possible, how should I determine the best value to use for MaxDegreeOfParallelism? Is it reasonable to specify this as int.MaxValue and just assume the TaskScheduler will do the most sensible thing when it comes to scheduling the work on the ThreadPool?
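```csharp
using System.Net.Http.Json; // GetFromJsonAsync extension method

// `client` (an HttpClient), `userHandlers` (the GitHub user URIs) and the
// GitHubUser type come from the rest of Scott's sample and are not shown here.
ParallelOptions parallelOptions = new()
{
    // At most three requests are in flight at any one time.
    MaxDegreeOfParallelism = 3
};

await Parallel.ForEachAsync(userHandlers, parallelOptions, async (uri, token) =>
{
    var user = await client.GetFromJsonAsync<GitHubUser>(uri, token);
    Console.WriteLine($"Name: {user.Name}\nBio: {user.Bio}\n");
});
```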
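To see what MaxDegreeOfParallelism actually caps, the following minimal sketch counts how many operations are in flight at once. It covers three cases: a value of 3, the default, and int.MaxValue. It assumes `Task.Delay` is an acceptable stand-in for an HTTP call. The helper `MeasurePeakConcurrencyAsync` is a hypothetical name used only for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Runs `itemCount` fake I/O operations through Parallel.ForEachAsync and returns the
// highest number that were in flight at the same time. Task.Delay stands in for an
// HTTP call; pass null to leave MaxDegreeOfParallelism at its default.
static async Task<int> MeasurePeakConcurrencyAsync(int itemCount, int? maxDegreeOfParallelism)
{
    var gate = new object();
    int inFlight = 0, peak = 0;

    var options = new ParallelOptions();
    if (maxDegreeOfParallelism is int dop)
    {
        options.MaxDegreeOfParallelism = dop;
    }

    await Parallel.ForEachAsync(Enumerable.Range(0, itemCount), options, async (_, token) =>
    {
        lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
        await Task.Delay(50, token); // simulated I/O latency
        lock (gate) { inFlight--; }
    });

    return peak;
}

int peakThree = await MeasurePeakConcurrencyAsync(40, 3);
int peakDefault = await MeasurePeakConcurrencyAsync(40, null);
int peakUnbounded = await MeasurePeakConcurrencyAsync(40, int.MaxValue);

Console.WriteLine($"MaxDegreeOfParallelism = 3:            peak {peakThree}");
Console.WriteLine($"default (ProcessorCount = {Environment.ProcessorCount}): peak {peakDefault}");
Console.WriteLine($"MaxDegreeOfParallelism = int.MaxValue: peak {peakUnbounded}");
```

The setting limits only how many bodies run concurrently. With int.MaxValue, the practical limit becomes whatever the remote side and the network stack will tolerate, not anything the TaskScheduler decides for you.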
IMHO, the only way to get the number is... testing.
For HTTP work there are two parties involved:
What is fast for you may be too fast for the remote side. This can be because of its resources and/or throttling.
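Such a test could be structured as a sweep over candidate values, timing the same batch of requests at each one. In the minimal sketch below, the "remote side" is simulated. A `SemaphoreSlim` models a hypothetical server that handles at most 8 requests at a time, and `Task.Delay(20)` models request latency. Both numbers are assumptions for illustration, and the helpers `FakeRequestAsync` and `TimeRunAsync` are hypothetical. For a real test, the fake request would be replaced with real calls.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the remote service: it handles at most 8 requests at a time
// (think server resources or throttling) and each request takes about 20 ms.
var remoteCapacity = new SemaphoreSlim(8);

async Task FakeRequestAsync(CancellationToken token)
{
    await remoteCapacity.WaitAsync(token);   // wait if the "server" is already busy
    try { await Task.Delay(20, token); }     // simulated request latency
    finally { remoteCapacity.Release(); }
}

// Time how long `itemCount` requests take at a given MaxDegreeOfParallelism.
async Task<TimeSpan> TimeRunAsync(int itemCount, int dop)
{
    var options = new ParallelOptions { MaxDegreeOfParallelism = dop };
    var stopwatch = Stopwatch.StartNew();
    await Parallel.ForEachAsync(Enumerable.Range(0, itemCount), options,
        async (_, token) => await FakeRequestAsync(token));
    return stopwatch.Elapsed;
}

var timings = new Dictionary<int, TimeSpan>();
foreach (int dop in new[] { 1, 2, 4, 8, 16, 64 })
{
    timings[dop] = await TimeRunAsync(100, dop);
    Console.WriteLine($"MaxDegreeOfParallelism = {dop,2}: {timings[dop].TotalMilliseconds,6:F0} ms");
}
```

Elapsed time should drop as the value grows until it reaches the simulated server's capacity, and then flatten. Past that point, a higher value only makes requests queue on the remote side, or, with a real API, trip its rate limiter.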
Note on the default
The default, which results in Environment.ProcessorCount, depends on the machine the code runs on. If you run your code in the cloud, this number may be different from what your beefy laptop reports.
This can lead to unexpected differences between non-prod and prod environments.
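One way to avoid that difference is to set the value explicitly from configuration, so every environment uses the same number. This minimal sketch reads it from an environment variable and falls back to a fixed value. The variable name `MAX_DOP`, the helper `ResolveMaxDegreeOfParallelism`, and the fallback of 3 (borrowed from Scott's example) are all hypothetical choices for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Pick MaxDegreeOfParallelism from configuration rather than the machine-dependent
// default. Falls back to `fallback` when the setting is missing, unparsable or not positive.
static int ResolveMaxDegreeOfParallelism(string? configured, int fallback) =>
    int.TryParse(configured, out int value) && value > 0 ? value : fallback;

// "MAX_DOP" is a hypothetical setting name; any configuration source works.
int maxDop = ResolveMaxDegreeOfParallelism(Environment.GetEnvironmentVariable("MAX_DOP"), fallback: 3);
var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = maxDop };

Console.WriteLine($"Environment.ProcessorCount on this machine: {Environment.ProcessorCount}");
Console.WriteLine($"MaxDegreeOfParallelism in use: {parallelOptions.MaxDegreeOfParallelism}");
```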
GitHub specific
GitHub.com allows 5,000 requests per hour for non-enterprise users (from here), and there is also this:
In Best practices for integrators we can read:
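Putting the 5,000 requests-per-hour quota into numbers shows why a large MaxDegreeOfParallelism buys little against GitHub. The sketch below applies Little's law (average requests in flight = request rate × time each request spends in flight), which is a general queueing result and not something the answer itself cites. The 300 ms average latency is an assumed figure for illustration only.

```csharp
using System;

// The 5,000 requests/hour quota is from the answer above;
// the 300 ms average latency is an assumption for illustration only.
const double RequestsPerHour = 5000;
const double AverageLatencySeconds = 0.3;

double requestsPerSecond = RequestsPerHour / 3600.0;        // sustainable average rate
double secondsBetweenRequests = 3600.0 / RequestsPerHour;   // average spacing between requests

// Little's law: average in-flight requests = arrival rate × time each request is in flight.
double inFlightAtQuotaRate = requestsPerSecond * AverageLatencySeconds;

// How long a strictly sequential client (MaxDegreeOfParallelism = 1) takes to use up the quota.
double minutesToExhaustAtDop1 = RequestsPerHour * AverageLatencySeconds / 60.0;

Console.WriteLine($"Sustainable rate: {requestsPerSecond:F2} requests/s (one every {secondsBetweenRequests:F2} s)");
Console.WriteLine($"Average requests in flight at that rate: {inFlightAtQuotaRate:F2}");
Console.WriteLine($"Sequential client exhausts the hourly quota in {minutesToExhaustAtDop1:F0} minutes");
```

Under these assumptions, fewer than one request needs to be in flight on average to stay within the quota. Even a single sequential worker would use it up in about 25 minutes, so the rate limit, not MaxDegreeOfParallelism, is the binding constraint.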