How to determine an appropriate value for MaxDegreeOfParallelism when using Parallel.ForEachAsync

Posted 2025-01-17 23:02:16


The example Scott Hanselman gives on his blog for using Parallel.ForEachAsync in .NET 6 specifies the value of MaxDegreeOfParallelism as 3.

However, if unspecified, the default MaxDegreeOfParallelism is ProcessorCount. This makes sense for CPU-bound work, but for asynchronous I/O-bound work it seems like a poor choice of default.

If I'm doing something like in Scott's example below, but I want to do it as fast as possible, how should I determine the best value to use for MaxDegreeOfParallelism? Is it reasonable to specify this as int.MaxValue and just assume the TaskScheduler will do the most sensible thing when it comes to scheduling the work on the ThreadPool?

ParallelOptions parallelOptions = new()
{
    MaxDegreeOfParallelism = 3
};
 
await Parallel.ForEachAsync(userHandlers, parallelOptions, async (uri, token) =>
{
    var user = await client.GetFromJsonAsync<GitHubUser>(uri, token);
 
    Console.WriteLine($"Name: {user.Name}\nBio: {user.Bio}\n");
});

1 answer

丶视觉, answered 2025-01-24 23:02:16


IMHO, the only way to get the number is... testing.

For HTTP work there are two parties involved:

  1. your code
  2. the remote side that does the work for you.

Your "fast" may be too fast for the remote side. This can be because of resources and/or throttling.
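One defensive pattern for the throttling case (a sketch, not from the original answer; the helper name `GetWithRetryAsync` is illustrative) is to honor an HTTP 429 response and its Retry-After header instead of hammering the remote side:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static class ThrottleAwareClient
{
    // Retry on HTTP 429 (Too Many Requests), honoring the server's
    // Retry-After header when present.
    public static async Task<HttpResponseMessage> GetWithRetryAsync(
        HttpClient client, string uri, CancellationToken token)
    {
        while (true)
        {
            var response = await client.GetAsync(uri, token);
            if (response.StatusCode != HttpStatusCode.TooManyRequests)
                return response;

            // Back off for the server-suggested interval, or a modest default.
            var delay = response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(5);
            response.Dispose();
            await Task.Delay(delay, token);
        }
    }
}
```

Calling this from the ForEachAsync body keeps each worker polite even when the chosen degree of parallelism turns out to be too aggressive.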

Note on the default

The default - which resolves to ProcessorCount - will depend on the machine the code runs on, and if you run your code in the cloud this number may be different from what's on your beefy laptop.

This can lead to unexpected differences between non-prod and prod environments.
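A quick way to see this per-machine behavior (a minimal sketch; the printed numbers depend on the host) is to log the default setting alongside the effective concurrency before deciding whether to override it:

```csharp
using System;
using System.Threading.Tasks;

class DefaultDopDemo
{
    static void Main()
    {
        // ParallelOptions.MaxDegreeOfParallelism defaults to -1, which
        // Parallel.ForEachAsync maps to Environment.ProcessorCount - so the
        // effective concurrency silently varies between machines.
        var options = new ParallelOptions();
        Console.WriteLine($"MaxDegreeOfParallelism setting: {options.MaxDegreeOfParallelism}");
        Console.WriteLine($"Effective concurrency here: {Environment.ProcessorCount}");
    }
}
```

Logging this at startup in each environment makes the non-prod/prod difference visible instead of surprising.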

GitHub specific

GitHub.com has a 5,000 requests per hour limit for non-enterprise users (from here), and there is also this:

In order to provide quality service on GitHub, additional rate limits may apply to some actions when using the API. For example, using the API to rapidly create content, poll aggressively instead of using webhooks, make multiple concurrent requests, or repeatedly request data that is computationally expensive may result in secondary rate limiting.

In Best practices for integrators we can read:

Dealing with secondary rate limits

Secondary rate limits are another way we ensure the API's availability. To avoid hitting this limit, you should ensure your application follows the guidelines below.

  • ...
  • Make requests for a single user or client ID serially. Do not make requests for a single user or client ID concurrently.
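The "testing" recommended above can be automated. Here is a hedged sketch (the `workload` delegate and the candidate values are illustrative, not from this answer) that times the same workload at several MaxDegreeOfParallelism settings so you can pick the point where throughput stops improving:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class DopSweep
{
    // Time one run of the workload at a given MaxDegreeOfParallelism.
    public static async Task<TimeSpan> TimeRunAsync(
        int degree, Func<ParallelOptions, Task> workload)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = degree };
        var sw = Stopwatch.StartNew();
        await workload(options);
        sw.Stop();
        return sw.Elapsed;
    }

    // Try a few candidate values and print the timings. Pick the knee of the
    // curve, staying under the remote API's documented rate limits.
    public static async Task SweepAsync(Func<ParallelOptions, Task> workload)
    {
        foreach (var degree in new[] { 1, 2, 4, 8, 16, 32 })
        {
            var elapsed = await TimeRunAsync(degree, workload);
            Console.WriteLine($"MaxDegreeOfParallelism={degree}: {elapsed.TotalMilliseconds:F0} ms");
        }
    }
}
```

Run the sweep against a representative batch in the target environment (not just your laptop), since both the machine and the remote side's throttling shape the curve.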