Is it acceptable to use ThreadPool.GetAvailableThreads to throttle the amount of work a service performs?
I have a service which polls a queue very quickly to check for more 'work' that needs to be done. There is always more work in the queue than a single worker can handle. I want to make sure a single worker doesn't grab too much work when the service is already at max capacity.
Let's say my worker grabs 10 messages from the queue every N ms and uses the Parallel Library to process each message in parallel on different threads. The work itself is very IO-heavy: many SQL Server queries and even Azure Table storage calls (HTTP requests) are made for a single unit of work.
Is using ThreadPool.GetAvailableThreads() the proper way to throttle how much work the service is allowed to grab?
I see that I have access to the available WorkerThreads and CompletionPortThreads. For an IO-heavy process, is it more appropriate to look at how many CompletionPortThreads are available? I believe 1000 is the number made available per process regardless of CPU count.
Update - Might be important to know that the queue I'm working with is an Azure Queue. So each request to check for messages is made as an async HTTP request which returns the next 10 messages (and costs money).
I don't think using IO completion ports is a good way to work out how much to grab.
I assume that the ideal situation is where you run out of work just as the next set arrives, so you've never got more backlog than you can reasonably handle.
Why not keep track of how long it takes to process a job and how long it takes to fetch jobs, and adjust the amount of work fetched each time based on that, with suitable minimum/maximum values to stop things going crazy if you have a few really cheap or really expensive jobs?
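The timing-based sizing suggested above can be sketched as a small helper. This is a hypothetical illustration, not code from the question: the names (`avg_fetch_s`, `avg_job_s`, `workers`) and the formula are assumptions, written in Python since the arithmetic is language-agnostic.

```python
import math

def next_batch_size(avg_fetch_s, avg_job_s, workers,
                    min_batch=1, max_batch=32):
    """Fetch just enough work to keep `workers` busy for one fetch
    round-trip, clamped so a few unusually cheap or expensive jobs
    can't swing the batch size wildly."""
    if avg_job_s <= 0:
        return max_batch
    throughput = workers / avg_job_s              # jobs completed per second
    target = math.ceil(throughput * avg_fetch_s)  # jobs finished during one fetch
    return max(min_batch, min(max_batch, target))
```

For example, with 10 workers averaging 0.5 s per job and a 0.2 s fetch round-trip, the next batch would be 4; in practice you would feed this with smoothed (e.g. moving-average) timings rather than raw samples.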
You'll also want to work out a reasonable optimum degree of parallelization - it's not clear to me whether it's really IO-heavy, or whether it's just "asynchronous request heavy", i.e. you spend a lot of time just waiting for the responses to complicated queries which in themselves are cheap for the resources of your service.
I've been working on virtually the same problem in the same environment. I ended up giving each WorkerRole an internal work queue, implemented as a BlockingCollection<>. There's a single thread that monitors that queue - when the number of items gets low it requests more items from the Azure queue. It always requests the maximum number of items, 32, to cut down costs. It also has automatic backoff in the event that the queue is empty.
Then I have a set of worker threads that I started myself. They sit in a loop, pulling items off the internal work queue. The number of worker threads is my main way to optimize the load, so I've got that set up as an option in the .cscfg file. I'm currently running 35 threads/worker, but that number will depend on your situation.
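A minimal runnable sketch of that feeder/worker arrangement, with a plain list standing in for the Azure queue and Python's `queue.Queue` playing the role of `BlockingCollection<>` (all names here are illustrative, not from the answerer's code):

```python
import queue
import threading
import time

BATCH = 32        # always request the maximum to cut per-request cost
LOW_WATER = 8     # top up the internal queue when it gets this low

remote = list(range(100))        # stand-in for the Azure queue
remote_lock = threading.Lock()
internal = queue.Queue()         # stand-in for BlockingCollection<>
processed = []
done = threading.Event()

def fetch_batch():
    """Pop up to BATCH items from the simulated remote queue."""
    with remote_lock:
        batch = remote[:BATCH]
        del remote[:BATCH]
    return batch

def feeder():
    """Single monitor thread: refills the internal queue when it runs low."""
    while not done.is_set():
        if internal.qsize() <= LOW_WATER:
            batch = fetch_batch()
            if not batch:
                done.set()       # remote empty; a real service backs off instead
                return
            for item in batch:
                internal.put(item)
        else:
            time.sleep(0.001)

def worker():
    """Workers sit in a loop pulling items off the internal queue."""
    while True:
        try:
            item = internal.get(timeout=0.05)
        except queue.Empty:
            if done.is_set():
                return
            continue
        processed.append(item)   # stand-in for the real unit of work

threads = [threading.Thread(target=feeder)]
threads += [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Shutting down when the remote list empties is only for the sketch; the real service loops indefinitely, and the worker-thread count (4 here, 35 in the answer) is the main tuning knob.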
I tried using TPL to manage the work, but I found it more difficult to manage the load. Sometimes TPL would under-parallelize and the machine would be bored, other times it would over-parallelize and the Azure queue message visibility would expire while the item was still being worked.
This may not be the optimal solution, but it seems to be working OK for me.
I decided to keep an internal counter of how many messages are currently being processed. I used Interlocked.Increment/Decrement to manage the counter in a thread-safe manner.
I would have used the Semaphore class, since each message is tied to its own thread, but wasn't able to due to the async nature of the queue poller and the code which spawned the threads.
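The counter-based throttle has roughly this shape. Since the question is .NET, `Interlocked.Increment/Decrement` is the real mechanism; the lock-guarded Python counter below is only a stand-in, and every name (`InFlightCounter`, `poll_once`, `fetch`) is hypothetical:

```python
import threading

MAX_IN_FLIGHT = 10   # hypothetical cap on concurrently processed messages

class InFlightCounter:
    """Lock-guarded counter standing in for Interlocked.Increment/Decrement."""
    def __init__(self, limit):
        self._limit = limit
        self._value = 0
        self._lock = threading.Lock()

    def try_acquire(self, n=1):
        """Reserve room for n more messages; False when at capacity."""
        with self._lock:
            if self._value + n > self._limit:
                return False
            self._value += n
            return True

    def release(self, n=1):
        """Called when a message finishes processing."""
        with self._lock:
            self._value -= n

counter = InFlightCounter(MAX_IN_FLIGHT)

def poll_once(fetch, batch=10):
    """Grab a batch only when there's headroom. fetch(n) is a stub for the
    Azure 'get next n messages' request; each message's thread must call
    counter.release() when it finishes."""
    if not counter.try_acquire(batch):
        return []            # at capacity: skip this poll (and the paid request)
    messages = fetch(batch)
    if len(messages) < batch:
        counter.release(batch - len(messages))   # give back unused slots
    return messages
```

Checking the counter *before* issuing the request is what saves money here: a poll at capacity returns immediately without touching the Azure queue.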