Best way to architect an Azure worker role to process data from about 10 queues
I have one worker role that throws data into around 10 queues that need to be processed. There is a lot of data: probably around 10-100 messages a second get queued up across the various queues.
The queues hold different kinds of data, and each queue is processed separately. One queue in particular is very active.
The way I have it set up now, I have a separate worker role that spawns 10 different threads, and each thread executes a method that has a while(true){get message from queue and process it}. Whenever data in a queue gets backed up, we simply launch more of these processes to help speed up the processing of the data from the queue. Also, since one queue is more active, I actually launch a number of threads pointing at the same method to process data from that queue.
However, I am seeing high CPU utilization on the deployment. It sits at or near 100% almost constantly.
I am wondering if this is because of thread starvation? Or because accessing the queue is RESTful and the threads end up blocking each other while making the connection and slowing things down? Or is it because I use:
    while (true)
    {
        // GetMessage() returns null when the queue is empty
        var message = queue.GetMessage();
        if (message != null)
        {
            // process message
        }
    }
And that gets executed too fast?
Processing each message also saves it to Azure Table Storage or the DB, so it might be the saving of this data that is eating up the CPU.
In effect, it's been really hard to debug the high CPU load. So, my question is: are there general architecture changes I can make that will help alleviate and prevent whatever issue there might be? (For example, using a different type of polling instead of while(true), although I'd imagine it comes to the same thing in the end.)
Maybe simply spawning new threads using new Thread() is not the best way to go.
Comments (5)
I would suggest putting a sleep statement in your loop... not only is that tight loop probably hogging CPU resources, but you also pay for storage transactions. Every ten thousand times you check the queue, it costs a penny. That's a small cost, but it could add up over time to be significant.
I've also often used code like this:
    while (true)
    {
        var msg = q1.GetMessage();
        if (msg != null) { ... }
        msg = q2.GetMessage();
        if (msg != null) { ... }
    }
In other words, poll the queues serially instead of parallelly (that should totally be a word). That way you're only actually doing one thing at a time (useful if your tasks are CPU-intensive), but you're still checking all the queues in each loop.
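The two suggestions in this answer (sleep in the loop, and poll the queues one after another) can be combined in a rough sketch like the one below. `InMemoryQueue`, `SerialPoller`, and the `idleDelay` parameter are hypothetical names made up for illustration; the in-memory queue only stands in for a real storage queue so the example is self-contained, and the right delay value depends on your workload.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for a storage queue; this is NOT an Azure SDK type.
public sealed class InMemoryQueue
{
    private readonly ConcurrentQueue<string> _items = new ConcurrentQueue<string>();

    public void AddMessage(string body) => _items.Enqueue(body);

    // Returns null when the queue is empty, like the GetMessage() calls above.
    public string GetMessage() => _items.TryDequeue(out var body) ? body : null;
}

public static class SerialPoller
{
    // Checks every queue once per pass and sleeps only when a whole pass
    // found no message. Returns the number of messages handled.
    public static int Run(
        IReadOnlyList<InMemoryQueue> queues,
        Action<string> handle,
        TimeSpan idleDelay,               // assumed tuning knob, e.g. 1 second
        CancellationToken token)
    {
        int processed = 0;
        while (!token.IsCancellationRequested)
        {
            bool foundWork = false;
            foreach (var queue in queues)
            {
                var msg = queue.GetMessage();
                if (msg != null)
                {
                    handle(msg);
                    processed++;
                    foundWork = true;
                }
            }

            if (!foundWork)
            {
                // Idle instead of spinning; wakes early if cancellation is requested.
                token.WaitHandle.WaitOne(idleDelay);
            }
        }
        return processed;
    }
}
```

While any queue has messages the loop never sleeps, so throughput is not reduced; the delay only applies when every queue came back empty, which is exactly when a tight loop would burn CPU and storage transactions for nothing.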
I had the same problem with CPU. It could be caused by an inefficient local implementation of the Azure Queues.
In the end I added an exponential sleep policy (for an implementation, check out the Lokad.CQRS for Azure project): queues are polled frequently, but if there are no messages in any of them, we gradually increase the sleep interval until it reaches some upper boundary. As soon as a message is discovered, we drop the interval back down immediately.
This way, overall, the system does not waste storage transactions (or local dev CPU), but stays extremely responsive when multiple messages come in a row.
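A minimal sketch of such an exponential sleep policy might look like this. It is not the Lokad.CQRS code; `PollingBackoff`, `NextDelay`, and `Reset` are hypothetical names, and doubling the interval is just one assumed growth rule.

```csharp
using System;

// Hypothetical helper illustrating the idea; not taken from Lokad.CQRS.
public sealed class PollingBackoff
{
    private readonly TimeSpan _min;
    private readonly TimeSpan _max;
    private TimeSpan _current;

    public PollingBackoff(TimeSpan min, TimeSpan max)
    {
        if (min <= TimeSpan.Zero || max < min)
        {
            throw new ArgumentException("Require 0 < min <= max.");
        }
        _min = min;
        _max = max;
        _current = min;
    }

    // Call after a poll that found a message: drop back to the shortest interval.
    public void Reset() => _current = _min;

    // Call after an empty poll: returns how long to sleep now, then doubles
    // the interval for next time, capped at the upper boundary.
    public TimeSpan NextDelay()
    {
        var delay = _current;
        _current = TimeSpan.FromTicks(Math.Min(_current.Ticks * 2, _max.Ticks));
        return delay;
    }
}
```

With a minimum of 100 ms and a maximum of 1 s, consecutive empty polls would sleep 100, 200, 400, 800, 1000, 1000 ms, and the first message found resets the next idle sleep to 100 ms. In a polling loop you would call `Reset()` whenever any queue returned a message and sleep for `NextDelay()` whenever all of them were empty.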
Check out the Scaling Down Azure Roles video by Brian Hitney. The basic approach is to spawn some number of threads, each with a "worker" that monitors a given queue and acts appropriately. In particular, this keeps one queue from blocking the others.
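As a rough illustration of the worker-per-queue idea (not the code from the video), the sketch below starts one polling loop per queue. `QueueWorkers` is a hypothetical name, `ConcurrentQueue<string>` stands in for a real storage queue, and the fixed `idleDelay` is an assumption.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class QueueWorkers
{
    // Starts one polling loop per queue. Each loop idles on its own, so a busy
    // or slow queue does not hold up the others. The handler may be called
    // concurrently from different loops, so it must be thread-safe.
    public static Task[] Start(
        IReadOnlyDictionary<string, ConcurrentQueue<string>> queues,
        Action<string, string> handle,    // (queueName, messageBody)
        TimeSpan idleDelay,
        CancellationToken token)
    {
        return queues
            .Select(pair => Task.Run(
                () => PollLoop(pair.Key, pair.Value, handle, idleDelay, token)))
            .ToArray();
    }

    private static async Task PollLoop(
        string name,
        ConcurrentQueue<string> queue,
        Action<string, string> handle,
        TimeSpan idleDelay,
        CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            if (queue.TryDequeue(out var body))
            {
                handle(name, body);
            }
            else
            {
                try
                {
                    // Yield the thread while idle instead of spinning.
                    await Task.Delay(idleDelay, token);
                }
                catch (OperationCanceledException)
                {
                    break; // shutdown requested while idle
                }
            }
        }
    }
}
```

Using `Task.Delay` rather than a blocking sleep means idle loops do not each pin a thread, which matters more as the number of queues grows. For a particularly active queue, the same queue could be given several loops.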
I think your problem comes from the loop implementation. The polling must be slowed down by something like a sleep(). Otherwise, nothing prevents the loop from consuming 100% of a CPU core (which is in fact the normal behavior for a tight loop).
There is a great MSDN article that covers all of this:
MSDN - Best Practices for Maximizing Scalability and Cost Effectiveness of Queue-Based Messaging Solutions on Windows Azure
It talks about adding threads and instances when there is work to do, and backing off when there isn't, so you're not continuously and needlessly polling queues from multiple threads and instances, racking up transaction costs and turning a CPU into a heater with constant 100% utilisation.