How do I improve the performance of an MQ-based batch-processing application?
I have an application where messages keep coming in at a rate of 70K XMLs per hour. We consume these XML messages and store them in an intermediate queue. The intermediate queue exists because we need to meet an SLA of consuming all the messages within 24 hours, and we are able to consume the XMLs and load them into the internal queue within that window. After loading them into the internal queue, we process the XMLs (parse, apply very few transformations, perform very few validations) and store the data in a heavily normalized data model. I know the data model can have a huge impact on performance; unfortunately, we have no control over it. Currently we take 3.5 minutes to process 2K messages, which is unacceptable. We want to bring that down to 1 minute for 2K messages. Here is what we have done so far (a simplified sketch of the processing step follows the list):
1) Applied indexes wherever applicable.
2) Used XMLBeans to parse the XMLs (the size of each XML is not very large).
3) Removed all unnecessary validations, transformations, etc.
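For context, here is a much-simplified sketch of what the processing step does per message. The class, table, and column names are placeholders rather than our real ones, and the JDBC wiring is illustrative only:

    // Simplified per-message processing: read from the internal queue, parse with
    // XMLBeans, run a light validation, and insert into the normalized model via JDBC.
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;
    import javax.sql.DataSource;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import org.apache.xmlbeans.XmlObject;

    public class XmlBatchListener implements MessageListener {

        private final DataSource dataSource;   // container-managed connection pool

        public XmlBatchListener(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public void onMessage(Message message) {
            try {
                String xml = ((TextMessage) message).getText();
                XmlObject doc = XmlObject.Factory.parse(xml);   // XMLBeans parse
                if (!doc.validate()) {
                    return;                                     // route to an error path in reality
                }
                Connection con = dataSource.getConnection();
                try {
                    PreparedStatement ps = con.prepareStatement(
                            "INSERT INTO normalized_table (col1, col2) VALUES (?, ?)");
                    ps.setString(1, extractField(doc, "field1"));
                    ps.setString(2, extractField(doc, "field2"));
                    ps.executeUpdate();
                    ps.close();
                } finally {
                    con.close();
                }
            } catch (Exception e) {
                throw new RuntimeException(e);   // let the container handle redelivery
            }
        }

        private String extractField(XmlObject doc, String name) {
            return "";   // placeholder for the real XPath / generated-type accessors
        }
    }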
The application runs on:
Operating system: RHEL 5.4 64 bit
Platform: JDK 1.6.0_17, 64 bit
Database: Oracle 11g R2 64 bit (2 node cluster)
External MQ: IBM WebSphere MQ
Internal temporary storage MQ: JBoss MQ
Application server: JBoss 5.1.0.GA (EAP version)
The order in which we consume and process the XML messages is very important, so we cannot do parallel processing.
Is there anything else we can do to improve performance?
2 Answers
Some suggestions outside of message delivery tuning since it appears this is not your [primary] bottleneck:
One quick item on messaging.....
You did not mention if you were using message driven beans with WebSphere MQ, but if you are, there is a setting in the Inbound Configuration called pollingInterval which, to quote from the docs, means:
If each message listener within a session has no suitable message on its queue, this is the maximum interval, in milliseconds, that elapses before each message listener tries again to get a message from its queue. If it frequently happens that no suitable message is available for any of the message listeners in a session, consider increasing the value of this property. This property is relevant only if TRANSPORT has the value BIND or CLIENT.
The default pollingInterval is 5000 ms. Your current message processing time is 3.5 minutes / 2,000 messages = 210,000 ms / 2,000 = 105 ms per message.
If you introduce a 5000 ms pause here and there, that will seriously cut into your throughput, so you might want to look into this by measuring the ongoing difference between the message enqueue time and the time you receive the message in your JBoss message listener. The enqueue time can be determined from the message headers.
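To make that measurement concrete, here is a minimal sketch. It assumes the standard JMS JMSTimestamp header (set when the producer hands the message to the provider) is an acceptable proxy for enqueue time, and it relies on reasonably synchronized clocks between the producing and consuming hosts:

    // Sketch: log how long each message sat between enqueue (JMSTimestamp) and receipt.
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;

    public class LatencyLoggingListener implements MessageListener {

        public void onMessage(Message message) {
            try {
                long enqueuedAt = message.getJMSTimestamp();    // producer-side send time
                long receivedAt = System.currentTimeMillis();   // listener-side receive time
                long queueLatencyMs = receivedAt - enqueuedAt;
                if (queueLatencyMs > 1000) {
                    System.out.println("Message waited " + queueLatencyMs + " ms in the queue"
                            + " (JMSMessageID=" + message.getJMSMessageID() + ")");
                }
                // ... hand off to the normal processing path ...
            } catch (JMSException e) {
                throw new RuntimeException(e);
            }
        }
    }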
All in all, your best bet is going to be to figure out how to parallelize.
Good luck.
//Nicholas
WebSphere MQ, even on a small server, can unload messages MUCH faster than the rate you describe. The Windows performance report for WMQ V7 measured more than 2,200 persistent 2 KB round trips (one request and one reply) per second over client channels. That's more than 4,000 messages per second.
The bottleneck in your case would seem to be the latency of processing messages and the dependency on processing the messages in a particular order. The option that could give you the MOST performance boost would be to eliminate the order dependency. When I worked at a bank we had a system that posted transactions in the exact order they arrived and everyone said this requirement was mandatory. However, we eventually revised the system to perform a memo-post during the day and then repost in a later step. The memo-posting occurred in any order and supported parallelism, failover and all the other benefits of multi-instance processing. The final post applied the transactions in logical order (and in fact in an order that was most favorable to the customer) once they were all in the DB. Sequence dependencies lock you into a singleton model and are a worst-case requirement for asynch messaging. Eliminate them if at all possible.
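A rough sketch of that shape is below (the staging table and column names are hypothetical): parallel consumers land rows in whatever order they arrive, and a single later pass applies them in sequence-number order.

    // Sketch of "memo-post now, apply in order later" using a staging table.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class MemoPostSketch {

        // Step 1: many parallel consumers can call this in any order.
        static void memoPost(Connection con, long sequenceNo, String parsedPayload) throws Exception {
            PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO staging_post (seq_no, payload, posted) VALUES (?, ?, 0)");
            ps.setLong(1, sequenceNo);
            ps.setString(2, parsedPayload);
            ps.executeUpdate();
            ps.close();
        }

        // Step 2: a single later job applies the staged rows in logical order.
        static void applyInOrder(Connection con) throws Exception {
            PreparedStatement select = con.prepareStatement(
                    "SELECT seq_no, payload FROM staging_post WHERE posted = 0 ORDER BY seq_no");
            ResultSet rs = select.executeQuery();
            while (rs.next()) {
                long seqNo = rs.getLong("seq_no");
                String payload = rs.getString("payload");
                // applyToNormalizedModel(con, payload);   // the existing order-sensitive logic
                PreparedStatement mark = con.prepareStatement(
                        "UPDATE staging_post SET posted = 1 WHERE seq_no = ?");
                mark.setLong(1, seqNo);
                mark.executeUpdate();
                mark.close();
            }
            rs.close();
            select.close();
        }
    }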
The other area for improvement will be in the parsing and processing of the messages. As long as you are stuck with sequence dependencies, this is the best bet for improving performance.
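One hedged example of trimming the processing path: if each message currently issues several individual INSERTs into the normalized model, plain JDBC batching can cut the number of database round trips. Table and column names below are placeholders:

    // Sketch: batch the per-message INSERTs instead of executing them one at a time.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.List;

    public class BatchedWriter {

        static void writeBatch(Connection con, List<String[]> rows) throws Exception {
            boolean previousAutoCommit = con.getAutoCommit();
            con.setAutoCommit(false);
            PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO normalized_detail (col1, col2) VALUES (?, ?)");
            try {
                for (String[] row : rows) {
                    ps.setString(1, row[0]);
                    ps.setString(2, row[1]);
                    ps.addBatch();
                }
                ps.executeBatch();   // one round trip for the whole batch, driver permitting
                con.commit();
            } finally {
                ps.close();
                con.setAutoCommit(previousAutoCommit);
            }
        }
    }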
Finally, you always have the option to throw money at the problem in the form of more memory, CPU, faster disk I/O and so forth. Essentially this is addressing software architecture with horsepower and is never the best solution but often it buys you enough time to address the root cause.