Patterns and technologies for a system capable of processing 40,000 messages per second



We need to build a system capable of processing 40,000 messages per second.
No messages can be lost in case of any software or hardware failures.

Each message is about 2-4 KB in size.

Processing a message consists of validating it, doing some simple arithmetic, saving the result to a database, and (sometimes) sending notifications to other systems.

The preferred software technology is .Net.

What software and hardware patterns are most suitable for such a task?

How much hardware will it require?


Comments (6)

囚我心虐我身 2024-07-27 19:52:14

  1. Message queuing. Your process flow sounds like a prime target for it.
  2. Clustering / load balancing.
  3. Streamline your code.

First thing I'd do is queue the notifications. Then I'd queue all database writes that don't need to return a value. Then I'd look at scaling out.
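
As a minimal sketch of that pattern in .Net, using MSMQ via System.Messaging (the queue path and payload type are placeholders, and the transactional queue is assumed to already exist):

```csharp
using System.Messaging;   // .NET Framework; add a reference to System.Messaging.dll

class NotificationOffload
{
    // Placeholder queue path; assumes a transactional private queue was created beforehand.
    const string NotificationQueuePath = @".\private$\notifications";

    // The hot path just drops the notification onto a queue and returns;
    // a separate consumer drains the queue and talks to the other systems.
    static void EnqueueNotification(string payload)
    {
        // In a real system you would cache and reuse the MessageQueue instance.
        using (var queue = new MessageQueue(NotificationQueuePath))
        {
            queue.Send(new Message(payload), MessageQueueTransactionType.Single);
        }
    }

    static void Main() => EnqueueNotification("example notification");
}
```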

Other considerations:
* Avoid a big clunky framework that does way more work behind the scenes than you likely need.
* Make use of cache and static variables wherever possible.

40,000 messages per second is doable, but when you add IO to the mix, it can be unpredictable even on super fast hardware with a ton of memory. Try to do as much out of band processing as you can. Where that fails, see if you can run multiple threads (on a multi-core or multi-proc machine) and look into multiple servers in a cluster if need be.
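
A rough in-process sketch of the multi-threaded side, using a BlockingCollection with one worker per core (the capacity, message type, and hand-off shape are illustrative assumptions):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ParallelWorkers
{
    static void Main()
    {
        // Bounded in-memory hand-off between the receive path and the processing threads.
        var pending = new BlockingCollection<byte[]>(boundedCapacity: 100_000);

        int workerCount = Environment.ProcessorCount;   // one worker per core
        var workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = Task.Run(() =>
            {
                foreach (var message in pending.GetConsumingEnumerable())
                {
                    // Validate, do the arithmetic, write to the database,
                    // and enqueue any notifications out of band.
                }
            });
        }

        // The receive path (network listener, MSMQ reader, ...) calls pending.Add(bytes).
        // On shutdown: pending.CompleteAdding(); Task.WaitAll(workers);
    }
}
```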

Edit:

I can't stress enough the benefits of load testing in a scenario like this. Make a simple prototype and load test. Refine the prototype until you get desired results. Then architect a final solution based on the prototype. Until you test for the desired performance level, you're guessing at the solution.
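
A crude throughput probe along those lines might look like the following; the message count, size, and ProcessMessage stub are arbitrary stand-ins for the real prototype:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class LoadTest
{
    static void Main()
    {
        const int total = 1_000_000;          // arbitrary sample size
        var payload = new byte[3 * 1024];     // ~3 KB synthetic message

        var sw = Stopwatch.StartNew();
        Parallel.For(0, total, _ => ProcessMessage(payload));
        sw.Stop();

        Console.WriteLine($"{total / sw.Elapsed.TotalSeconds:N0} msg/s " +
                          $"({total:N0} messages in {sw.Elapsed.TotalSeconds:F1} s)");
    }

    static void ProcessMessage(byte[] message)
    {
        // Replace with the prototype's validate / compute / persist / notify steps.
    }
}
```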

牵你手 2024-07-27 19:52:14


4 KB * 40,000/s = 160 MB/s is quite some bandwidth.

You probably need to have that bandwidth in both directions, since the no-message-lost requirement means that all communicating parties send and receive in both directions.

Divide that number by the average throughput of your network card, or the write speed of your hard disk, to find that this is going to be a highly parallel and redundant system.

You also need to benchmark your db operations and the per-message calculations, then multiply by 40,000 (about 3.5 billion messages per day) to get an estimate of the required hardware.
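
For reference, a back-of-envelope sketch of those numbers, assuming a 3 KB average message:

```csharp
using System;

class CapacityEstimate
{
    static void Main()
    {
        const double messagesPerSecond = 40_000;
        const double avgMessageBytes = 3 * 1024;     // assume 3 KB average (2-4 KB range)
        const double secondsPerDay = 86_400;

        double bytesPerSecond = messagesPerSecond * avgMessageBytes;      // ~123 MB/s (~164 MB/s at 4 KB)
        double messagesPerDay = messagesPerSecond * secondsPerDay;        // ~3.46 billion messages/day
        double terabytesPerDay = bytesPerSecond * secondsPerDay / 1e12;   // ~10.6 TB/day of raw payload

        Console.WriteLine($"{bytesPerSecond / 1e6:F0} MB/s, " +
                          $"{messagesPerDay / 1e9:F2} billion msgs/day, " +
                          $"{terabytesPerDay:F1} TB/day");
    }
}
```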

I guess the .Net requirement will be the least of your problems.

红衣飘飘貌似仙 2024-07-27 19:52:14


The first thing I'd do is try to find out exactly what your requirements mean. "No messages can be lost in case of any software or hardware failures" is impossible. Suppose you write the message to 5000 different disks in 5000 different locations. If all of those disks fail simultaneously, you'll lose data, unavoidably.

Likewise, if you have a bug somewhere, that could lose data. Designing a solution that will always work in the face of a bug anywhere in the system is impossible.

Once you've decided the level of redundancy and reliability you really need, it'll be more feasible to help you. It'll also be easier for you to have confidence that you've hit that level of reliability.

花海 2024-07-27 19:52:14


If you're on a Microsoft stack, you will almost certainly need to use MSMQ (Microsoft Message Queueing). It has a lot of options you can configure for reliability or performance. Have a look at the MSMQ FAQ.

The bottleneck is not processing but disk I/O. Have a lot of RAM and do as much as you can in memory.

MSMQ manages its queue in memory but if hardware fails you, everything in memory is lost. If you mark your messages as recoverable they get written to disk but you can easily run into bottlenecks.
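
A minimal sketch of that trade-off with System.Messaging (the queue name is a placeholder); a transactional queue persists each message to disk, which is exactly where the disk bottleneck shows up:

```csharp
using System.Messaging;   // .NET Framework; add a reference to System.Messaging.dll

class DurableQueueSetup
{
    const string QueuePath = @".\private$\orders";   // placeholder queue name

    static void Main()
    {
        // A transactional queue writes each message to disk as part of the send;
        // that is what survives a crash, and also where the disk bottleneck appears.
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath, true);   // second argument: transactional

        using (var queue = new MessageQueue(QueuePath))
        {
            var msg = new Message("payload")
            {
                Recoverable = true   // express (memory-only) delivery is faster but lost on failure
            };
            queue.Send(msg, MessageQueueTransactionType.Single);
        }
    }
}
```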

别挽留 2024-07-27 19:52:14


If you use MSMQ and mark the messages as recoverable, be very careful about reliably taking the messages off the queue. Make that process as failsafe as you can, because if something goes wrong, messages can pile up so fast that the drive will fill up in a fraction of a second and crash the system. Then all incoming messages will be lost. Ask me how I know. (I didn't create it, I just had to support it. Not fun.)
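
One way to make the dequeue side failsafe is to receive and process inside a single MSMQ transaction, so that a failure puts the message back on the queue instead of dropping it. A rough sketch, with the queue name and processing step as placeholders:

```csharp
using System;
using System.Messaging;

class SafeDequeue
{
    const string QueuePath = @".\private$\orders";   // placeholder transactional queue

    static void Main()
    {
        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });

            while (true)
            {
                using (var tx = new MessageQueueTransaction())
                {
                    tx.Begin();
                    try
                    {
                        // Receive and process inside one transaction: if processing throws,
                        // the Abort puts the message back instead of losing it.
                        var msg = queue.Receive(TimeSpan.FromSeconds(5), tx);
                        Process((string)msg.Body);
                        tx.Commit();
                    }
                    catch (MessageQueueException e)
                        when (e.MessageQueueErrorCode == MessageQueueErrorCode.IOTimeout)
                    {
                        tx.Abort();   // queue was empty; loop and wait again
                    }
                    catch
                    {
                        tx.Abort();   // processing failed; the message stays on the queue
                        // A real system also needs a poison-message policy here,
                        // or a bad message will be retried forever.
                        throw;
                    }
                }
            }
        }
    }

    static void Process(string body)
    {
        // Placeholder for validate / compute / persist / notify.
    }
}
```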

I never did figure out how to tell MSMQ to persist messages to a drive other than C:, but that would be a necessity. At least that way the system will be able to tell you there is a problem.

As was mentioned above, disk and the database will be the bottleneck. I think that MSMQ can handle that volume, especially if you avoid triggers and such.

IBM's MQ is probably better suited to the task.

ゝ杯具 2024-07-27 19:52:14


My advice is to hire someone who has already built a similar system. Let them choose the architecture and the development tools. Dealing with such high transaction rates will require specialist hardware and software knowledge, and the cheapest way to acquire such knowledge is to pay money for it.
