Low-latency, large-scale message queuing

I'm going through a bit of a re-think of large-scale multiplayer games in the age of Facebook applications and cloud computing.

Suppose I were to build something on top of existing open protocols, and I want to serve 1,000,000 simultaneous players, just to scope the problem.

Suppose each player has an incoming message queue (for chat and whatnot), and on average one more incoming message queue (guilds, zones, instances, auctions, ...), so we have 2,000,000 queues. A player will listen to 1-10 queues at a time. Each queue will see on average maybe 1 message per second, but certain queues will have a much higher rate and a higher number of listeners (say, an "entity location" queue for a level instance). Let's assume no more than 100 milliseconds of system queuing latency, which is OK for mildly action-oriented games (but not for games like Quake or Unreal Tournament).
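
A quick back-of-envelope check on those numbers (a Python sketch; the midpoint of 5 subscriptions per player is an assumption, the rest comes from the figures above):

```python
players = 1_000_000
queues = 2 * players        # one personal queue + one shared queue per player, on average
msgs_per_queue = 1          # average messages per queue per second
subs_per_player = 5         # assumed midpoint of the 1-10 queues each player listens to

ingress = queues * msgs_per_queue                          # 2,000,000 messages/sec in
listeners_per_queue = players * subs_per_player / queues   # 2.5 listeners per queue on average
egress = ingress * listeners_per_queue                     # 5,000,000 deliveries/sec out (fan-out)

print(f"{ingress:,.0f} msg/s in, {egress:,.0f} deliveries/s out")
```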

From other systems, I know that serving 10,000 users on a single 1U or blade box is a reasonable expectation (assuming there's nothing else expensive going on, like physics simulation or whatnot).

So, with a crossbar cluster system, where clients connect to connection gateways, which in turn connect to message queue servers, we'd get 10,000 users per gateway with 100 gateway machines, and 20,000 message queues per queue server with 100 queue machines. Again, just for general scoping. The number of connections on each MQ machine would be tiny: about 100, to talk to each of the gateways. The number of connections on each gateway would be a lot higher: about 10,100, i.e. 10,000 clients plus a connection to each of the queue servers. (On top of this, add some connections for game world simulation servers or whatnot, but I'm trying to keep that separate for now.)
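
To make the topology concrete, here is a minimal sketch of how a gateway might locate the queue server that owns a given queue (the modulo placement is an illustrative assumption, not a recommendation):

```python
# Illustrative crossbar routing: which queue server owns a given queue?
# The counts are the scoping numbers above; the modulo placement is made up.
NUM_QUEUE_SERVERS = 100
CLIENTS_PER_GATEWAY = 10_000

def queue_server_for(queue_id: int) -> int:
    """Naive placement: map a queue id to one of the 100 queue servers."""
    return queue_id % NUM_QUEUE_SERVERS

# Each gateway keeps one persistent link per queue server and multiplexes
# all of its clients' subscriptions over those links.
gateway_connections = CLIENTS_PER_GATEWAY + NUM_QUEUE_SERVERS
print(gateway_connections)  # 10,100, matching the scoping above
```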

If I didn't want to build this from scratch, I'd have to use some messaging and/or queuing infrastructure that exists. The two open protocols I can find are AMQP and XMPP. The intended use of XMPP is a little more like what this game system would need, but the overhead is quite noticeable (XML, plus the verbose presence data, plus various other channels that have to be built on top). The actual data model of AMQP is closer to what I describe above, but all the users seem to be large, enterprise-type corporations, and the workloads seem to be workflow related, not real-time game update related.

Does anyone have any day-to-day experience with these technologies, or implementations thereof, that you can share?

Comments (5)

似最初 2024-08-22 07:30:49

@MSalters

Re 'message queue':

RabbitMQ's default operation is exactly what you describe: transient pubsub. But with TCP instead of UDP.

If you want guaranteed eventual delivery and other persistence and recovery features, then you CAN have that too - it's an option. That's the whole point of RabbitMQ and AMQP -- you can have lots of behaviours with just one message delivery system.

The model you describe is the DEFAULT behaviour, which is transient, "fire and forget", and routing messages to wherever the recipients are. People use RabbitMQ to do multicast discovery on EC2 for just that reason. You can get UDP type behaviours over unicast TCP pubsub. Neat, huh?
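
For anyone following along, here is a minimal sketch of that default transient pubsub mode, using the Python pika client; the broker address, exchange name, and payload are placeholder assumptions:

```python
import pika

# Connect to a RabbitMQ broker (the address and names here are placeholders).
conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

# A non-durable fanout exchange: transient, fire-and-forget pubsub.
ch.exchange_declare(exchange="zone.42", exchange_type="fanout", durable=False)

# Each listener gets a private, auto-deleted queue bound to the exchange;
# if nobody is bound, published messages are simply dropped.
result = ch.queue_declare(queue="", exclusive=True)
ch.queue_bind(exchange="zone.42", queue=result.method.queue)

ch.basic_publish(exchange="zone.42", routing_key="", body=b"entity 17 moved")
# (For the opt-in guaranteed delivery mentioned above, you would declare
# durable queues and publish with delivery_mode=2 instead.)

ch.basic_consume(
    queue=result.method.queue,
    on_message_callback=lambda c, m, p, body: print(body),
    auto_ack=True,  # no acks: closest to the UDP-style behaviour described
)
ch.start_consuming()
```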

Re UDP:

I am not sure if UDP would be useful here. If you turn off Nagling, then RabbitMQ single-message roundtrip latency (client-broker-client) has been measured at 250-300 microseconds. See here for a comparison with Windows latency (which was a bit higher): http://old.nabble.com/High%28er%29-latency-with-1.5.1--p21663105.html
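
(For reference, "turning off Nagling" is a per-socket option; a generic Python sketch, not specific to any AMQP client:)

```python
import socket

# Disable Nagle's algorithm so small messages go out immediately instead of
# being coalesced into larger segments; this is what "turning off Nagling" means.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
sock.connect(("localhost", 5672))  # 5672 is the standard AMQP port
```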

I cannot think of many multiplayer games that need roundtrip latency lower than 300 microseconds. You could get below 300us with TCP. TCP windowing is more expensive than raw UDP, but if you use UDP to go faster, and add a custom loss-recovery or seqno/ack/resend manager then that may slow you down again. It all depends on your use case. If you really really really need to use UDP and lazy acks and so on, then you could strip out RabbitMQ's TCP and probably pull that off.

I hope this helps clarify why I recommended RabbitMQ for Jon's use case.

獨角戲 2024-08-22 07:30:49

I am building such a system now, actually.

I have done a fair amount of evaluation of several MQs, including RabbitMQ, Qpid, and ZeroMQ. The latency and throughput of any of those are more than adequate for this type of application. What is not good, however, is queue creation time in the midst of half a million queues or more. Qpid in particular degrades quite severely after a few thousand queues. To circumvent that problem, you will typically have to build your own routing mechanism: a smaller number of total queues, with consumers on those queues filtering out the messages they have no interest in.
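
A minimal sketch of that kind of application-level routing, with many logical channels multiplexed over one physical queue and consumers filtering locally; the class and channel names are illustrative assumptions:

```python
from collections import defaultdict

class ChannelMux:
    """Multiplex many logical channels over one physical queue.

    Instead of one broker queue per guild/zone/instance, messages carry a
    channel id and each consumer filters for the channels it subscribed to.
    """

    def __init__(self):
        self.subscriptions = defaultdict(set)  # channel_id -> set of callbacks

    def subscribe(self, channel_id, callback):
        self.subscriptions[channel_id].add(callback)

    def on_message(self, channel_id, payload):
        # Called for every message on the shared physical queue; messages
        # for channels nobody here cares about are simply dropped.
        for callback in self.subscriptions.get(channel_id, ()):
            callback(payload)

mux = ChannelMux()
mux.subscribe("guild.1234", lambda msg: print("guild chat:", msg))
mux.on_message("guild.1234", "hello")   # delivered
mux.on_message("zone.7", "ignored")     # filtered out locally
```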

My current system will probably use ZeroMQ, but in a fairly limited way, inside the cluster. Connections from clients are handled with a custom sim daemon that I built using libev; it is entirely single-threaded (and is showing very good scaling -- it should be able to handle 50,000 connections on one box without any problems -- though our sim tick rate is quite low, and there are no physics).
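
For the intra-cluster messaging, ZeroMQ prefix-filtered pubsub looks roughly like this (a pyzmq sketch; the endpoint and topic names are assumptions):

```python
import time
import zmq

ctx = zmq.Context()

# Inside the cluster: a queue server publishes, gateways subscribe.
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5556")

sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5556")
sub.setsockopt_string(zmq.SUBSCRIBE, "zone.42")  # prefix-matched topic filter

time.sleep(0.1)  # let the subscription propagate (ZeroMQ's slow-joiner caveat)

# Topic filtering happens inside ZeroMQ; subscribers that aren't listening
# never block the publisher.
pub.send_string("zone.42 entity=17 x=100 y=250")
print(sub.recv_string())
```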

XML (and therefore XMPP) is very much not suited to this, as you'll peg the CPU processing XML long before you become bound on I/O, which isn't what you want. We're using Google Protocol Buffers, at the moment, and those seem well suited to our particular needs. We're also using TCP for the client connections. I have had experience using both UDP and TCP for this in the past, and as pointed out by others, UDP does have some advantage, but it's slightly more difficult to work with.
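
To make the XML overhead concrete, compare the same position update as XML versus a packed binary encoding (plain struct here as a stand-in; Protocol Buffers yields similarly compact messages):

```python
import struct

# The same entity-position update, encoded two ways.
xml_msg = b'<update><entity id="17"><pos x="100.0" y="250.0" z="0.5"/></entity></update>'

# Binary: entity id (u32) plus three f32 coordinates = 16 bytes total.
bin_msg = struct.pack("<Ifff", 17, 100.0, 250.0, 0.5)

print(len(xml_msg), len(bin_msg))  # 76 vs 16 bytes, before any parsing cost
```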

Hopefully when we're a little closer to launch, I'll be able to share more details.

物价感观 2024-08-22 07:30:49

Jon, this sounds like an ideal use case for AMQP and RabbitMQ.

I am not sure why you say that AMQP users are all large, enterprise-type corporations. More than half of our customers are in the 'web' space, ranging from huge to tiny companies. Lots of games, betting systems, chat systems, twittery type systems, and cloud computing infrastructures have been built on RabbitMQ. There are even mobile phone applications. Workflow is just one of many use cases.

We try to keep track of what is going on here:

http://www.rabbitmq.com/how.html (make sure you click through to the lists of use cases on del.icio.us too!)

Please do take a look. We are here to help. Feel free to email us at [email protected] or hit me on twitter (@monadic).

一人独醉 2024-08-22 07:30:49

My experience was with a non-open alternative, BizTalk. The most painful lesson we learnt is that these complex systems are NOT fast. And as you figured from the hardware requirements, that translates directly into significant costs.

For that reason, don't even go near XML for the core interfaces. Your server cluster will be parsing 2 million messages per second; at 1-10 KB of XML per message, that could easily be 2-20 GB/sec of XML! However, most messages will be for a few queues, while most queues are in fact low-traffic.

Therefore, design your architecture so that it's easy to start with COTS queue servers and then move each queue (type) to a custom queue server when a bottleneck is identified.
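
One way to keep that migration path open is to hide the broker behind a small per-queue-type interface; a sketch, with illustrative class names:

```python
from abc import ABC, abstractmethod

class QueueBackend(ABC):
    """What the game code sees; implementations are swappable per queue type."""

    @abstractmethod
    def publish(self, queue: str, payload: bytes) -> None: ...

    @abstractmethod
    def subscribe(self, queue: str, callback) -> None: ...

class CotsBackend(QueueBackend):
    """Start here: delegate to an off-the-shelf broker client."""
    def publish(self, queue, payload): ...
    def subscribe(self, queue, callback): ...

class CustomEntityLocationBackend(QueueBackend):
    """Swap in later for a hot queue type once it becomes the bottleneck."""
    def publish(self, queue, payload): ...
    def subscribe(self, queue, callback): ...

# Route each queue *type* to a backend; moving a type off the COTS broker
# is then a one-line change here rather than a rewrite of the game code.
BACKENDS = {
    "chat": CotsBackend(),
    "entity_location": CustomEntityLocationBackend(),
}
```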

Also, for similar reasons, don't assume that a message queue architecture is best for all the communication needs your application has. Take your "entity location in an instance" example. This is a classic case where you don't want guaranteed message delivery. The reason you need to share this information is that it changes all the time. So, if a message is lost, you don't want to spend time recovering it: that would only get you the old location of the affected entity. Instead, you want to send the entity's current location. Technology-wise, this means you want UDP: not TCP, and no custom loss-recovery mechanism bolted on top either.
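
A minimal sketch of that pattern: positions go over UDP with a sequence number, the receiver keeps only the newest update per entity, and lost packets are never retransmitted (the port and packet layout are assumptions):

```python
import socket
import struct

# Datagram layout: entity id (u32), sequence number (u32), x and y (f32 each).
FMT = "<IIff"
ADDR = ("localhost", 9999)  # arbitrary endpoint for the sketch

def send_position(sock, entity_id, seq, x, y):
    # Fire and forget: if this datagram is lost, the next one supersedes it.
    sock.sendto(struct.pack(FMT, entity_id, seq, x, y), ADDR)

def receive_positions(sock):
    latest = {}  # entity_id -> (seq, x, y); only the newest update is kept
    while True:
        data, _ = sock.recvfrom(1024)
        entity_id, seq, x, y = struct.unpack(FMT, data)
        # Discard stale or reordered packets instead of recovering them;
        # the current position is all anyone needs.
        if entity_id not in latest or seq > latest[entity_id][0]:
            latest[entity_id] = (seq, x, y)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_position(sender, 17, seq=1, x=100.0, y=250.0)
```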

凹づ凸ル 2024-08-22 07:30:49

FWIW, for cases where intermediate results are not important (like positioning info) Qpid has a "last-value queue" that can deliver only the most recent value to a subscriber.
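
The semantics are easy to picture: at most one pending message per key, with a new value replacing an unconsumed old one. A tiny Python emulation of the idea (this is not the Qpid API, just the behaviour):

```python
class LastValueQueue:
    """Keep only the most recent message per key (e.g. per entity id).

    Emulates the idea of a last-value queue: a subscriber that falls behind
    sees the latest position, never a backlog of stale ones.
    """

    def __init__(self):
        self._latest = {}   # key -> most recent payload
        self._order = []    # keys with unconsumed values, oldest first

    def publish(self, key, payload):
        if key not in self._latest:
            self._order.append(key)
        self._latest[key] = payload  # overwrite any unconsumed older value

    def consume(self):
        if not self._order:
            return None
        key = self._order.pop(0)
        return key, self._latest.pop(key)

lvq = LastValueQueue()
lvq.publish("entity:17", (100.0, 250.0))
lvq.publish("entity:17", (101.5, 250.0))  # replaces the first update
print(lvq.consume())  # ('entity:17', (101.5, 250.0))
```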
