Asynchronous spawning of processes: design question - Celery or Twisted

Published 2024-10-11


All: I'm seeking input/guidance/and design ideas. My goal is to find a lean but reliable way to take XML payload from an HTTP POST (no problems with this part), parse it, and spawn a relatively long-lived process asynchronously.

The spawned process is CPU intensive and will last for roughly three minutes. I don't expect much load at first, but there's a definite possibility that I will need to scale this out horizontally across servers as traffic hopefully increases.
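A minimal stdlib sketch of the flow described above: parse the XML body, then hand the CPU-bound work to a separate process so the request handler returns immediately. The names (`parse_payload`, `cpu_intensive_job`, `handle_post`) are placeholders for illustration, not from the original post.

```python
import multiprocessing
import xml.etree.ElementTree as ET

def parse_payload(xml_body: bytes) -> dict:
    """Flatten the top-level children of the XML payload into a dict."""
    root = ET.fromstring(xml_body)
    return {child.tag: child.text for child in root}

def cpu_intensive_job(payload: dict) -> None:
    # Stand-in for the roughly three-minute CPU-bound work.
    pass

def handle_post(xml_body: bytes) -> multiprocessing.Process:
    """Parse the POSTed XML and spawn the long-lived job asynchronously."""
    payload = parse_payload(xml_body)
    # A separate process (not a thread), so the CPU-bound work
    # doesn't contend with the web server for the GIL.
    proc = multiprocessing.Process(target=cpu_intensive_job, args=(payload,))
    proc.start()
    return proc
```

A plain `multiprocessing.Process` like this has no queueing, retry, or cross-machine story, which is exactly the gap Celery or a hand-rolled Twisted job manager would fill.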

I really like the Celery/Django stack for this use: it's very intuitive and has all of the built-in framework to accomplish exactly what I need. I started down that path with zeal, but I soon found my little 512MB RAM cloud server had only 100MB of free memory, and I started sensing that I was headed for trouble once I went live with all of my processes running full-tilt. Also, it's got several moving parts: RabbitMQ, MySQL, celeryd, lighttpd and the django container.

I can absolutely increase the size of my server, but I'm hoping to keep my costs down to a minimum at this early phase of this project.

As an alternative, I'm considering using twisted for the process management, as well as perspective broker for the remote systems, should they be needed. But for me at least, while twisted is brilliant, I feel like I'm signing up for a lot going down that path: writing protocols, callback management, keeping track of job states, etc. The benefits here are pretty obvious - excellent performance, far fewer moving parts, and a smaller memory footprint (note: I need to verify the memory part). I'm heavily skewed toward Python for this - it's much more enjoyable for me than the alternatives :)

I'd greatly appreciate any perspective on this. I'm concerned about starting things off on the wrong track, and redoing this later with production traffic will be painful.

-Matt


Comments (3)

伪装你 2024-10-18 00:57:04


On my system, RabbitMQ running with pretty reasonable defaults is using about 2MB of RAM. Celeryd uses a bit more, but not an excessive amount.

In my opinion, the overhead of RabbitMQ and celery are pretty much negligible compared to the rest of the stack. If you're processing jobs that are going to take several minutes to complete, those jobs are what will overwhelm your 512MB server as soon as your traffic increases, not RabbitMQ. Starting off with RabbitMQ and Celery will at least set you up nicely to scale those jobs out horizontally though, so you're definitely on the right track there.

Sure, you could write your own job control in Twisted, but I don't see it gaining you much. Twisted has pretty good performance, but I wouldn't expect it to outperform RabbitMQ by enough to justify the time and potential for introducing bugs and architectural limitations. Mostly, it just seems like the wrong spot to worry about optimizing. Take the time that you would've spent re-writing RabbitMQ and work on reducing those three minute jobs by 20% or something. Or just spend an extra $20/month and double your capacity.

天冷不及心凉 2024-10-18 00:57:04


I'll answer this question as though I was the one doing the project and hopefully that might give you some insight.

I'm working on a project that will require the use of a queue, a web server for the public facing web application and several job clients.

The idea is to have the web server continuously running (no need for a very powerful machine here). The work, however, is handled by the job clients, which are more powerful machines that can be started and stopped at will. The job queue also resides on the same machine as the web application. When a job is inserted into the queue, a process that starts job clients kicks into action and spins up the first client. Using a load balancer that can start new servers as the load increases, I don't have to bother with managing the number of servers running to process jobs in the queue. If there are no jobs in the queue after a while, all job clients can be terminated.

I will suggest using a setup similar to this. You don't want job execution to affect the performance of your web application.
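The enqueue-and-drain pattern this answer describes can be sketched with stdlib threads standing in for the separate job-client machines; the names and the `None` shutdown sentinel are illustrative choices, not part of the answer:

```python
import queue
import threading

job_queue: "queue.Queue" = queue.Queue()
results: list = []

def job_client(q: queue.Queue) -> None:
    """Stands in for a worker machine: pull jobs until a None sentinel arrives."""
    while True:
        job = q.get()
        if job is None:           # shutdown signal: "terminate" this client
            q.task_done()
            break
        results.append({"job": job["id"], "status": "done"})
        q.task_done()

def enqueue_job(job_id: int) -> None:
    """The web server side just enqueues and returns immediately."""
    job_queue.put({"id": job_id})

# Spin up two "job clients".
workers = [threading.Thread(target=job_client, args=(job_queue,)) for _ in range(2)]
for w in workers:
    w.start()

for i in range(5):
    enqueue_job(i)

# No more jobs: terminate all job clients.
for _ in workers:
    job_queue.put(None)
job_queue.join()
for w in workers:
    w.join()
```

In the real setup the queue would be a broker reachable over the network and each worker a separate machine, but the control flow (enqueue, drain, terminate idle clients) is the same.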

梦亿 2024-10-18 00:57:04


I'll add, quite late, another possibility: using Redis.
Currently I use Redis with Twisted: I distribute work to workers, and they perform the work and return the results asynchronously.

The "List" type is very useful:
http://www.redis.io/commands/rpoplpush

So you can use the reliable queue pattern to send work and have a process that blocks/waits until it has new work to do (a new message arriving in the queue).

You can use several workers on the same queue.

Redis has a low memory footprint, but be careful about the number of pending messages, as that will increase the memory Redis uses.
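To illustrate the reliable queue pattern without assuming a running Redis server, here is a plain-Python stand-in for the RPOPLPUSH semantics: a popped message is parked on a "processing" list until acknowledged, so a crashed worker's message can be re-queued instead of lost. The class and method names are mine, chosen to mirror the Redis commands.

```python
from collections import deque

class ReliableQueue:
    """Plain-Python illustration of Redis's RPOPLPUSH reliable-queue pattern."""

    def __init__(self) -> None:
        self.work = deque()        # pending messages (the "work" list)
        self.processing = deque()  # in-flight messages (the "processing" list)

    def push(self, msg) -> None:
        """LPUSH: new messages enter at the head of "work"."""
        self.work.appendleft(msg)

    def rpoplpush(self):
        """Atomically move the tail of "work" to the head of "processing"."""
        if not self.work:
            return None
        msg = self.work.pop()
        self.processing.appendleft(msg)
        return msg

    def ack(self, msg) -> None:
        """Worker finished: remove the message from "processing" (LREM)."""
        self.processing.remove(msg)

    def recover(self) -> None:
        """Re-queue whatever a dead worker left on "processing"."""
        while self.processing:
            self.work.appendleft(self.processing.pop())
```

With a real Redis client the pop-and-park step is a single atomic RPOPLPUSH command, so it stays safe with several workers on the same queue; this sketch only shows the bookkeeping, not the atomicity or the blocking wait.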
