Distributed video encoding - Gearman and Beanstalkd
I'm looking to build a distributed video encoding cluster of a few dozen machines. I've never worked with a message queue before, but the two that I started playing around with are Gearman and Beanstalkd.
Beanstalk seems to be a lot simpler and easier to use than Gearman, but it's not as feature-rich.
One thing I don't understand is how you spawn new workers on all the servers. I plan to use PHP. Is it as simple as running worker.php from the CLI with "&" and just having it sit there waiting for work?
I noticed Gearman doesn't actually kill the process after a job is done, but Beanstalk seems to, so it looks like I'd have to restart the script after every job, on every server.
Currently I'm more inclined to use Beanstalk. The general flow I planned is:
Run a cron job every minute on each server that checks whether a predefined number of workers is running. If there are fewer than there should be, spawn new worker processes. Each process will run for roughly 2-30 minutes.
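A minimal sketch of that check, written in Python for brevity (the same structure would apply to a PHP CLI script). The target of 4 workers, the use of `pgrep -f` to count processes, and all function names are assumptions made for illustration, not part of any library.

```python
import subprocess

TARGET_WORKERS = 4                  # assumed number of workers per server
WORKER_CMD = ["php", "worker.php"]  # worker script named in the question


def missing_workers(running, target):
    """How many workers must be started to get back to the target."""
    return max(0, target - running)


def count_running(pattern="worker.php"):
    """Rough count of processes whose command line matches the pattern.

    pgrep -f prints one PID per line; it also matches unrelated processes
    that happen to mention the pattern (for example an editor).
    """
    result = subprocess.run(["pgrep", "-f", pattern],
                            capture_output=True, text=True)
    return len(result.stdout.split())


def spawn_worker():
    """Start one detached worker so it outlives the cron invocation."""
    subprocess.Popen(WORKER_CMD,
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL,
                     start_new_session=True)


def top_up(target=TARGET_WORKERS, count=count_running, spawn=spawn_worker):
    """Start enough workers to reach the target; return how many were started."""
    to_start = missing_workers(count(), target)
    for _ in range(to_start):
        spawn()
    return to_start
```

Cron would call `top_up()` once a minute. Passing `count` and `spawn` as parameters keeps the arithmetic testable without starting real processes.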
Maybe I have a flaw in my logic here? Let me know what would be a "better" or "proper" way of doing this.
Comments (1)
Here is the terminology I will use, just to try and be clear.
There is the concept of a producer and a consumer. The producer generates jobs and puts them on a queue (i.e. the beanstalk service), and a consumer then reads them from that queue.
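To make the producer side concrete, here is a rough Python sketch that frames a job for beanstalkd's text protocol, where `put <pri> <delay> <ttr> <bytes>` is followed by the job body. The JSON job format, the function names, and the one-hour TTR are assumptions chosen for illustration. The TTR matters for long encodes: if a worker holds a job longer than its TTR without deleting or touching it, beanstalkd hands the job out again.

```python
import json
import socket


def encode_put(job, priority=1024, delay=0, ttr=3600):
    """Frame a job dict as a beanstalkd `put` command (body is JSON)."""
    body = json.dumps(job).encode("utf-8")
    header = f"put {priority} {delay} {ttr} {len(body)}\r\n".encode("ascii")
    return header + body + b"\r\n"


def parse_put_reply(line):
    """Return the job id from an `INSERTED <id>` reply, or raise."""
    parts = line.strip().split()
    if len(parts) == 2 and parts[0] == b"INSERTED":
        return int(parts[1])
    raise RuntimeError(f"put failed: {line!r}")


def submit(job, host="127.0.0.1", port=11300):
    """Send one job to a beanstalkd server (11300 is its default port)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(encode_put(job))
        reply = conn.makefile("rb").readline()
    return parse_put_reply(reply)
```

In practice a client library would handle this framing. The sketch is only meant to show what "putting a job on the queue" amounts to.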
There are multiple ways to write a consumer. You can either run the task every x time frame via a cron job, or just have a consumer running in a while (1) loop via PHP (or what have you).
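The long-running variant can be sketched as follows, in Python for brevity. `InMemoryQueue` is a hypothetical stand-in for a beanstalkd tube that mirrors its reserve, delete, and bury operations. A real client's reserve would block until a job arrives, so the loop would never exit by itself. The point of the sketch is that one worker process handles many jobs in a row and only stops when the script itself returns.

```python
import collections


class InMemoryQueue:
    """Hypothetical stand-in for a beanstalkd tube (not a real client)."""

    def __init__(self, jobs=()):
        self.ready = collections.deque(enumerate(jobs, start=1))
        self.reserved = {}
        self.buried = {}

    def reserve(self):
        """Return (job_id, body), or None when nothing is ready."""
        if not self.ready:
            return None
        job_id, body = self.ready.popleft()
        self.reserved[job_id] = body
        return job_id, body

    def delete(self, job_id):
        """Remove a finished job for good."""
        del self.reserved[job_id]

    def bury(self, job_id):
        """Set a failed job aside for later inspection."""
        self.buried[job_id] = self.reserved.pop(job_id)


def run_worker(queue, handle):
    """Process jobs one after another; return how many succeeded."""
    done = 0
    while True:
        job = queue.reserve()
        if job is None:          # only the in-memory stand-in runs dry
            break
        job_id, body = job
        try:
            handle(body)         # e.g. run the encoder on this file
        except Exception:
            queue.bury(job_id)   # keep the failed job instead of losing it
        else:
            queue.delete(job_id)
            done += 1
    return done
```

Deleting a job only after `handle` succeeds means a worker that crashes mid-encode does not silently drop the job.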
Where to install the service really depends on what you are going after. I normally install the service either on one of the consumers or on its own separate box (the latter is sometimes overkill, depending on your needs).
If you want durability on the queue side, you should use Beanstalk's binlog parameter (-b). If something happens to your beanstalk service, this will allow you to restart with minimal, if any, loss of data in the queues. Durability on the producer side can come from having multiple queues to try against.
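One way to read "multiple queues to try against" is a producer that walks a list of queue servers until one accepts the job. A minimal Python sketch, where `put_with_failover` and the `send` callback are hypothetical names and `send` is assumed to raise `OSError` when a server is unreachable:

```python
def put_with_failover(job, servers, send):
    """Try each server in order; return the first one that accepted the job."""
    errors = []
    for server in servers:
        try:
            send(server, job)    # assumed to raise OSError on connection failure
            return server
        except OSError as exc:
            errors.append((server, exc))
    raise RuntimeError(f"no queue accepted the job: {errors}")
```

The consumers then need to watch every server in that list, otherwise jobs that landed on a fallback queue would never be picked up.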