Scaling a Ruby script by launching multiple processes instead of using threads

Posted on 2024-09-01 02:19:18

I want to increase the throughput of a script that does network I/O (a scraper). Instead of making it multithreaded in Ruby (I use the default 1.9.1 interpreter), I want to launch multiple processes. Is there an existing system for this that lets me track when a process finishes and relaunch it, so that X processes are running at any time? Also, some of them will run with different command-line arguments. I was thinking of writing a bash script, but that sounds like a potentially bad idea if there is already a standard way to do this on Linux.


Comments (3)

咆哮 2024-09-08 02:19:19

I would recommend not forking and instead using EventMachine (and the excellent em-http-request if you're doing HTTP). Managing multiple processes can be a handful, even more so than handling multiple threads, while the evented path is much simpler in comparison. Since you mostly want to do network I/O, which consists mostly of waiting, I think an evented approach would scale as well as, or better than, forking or threading. Most importantly, it will require much less code, and that code will be more readable.

Even if you decide to run a separate process for each task, EventMachine can help you write the code that manages the subprocesses, for example with EventMachine.popen.

Finally, if you want to do it without EventMachine, read the docs for IO.popen, Open3.popen3 and Open4.popen4 (the latter from the open4 gem). They all do more or less the same thing but give you access to the stdin, stdout, stderr (Open3, Open4), and pid (Open4) of the subprocess.
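As a rough illustration of the non-EventMachine route, here is a minimal sketch that uses `Open3.popen3` from the standard library to run one child and collect its output. The `run_child` helper and the one-line Ruby command standing in for the scraper are assumptions made for the example, not part of the original answer.

```ruby
require 'open3'
require 'rbconfig'

# Run one child command (an argv-style array, so no shell is involved)
# and collect its pid, output streams and exit status.
def run_child(cmd)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    stdin.close                                # this child needs no input
    err_reader = Thread.new { stderr.read }    # drain stderr concurrently to avoid a pipe deadlock
    out    = stdout.read
    err    = err_reader.value
    status = wait_thr.value                    # Process::Status; blocks until the child exits
    { pid: wait_thr.pid, out: out, err: err, ok: status.success? }
  end
end

# Hypothetical stand-in for the real scraper: a Ruby one-liner that upcases its argument.
result = run_child([RbConfig.ruby, '-e', 'puts ARGV[0].upcase', 'hello'])
puts result[:out]
```

Note that on reasonably recent Rubies the wait thread passed by `Open3.popen3` also exposes the child's pid, so the open4 gem may not be needed just for that.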

紫轩蝶泪 2024-09-08 02:19:19

You can try fork: http://ruby-doc.org/core/classes/Process.html#M003148

It returns the PID of the child, which you can use to check whether that process is still running.

If you want to manage I/O concurrency, I suggest using EventMachine.
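The fork-and-track-the-PID idea could be sketched roughly as follows, assuming a Unix-like system where `fork` is available. `run_jobs` and the sample job commands are hypothetical names for this example; each job is an argv array, so different jobs can carry different arguments.

```ruby
require 'rbconfig'

# Run every job (an argv array) as a child process, keeping at most
# `max_running` children alive at once. Returns [[job, success?], ...].
def run_jobs(jobs, max_running)
  pending  = jobs.dup
  running  = {}   # pid => job
  finished = []

  until pending.empty? && running.empty?
    # Start new children until the limit is reached.
    while running.size < max_running && !pending.empty?
      job = pending.shift
      pid = fork { exec(*job) }   # the child replaces itself with the command
      running[pid] = job
    end

    # Block until any one child exits, then record how it went.
    pid, status = Process.wait2
    finished << [running.delete(pid), status.success?]
  end
  finished
end

# Hypothetical jobs: stand-ins for the scraper invoked with different arguments.
jobs = [
  [RbConfig.ruby, '-e', 'sleep 0.1'],
  [RbConfig.ruby, '-e', 'sleep 0.1; exit 3'],
  [RbConfig.ruby, '-e', 'exit 0'],
]
outcomes = run_jobs(jobs, 2)
outcomes.each { |job, ok| puts "#{job.last.inspect} -> #{ok ? 'ok' : 'failed'}" }
```

To keep a job running indefinitely, one option is to push it back onto `pending` once it has finished instead of recording it in `finished`.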

爱的十字路口 2024-09-08 02:19:19

You can either

  1. implement a ThreadPool (a ProcessPool, in your case) or find a gem that provides one, or
  2. prepare an array of all the tasks to be processed (say 1000), split it into chunks (say 10 chunks of 100 tasks, 10 being the number of parallel processes you want), and launch 10 processes, each of which receives its 100 tasks right away. That way you don't need to launch 1000 processes and make sure that no more than 10 of them are working at the same time.
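The second option can be sketched like this, again assuming `fork` is available (Linux). `split_into_chunks` and `process_task` are hypothetical helpers, and the integer tasks are placeholders for whatever the scraper actually processes.

```ruby
# Split `tasks` into at most `workers` contiguous chunks of near-equal size.
def split_into_chunks(tasks, workers)
  return [] if tasks.empty?
  chunk_size = (tasks.size.to_f / workers).ceil
  tasks.each_slice(chunk_size).to_a
end

# Hypothetical per-task work; a real scraper would fetch a URL here.
def process_task(task)
  task * 2
end

tasks  = (1..1000).to_a
chunks = split_into_chunks(tasks, 10)

# One child per chunk; each child works through its whole chunk and exits.
pids = chunks.map do |chunk|
  fork do
    chunk.each { |task| process_task(task) }
    exit!(0)   # leave the child without running the parent's at_exit handlers
  end
end

# Wait for every child and collect its exit status.
statuses = pids.map { |pid| Process.wait2(pid).last }
puts "#{statuses.count(&:success?)} of #{chunks.size} workers succeeded"
```

The trade-off of static chunking is that a worker that receives slow tasks finishes late while the others sit idle; a pool that hands out tasks one at a time balances the load better at the cost of more bookkeeping.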