Scaling a Ruby script by launching multiple processes instead of using threads
I want to increase the throughput of a script that does network I/O (a scraper). Instead of making it multithreaded in Ruby (I use the default 1.9.1 interpreter), I want to launch multiple processes. Is there a system for doing this that lets me track when one process finishes and relaunch it, so that I always have X of them running? Also, some will run with different command-line arguments. I was thinking of writing a bash script, but that sounds like a bad idea if there is already an established way to do this on Linux.
Comments (3)
I would recommend not forking but using EventMachine instead (and the excellent em-http-request if you're doing HTTP). Managing multiple processes can be a handful, even more so than handling multiple threads, whereas going down the evented path is much simpler by comparison. Since you mostly want to do network I/O, which consists mostly of waiting, I think an evented approach would scale as well as, or better than, forking or threading. Most importantly, it will require much less code and it will be more readable.
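EventMachine and em-http-request are gems, so to show the evented idea without extra dependencies here is a stdlib-only sketch (not EventMachine's API): a single thread runs one loop and only reads when IO.select reports that a stream is ready. Pipes from child processes stand in for network sockets so the sketch runs without a network; IO.select and read_nonblock work the same way on sockets. The method name collect_output is hypothetical, and the code assumes a current Ruby on Linux.

```ruby
# Stdlib-only sketch of an evented loop: one thread, and work happens only
# when IO.select says a stream is readable. Child-process pipes stand in for
# network sockets; IO.select and read_nonblock behave the same on both.
def collect_output(commands)
  pipes  = {}                           # IO => command feeding it
  output = {}                           # command => accumulated stdout
  commands.each do |cmd|
    io = IO.popen(cmd)                  # cmd is an argv array, so no shell
    pipes[io] = cmd
    output[cmd] = String.new
  end

  until pipes.empty?
    readable, = IO.select(pipes.keys)   # blocks until at least one is ready
    readable.each do |io|
      begin
        output[pipes[io]] << io.read_nonblock(4096)
      rescue IO::WaitReadable
        # Nothing to read after all; wait for the next select.
      rescue EOFError
        pipes.delete(io)
        io.close                        # also waits for the child to exit
      end
    end
  end
  output
end
```

Because the loop never blocks on any single stream, a slow source does not hold up a fast one; this is the property that makes the evented approach attractive for I/O that consists mostly of waiting.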
Even if you decide to run a separate process for each task, EventMachine can help you write the code that manages the subprocesses, for example with EventMachine.popen. And finally, if you want to do it without EventMachine, read the docs for IO.popen, Open3.popen3 and Open4.popen4. They all do more or less the same thing, but give you access to the subprocess's stdin, stdout, stderr (Open3, Open4) and pid (Open4).
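As a rough illustration of the non-EventMachine route, here is a minimal sketch using the standard library's Open3.popen3, which yields the child's stdin, stdout and stderr plus a wait thread that exposes the pid and exit status. It is written for a current Ruby; details such as the wait thread may differ on 1.9.1. The helper name run_child is hypothetical.

```ruby
require 'open3'

# Runs one child process and returns its pid, stdout, stderr and exit status.
# stderr is drained in a separate thread so a child that writes a lot to one
# stream cannot fill that pipe and stall while we block reading the other.
def run_child(*cmd)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    stdin.close                          # the child gets no input from us
    err_reader = Thread.new { stderr.read }
    out = stdout.read
    err = err_reader.value
    status = wait_thr.value              # Process::Status once the child exits
    { pid: wait_thr.pid, out: out, err: err, exitstatus: status.exitstatus }
  end
end
```

For example, run_child(RbConfig.ruby, "-e", "puts 'hello'; exit 3") runs a throwaway Ruby child and reports its output along with exit status 3.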
You can try fork: http://ruby-doc.org/core/classes/Process.html#M003148
fork returns the child's PID, which you can use to check whether that process is still running.
If you want to manage I/O concurrency, I suggest using EventMachine.
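To connect fork to what the question asks (keep X processes running and start another as soon as one finishes), here is a minimal sketch of a fork-based pool. It is Unix-only because it relies on fork, and it assumes the parent has no other children, since Process.wait2 reaps whichever child exits first. The method name run_pool and the job values are hypothetical.

```ruby
# Minimal fork-based pool: keeps at most max_procs (>= 1) children alive and
# launches the next job as soon as any child exits. The block is what each
# child runs; a truthy result maps to exit status 0, anything else to 1.
def run_pool(jobs, max_procs)
  queue   = jobs.dup
  running = {}                            # pid => job
  results = {}                            # job => child's exit status

  until queue.empty? && running.empty?
    # Top up: start jobs until every slot is busy or the queue is empty.
    while running.size < max_procs && !queue.empty?
      job = queue.shift
      pid = fork do
        ok = begin
          yield(job)
        rescue StandardError
          false
        end
        exit!(ok ? 0 : 1)                 # exit! skips inherited at_exit hooks
      end
      running[pid] = job
    end

    # Block until any child exits, then record its status and free the slot.
    pid, status = Process.wait2
    results[running.delete(pid)] = status.exitstatus
  end
  results
end
```

If each job is a separate command with its own arguments, the block can call exec, for example run_pool(arg_lists, 4) { |args| exec("ruby", "scraper.rb", *args) }, where scraper.rb and arg_lists are hypothetical; the child then becomes that command and its exit status is the one recorded.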
You can either