Read a file N lines at a time in Ruby
I have a large file (hundreds of megs) that consists of filenames, one per line.
I need to loop through the list of filenames, and fork off a process for each filename. I want a maximum of 8 forked processes at a time and I don't want to read the whole filename list into RAM at once.
I'm not even sure where to begin, can anyone help me out?
Comments (5)
It sounds like the Process module will be useful for this task. Here's something I quickly threw together as a starting point:
Output:
The way this works is that:
for line in open(XXX) will lazily iterate over the lines of the file you specify.
fork will spawn a child process executing the given block, and in this case, we use backticks to indicate something to be executed by the shell. Note that rand returns a value between 0 and 1 here, so we are sleeping less than a second, and I call line.chomp to remove the trailing newline that we get from line.
wait stops everything until one of the child processes returns.
waitall joins all remaining processes before exiting the script.
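The steps above can be sketched roughly as follows. This is a minimal sketch rather than the snippet from the answer: `fork_per_line` and `max_procs` are names made up for illustration, the per-file job is whatever block the caller passes in, and it assumes a platform where `fork` is available (so not Windows).

```ruby
# Hypothetical helper: runs one child process per line of list_path,
# keeping at most max_procs children alive at any moment.
# Returns the number of children that were started.
def fork_per_line(list_path, max_procs: 8)
  running = 0
  started = 0

  # File.foreach yields one line at a time, so the whole list never sits in RAM.
  File.foreach(list_path) do |line|
    name = line.chomp          # drop the trailing newline
    next if name.empty?        # skip blank lines

    if running >= max_procs
      Process.wait             # block until any one child exits
      running -= 1
    end

    # The child runs the caller's block and exits when the block finishes.
    fork { yield name }
    running += 1
    started += 1
  end

  Process.waitall              # reap whatever children are still running
  started
end
```

Something like `fork_per_line("files.txt") { |name| system("some_command", name) }` would then run the job for each filename, where `files.txt` and `some_command` are placeholders.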
Here's Mark's solution wrapped up as a ProcessPool class; it might be helpful to have around (and please correct me if I made a mistake):
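A pool along these lines might look roughly like the sketch below. It is not the class from the answer: the method names `schedule` and `shutdown` and the default of 8 are assumptions chosen for illustration, and it again assumes `fork` is available.

```ruby
# Hypothetical ProcessPool: caps the number of live child processes.
class ProcessPool
  def initialize(max_procs = 8)
    @max_procs = max_procs
    @pids = []                 # pids of children started by this pool
  end

  # Forks a child to run the block, waiting first if the pool is full.
  def schedule(&job)
    wait_for_one while @pids.size >= @max_procs
    @pids << fork(&job)
  end

  # Blocks until every child started by this pool has exited.
  def shutdown
    wait_for_one until @pids.empty?
  end

  private

  # Waits for the next child to exit and forgets its pid.
  def wait_for_one
    pid = Process.wait
    @pids.delete(pid)
  end
end
```

Usage would be something like creating `pool = ProcessPool.new(8)`, calling `pool.schedule { ... }` once per filename while iterating the list with `File.foreach`, and finishing with `pool.shutdown`.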
The standard library documentation for Queue has an example.
I do find it a little verbose though.
Wikipedia describes this as a thread pool pattern.
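As a rough illustration of that pattern applied to this question, here is a sketch using a bounded queue and a fixed number of worker threads. It is not the example from the Queue documentation; `each_line_with_workers` and `workers` are hypothetical names, and it assumes Ruby 2.3 or later for `SizedQueue#close`.

```ruby
# Hypothetical helper: one reader feeds a bounded queue, and a fixed set of
# worker threads pulls filenames off it and hands each to the caller's block.
def each_line_with_workers(list_path, workers: 8)
  # Bounded queue: the reader blocks when it gets too far ahead of the workers,
  # so only a handful of lines are held in memory at once.
  queue = SizedQueue.new(workers * 2)

  threads = Array.new(workers) do
    Thread.new do
      # pop returns nil once the queue has been closed and drained.
      while (name = queue.pop)
        yield name
      end
    end
  end

  File.foreach(list_path) { |line| queue << line.chomp }
  queue.close                  # tell the workers there is nothing more to come
  threads.each(&:join)         # wait for the workers to finish
end
```

If the block runs an external command, for example with `system`, each worker has at most one child process running at a time, which gives the limit of 8 asked for in the question.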
arr = IO.readlines("filename")
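Note that IO.readlines loads every line of the file into an array, which is what the question wanted to avoid. If the goal is to read N lines at a time, something like the following sketch may be closer; `each_batch_of_lines` is a hypothetical helper name, not a built-in.

```ruby
# Hypothetical helper: yields arrays of up to n chomped lines at a time.
def each_batch_of_lines(path, n)
  # Called without a block, File.foreach returns an Enumerator over the lines,
  # so each_slice only needs to hold n lines in memory at once.
  File.foreach(path).each_slice(n) do |batch|
    yield batch.map(&:chomp)
  end
end
```

For example, `each_batch_of_lines("filename", 8) { |names| ... }` would hand the block up to 8 filenames per call.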