如何使用 Ruby(和 open-uri)并行处理数组中的项目

发布于 2024-12-06 18:32:51 字数 455 浏览 6 评论 0原文

我想知道如何使用 open-uri 打开多个并发连接?我想我需要以某种方式使用螺纹或纤维,但我不确定。

示例代码:

def get_doc(url)
  begin
    Nokogiri::HTML(open(url).read)
  rescue Exception => ex
    puts "Failed at #{Time.now}"
    puts "Error: #{ex}"
  end
end

array_of_urls_to_process = [......]

# How can I iterate over items in the array in parallel (instead of one at a time?)
array_of_urls_to_process.each do |url|
  x = get_doc(url)
  do_something(x)
end

I am wondering how i can go about opening multiple concurrent connections using open-uri? i THINK I need to use threading or fibers some how but i'm not sure.

Example code:

def get_doc(url)
  begin
    Nokogiri::HTML(open(url).read)
  rescue Exception => ex
    puts "Failed at #{Time.now}"
    puts "Error: #{ex}"
  end
end

array_of_urls_to_process = [......]

# How can I iterate over items in the array in parallel (instead of one at a time?)
array_of_urls_to_process.each do |url|
  x = get_doc(url)
  do_something(x)
end

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

嗫嚅 2024-12-13 18:32:51

还有一个名为 Parallel 的 gem,它与 Peach 类似,但正在积极更新。

There's also a gem called Parallel which is similar to Peach, but is actively updated.

夏尔 2024-12-13 18:32:51

我希望这能给您一个想法:

def do_something(url, secs)
    sleep secs #just to see a difference
    puts "Done with: #{url}"
end

threads = []
urls_ary = ['url1', 'url2', 'url3']

urls_ary.each_with_index do |url, i|
    threads << Thread.new{ do_something(url, i+1) }
    puts "Out of loop #{i+1}"
end
threads.each{|t| t.join}

也许为 Array 创建一个方法,如下所示:

class Array
    def thread_each(&block)
        inject([]){|threads,e| threads << Thread.new{yield(e)}}.each{|t| t.join}
    end
end

[1, 2, 3].thread_each do |i|
    sleep 4-i #so first one ends later
    puts "Done with #{i}"
end

I hope this gives you an idea:

def do_something(url, secs)
    sleep secs #just to see a difference
    puts "Done with: #{url}"
end

threads = []
urls_ary = ['url1', 'url2', 'url3']

urls_ary.each_with_index do |url, i|
    threads << Thread.new{ do_something(url, i+1) }
    puts "Out of loop #{i+1}"
end
threads.each{|t| t.join}

Perhaps creating a method for Array like:

class Array
    def thread_each(&block)
        inject([]){|threads,e| threads << Thread.new{yield(e)}}.each{|t| t.join}
    end
end

[1, 2, 3].thread_each do |i|
    sleep 4-i #so first one ends later
    puts "Done with #{i}"
end
谁把谁当真 2024-12-13 18:32:51
module MultithreadedEach
  def multithreaded_each
    each_with_object([]) do |item, threads|
      threads << Thread.new { yield item }
    end.each { |thread| thread.join }
    self
  end
end

用法:

arr = [1,2,3]

arr.extend(MultithreadedEach)

arr.multithreaded_each do |n|
  puts n # Each block runs in it's own thread
end
module MultithreadedEach
  def multithreaded_each
    each_with_object([]) do |item, threads|
      threads << Thread.new { yield item }
    end.each { |thread| thread.join }
    self
  end
end

Usage:

arr = [1,2,3]

arr.extend(MultithreadedEach)

arr.multithreaded_each do |n|
  puts n # Each block runs in it's own thread
end
愿得七秒忆 2024-12-13 18:32:51

使用线程的简单方法:

threads = []

[1, 2, 3].each do |i|
  threads << Thread.new { puts i }
end

threads.each(&:join)

A simple method using threads:

threads = []

[1, 2, 3].each do |i|
  threads << Thread.new { puts i }
end

threads.each(&:join)
许仙没带伞 2024-12-13 18:32:51

有一种名为 peach 的 gem (https://rubygems.org/gems/peach) 它可以让你这样做:

require "peach"

array_of_urls_to_process.peach do |url|
  do_something(get_doc(url))
end

There is a gem called peach (https://rubygems.org/gems/peach) which lets you do this:

require "peach"

array_of_urls_to_process.peach do |url|
  do_something(get_doc(url))
end
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文