Ruby on Rails 3: Streaming Data Through Rails to the Client

Posted on 2024-09-15 03:53:46


I am working on a Ruby on Rails app that communicates with RackSpace cloudfiles (similar to Amazon S3 but lacking some features).

Due to the lack of per-object access permissions and query string authentication, downloads have to be mediated through the application.

In Rails 2.3, it looks like you can dynamically build a response as follows:

# Streams about 180 MB of generated data to the browser.
render :text => proc { |response, output|
  10_000_000.times do |i|
    output.write("This is line #{i}\n")
  end
}

(from http://api.rubyonrails.org/classes/ActionController/Base.html#M000464)

Instead of 10_000_000.times... I could dump my cloudfiles stream generation code in there.

Trouble is, this is the output I get when I attempt to use this technique in Rails 3:

#<Proc:0x000000010989a6e8@/Users/jderiksen/lt/lt-uber/site/app/controllers/prospect_uploads_controller.rb:75>

Looks like maybe the proc object's call method is not being called? Any other ideas?

Comments (10)

情定在深秋 2024-09-22 03:53:46


Assign to response_body an object that responds to #each:

class Streamer
  def each
    10_000_000.times do |i|
      yield "This is line #{i}\n"
    end
  end
end

self.response_body = Streamer.new

If you are using 1.9.x or the Backports gem, you can write this more compactly using Enumerator.new:

self.response_body = Enumerator.new do |y|
  10_000_000.times do |i|
    y << "This is line #{i}\n"
  end
end

Note that whether and when the data is flushed depends on the Rack handler and the underlying server. I have confirmed that Mongrel, for instance, streams the data, but other users have reported that WEBrick buffers it until the response is closed. There is no way to force the response to flush.

In Rails 3.0.x, there are several additional gotchas:

  • In development mode, doing things such as accessing model classes from within the enumeration can be problematic due to bad interactions with class reloading. This is an open bug in Rails 3.0.x.
  • A bug in the interaction between Rack and Rails causes #each to be called twice for each request. This is another open bug. You can work around it with the following monkey patch:

    class Rack::Response
      def close
        @body.close if @body.respond_to?(:close)
      end
    end
    

Both problems are fixed in Rails 3.1, where HTTP streaming is a marquee feature.

Note that the other common suggestion, self.response_body = proc {|response, output| ...}, does work in Rails 3.0.x, but has been deprecated (and will no longer actually stream the data) in 3.1. Assigning an object that responds to #each works in all Rails 3 versions.
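Applied to the original Cloud Files use case, the same #each pattern can wrap any chunked source. A minimal sketch, where the chunk-source block and the commented `cloudfiles_object` usage are illustrative assumptions, not part of the Rackspace API:

```ruby
# A reusable streaming body: wraps an arbitrary chunk source in an
# object that responds to #each, which is what Rails 3 expects for
# response_body.
class ChunkStreamer
  def initialize(&chunk_source)
    @chunk_source = chunk_source
  end

  # Rack calls #each and writes every yielded string to the socket.
  def each(&block)
    @chunk_source.call(block)
  end
end

# Hypothetical controller usage (cloudfiles_object is illustrative):
#   self.response_body = ChunkStreamer.new do |out|
#     cloudfiles_object.fetch { |chunk| out.call(chunk) }
#   end
```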

天涯沦落人 2024-09-22 03:53:46


Thanks to all the posts above, here is fully working code to stream large CSVs. This code:

  1. Does not require any additional gems.
  2. Uses Model.find_each so as not to bloat memory with all matching objects.
  3. Has been tested on Rails 3.2.5, Ruby 1.9.3, and Heroku using Unicorn with a single dyno.
  4. Calls GC.start every 500 rows, so as not to blow past the Heroku dyno's allowed memory.
  5. You may need to adjust the GC.start interval depending on your model's memory footprint. I have successfully used this to stream 105K models into a 9.7 MB CSV without any problems.

Controller Method:

def csv_export
  respond_to do |format|
    format.csv {
      @filename = "responses-#{Date.today.to_s(:db)}.csv"
      self.response.headers["Content-Type"] ||= 'text/csv'
      self.response.headers["Content-Disposition"] = "attachment; filename=#{@filename}"
      self.response.headers['Last-Modified'] = Time.now.ctime.to_s

      self.response_body = Enumerator.new do |y|
        # Emit the header row once, then one CSV row per record.
        y << Model.csv_header.to_csv
        i = 0
        Model.find_each do |m|
          y << m.csv_array.to_csv
          i += 1
          GC.start if i % 500 == 0
        end
      end
    }
  end
end

config/unicorn.rb

# Set to 3 instead of 4 as per http://michaelvanrooijen.com/articles/2011/06/01-more-concurrency-on-a-single-heroku-dyno-with-the-new-celadon-cedar-stack/
worker_processes 3

# Change timeout to 120s to allow downloading of large streamed CSVs on slow networks
timeout 120

#Enable streaming
port = ENV["PORT"].to_i
listen port, :tcp_nopush => false

Model.rb

  def self.csv_header
    ["ID", "Route", "username"]
  end

  def csv_array
    [id, route, username]
  end
林空鹿饮溪 2024-09-22 03:53:46


It looks like this isn't available in Rails 3:

https://rails.lighthouseapp.com/projects/8994/tickets/2546-render-text-proc

This appeared to work for me in my controller:

self.response_body =  proc{ |response, output|
  output.write "Hello world"
}
夏末 2024-09-22 03:53:46


In case you are assigning to response_body an object that responds to the #each method and it buffers until the response is closed, try this in your action controller:

self.response.headers['Last-Modified'] = Time.now.to_s
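As an aside, HTTP date headers are conventionally RFC 1123 formatted; Ruby's standard time library provides Time#httpdate for that, which may be preferable to to_s here:

```ruby
require "time"

# Time#httpdate produces the RFC 1123 format that HTTP caches expect,
# e.g. for a Last-Modified header.
stamp = Time.at(0).utc.httpdate
```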

℡Ms空城旧梦 2024-09-22 03:53:46


Just for the record, Rails >= 3.1 has an easy way to stream data: assign an object that responds to the #each method to the controller's response.

Everything is explained here: http://blog.sparqcode.com/2012/02/04/streaming-data-with-rails-3-1-or-3-2/

悲凉≈ 2024-09-22 03:53:46

Yes, response_body is the Rails 3 way of doing this for the moment: https://rails.lighthouseapp.com/projects/8994/tickets/4554-render-text-proc-regression

彼岸花似海 2024-09-22 03:53:46


This solved my problem as well - I have gzipped CSV files and want to send them to the user as unzipped CSV, so I read them a line at a time using Zlib::GzipReader.

These lines are also helpful if you're trying to deliver a big file as a download:

self.response.headers["Content-Type"] = "application/octet-stream"
self.response.headers["Content-Disposition"] = "attachment; filename=#{filename}"

内心旳酸楚 2024-09-22 03:53:46


In addition, you will have to set the 'Content-Length' header yourself.

If not, Rack will have to wait (buffering the body data into memory) to determine the length, and that will defeat the streaming methods described above.

In my case, I could determine the length. In cases where you can't, you need to make Rack start sending the body without a 'Content-Length' header. Try adding "use Rack::Chunked" to config.ru after the 'require' lines and before the 'run'. (Thanks arkadiy)
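When the chunks are known up front, the Content-Length can be computed as the sum of their byte sizes (bytesize, not character length, since multibyte characters count per byte):

```ruby
# Content-Length is measured in bytes; "é" is two bytes in UTF-8,
# so String#bytesize is the right measure, not String#length.
chunks = ["héllo\n", "world\n"]
content_length = chunks.sum(&:bytesize)
# self.response.headers["Content-Length"] = content_length.to_s
```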

天气好吗我好吗 2024-09-22 03:53:46


I commented in the lighthouse ticket, just wanted to say the self.response_body = proc approach worked for me though I needed to use Mongrel instead of WEBrick to succeed.

Martin

吾家有女初长成 2024-09-22 03:53:46


Applying John's solution along with Exequiel's suggestion worked for me.

The statement

self.response.headers['Last-Modified'] = Time.now.to_s

marks the response as non-cacheable in Rack.

After investigating further, I figured one could also use this :

headers['Cache-Control'] = 'no-cache'

This, to me, is just slightly more intuitive: it conveys the intent to anyone else who may be reading my code. Also, if a future version of Rack stops checking for Last-Modified, a lot of code may break, and it may take folks a while to figure out why.
