EOFError with ActiveResource when the service is "slow"

Posted on 2024-11-15 01:58:16

I'm seriously struggling to solve this one; any help would be appreciated!

I have two Rails apps, let's call them Client and Service. Everything is very simple, with a normal REST interface. Here's the basic scenario:

  • Client makes a POST /resources.json request to the Service
  • The Service runs a process which creates the resource and returns an ID to the Client

Again, all very simple, except that processing on the Service is very time-intensive and can take several minutes. When that happens, an EOFError is raised on the Client exactly 60s after the request was made (no matter what ActiveResource::Base.timeout is set to), while the Service processes the request correctly and responds with 200/201. This is what we see in the logs (chronologically):

C 00:00:00: POST /resources.json
S 00:00:00: Received POST /resources.json => resources#create
C 00:01:00: EOFError: end of file reached
  /usr/ruby1.8.7/lib/ruby/1.8/net/protocol.rb:135:in `sysread'
  /usr/ruby1.8.7/lib/ruby/1.8/net/protocol.rb:135:in `rbuf_fill'
  /usr/ruby1.8.7/lib/ruby/1.8/timeout.rb:62:in `timeout'
  ...
S 00:02:23: Response POST /resources.json, 201, after 143s

Obviously the service response never reached the client. I traced the error down to the socket level and recreated the scenario in a script, where I open a TCPSocket and try to retrieve data. Since I don't request anything, I shouldn't get anything back and my request should time out after 70 seconds (see full script at the bottom):

Timeout::timeout(70) { TCPSocket.open(domain, 80).sysread(16384) }

These were the results for a few domains:

www.amazon.com     => Timeout after 70s
github.com         => EOFError after 60s
www.nytimes.com    => Timeout after 70s
www.mozilla.org    => EOFError after 13s
www.googlelabs.com => Timeout after 70s
maps.google.com    => Timeout after 70s

As you can see, some servers allowed us to "wait" for the full 70 seconds, while others terminated our connection, raising EOFErrors. When we did this test against our service, we (expectedly) got an EOFError after 60 seconds.
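
The same behaviour can be reproduced locally without touching any real site. The following is only a minimal sketch under stated assumptions: start_idle_closing_server and probe are hypothetical helper names, and the 1-second idle limit stands in for whatever limit the real server or proxy applies. It shows that whichever side has the shorter limit decides how the read ends, so raising the client-side timeout cannot keep the connection open.

```ruby
require 'socket'
require 'timeout'

# Hypothetical stand-in for a server or proxy that drops idle connections.
# idle_timeout is in seconds.
def start_idle_closing_server(idle_timeout)
  server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
  Thread.new do
    loop do
      client = server.accept
      Thread.new(client) do |c|
        sleep idle_timeout # never send anything, then hang up
        c.close
      end
    end
  end
  server
end

# Reads from the socket without sending a request and reports what ended
# the read: the server closing the connection or our own timeout.
def probe(port, client_timeout)
  socket = TCPSocket.open('127.0.0.1', port)
  Timeout.timeout(client_timeout) { socket.sysread(16384) }
  :data
rescue Timeout::Error
  :client_timeout
rescue EOFError
  :server_closed
ensure
  socket.close if socket
end

server = start_idle_closing_server(1)
port   = server.addr[1]

result_long  = probe(port, 3)   # client is willing to wait longer than the server
result_short = probe(port, 0.5) # client gives up before the server does

puts "client timeout 3s   => #{result_long}"
puts "client timeout 0.5s => #{result_short}"
```

With a client timeout longer than the server's idle limit, the read ends with EOFError; with a shorter one, the client-side Timeout::Error fires first.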

Does anyone know why this happens? Is there any way to prevent these disconnects or to extend the server-side timeout? Since our service continues "working" even after the socket has been closed, I assume the connection must be terminated at the proxy level?

Every hint would be greatly appreciated!

PS: The full script:

require 'socket'
require 'benchmark'
require 'timeout'

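def test_socket(domain)
  puts "Connecting to #{domain}"
  message = nil
  socket  = nil
  time    = Benchmark.realtime do
    begin
      Timeout::timeout(70) do
        socket = TCPSocket.open(domain, 80)
        socket.sysread(16384) # No request was sent, so no data should arrive
      end
      message = "Successfully received data" # Should never happen
    rescue Timeout::Error
      # Rescue the timeout first: on Ruby 1.9 and later Timeout::Error is a
      # StandardError, so the generic rescue below would otherwise swallow it
      message = "Controlled client-side timeout"
    rescue => e
      message = "Server terminated connection: #{e.class} #{e.message}"
    ensure
      socket.close if socket
    end
  end
  puts "  #{message} after #{time.round}s"
end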

test_socket 'www.amazon.com'
test_socket 'github.com'
test_socket 'www.nytimes.com'
test_socket 'www.mozilla.org'
test_socket 'www.googlelabs.com'
test_socket 'maps.google.com'


Comments (2)

花开柳相依 2024-11-22 01:58:16

I know this is nearly a year old, but in case anyone else finds this, I wanted to add a possible culprit.

Amazon's ELB will terminate idle connections at 60 seconds, so if you are using EC2 behind ELB, then ELB could be the server side problem.

浴红衣 2024-11-22 01:58:16

Each server decides when to close the connection. It depends on the server side software and its settings. You can't control that.
