JRuby/Resque: network connections for a resque job start failing repeatedly
I'm having a bizarre issue with a resque job, and I'm wondering if anyone else has run into it.
We're running resque under JRuby 1.6.2.
We have a long-running task that downloads a bunch of files from various URLs, uploads those files to Rackspace Cloudfiles using Fog, and then stores some information about those files in MySQL. After this has been going on for a while, it seems like our application's networking stack falls over. In one instance, the first sign of failure was a timeout from here:
org/jruby/ext/openssl/SSLSocket.java:512:in `sysread'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/jruby-openssl-0.7.4/lib/openssl/buffering.rb:35:in `fill_rbuff'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/jruby-openssl-0.7.4/lib/openssl/buffering.rb:158:in `eof?'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/jruby-openssl-0.7.4/lib/openssl/buffering.rb:133:in `readline'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/excon-0.6.5/lib/excon/response.rb:22:in `parse'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/excon-0.6.5/lib/excon/connection.rb:174:in `request'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/fog-0.11.0/lib/fog/core/connection.rb:20:in `request'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/fog-0.11.0/lib/fog/storage/rackspace.rb:107:in `request'
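For reference, the job is roughly shaped like this (a minimal sketch with made-up class, queue, and credential names, not our actual code):

require 'open-uri'
require 'fog'

class ImportFilesJob
  @queue = :file_imports

  def self.perform(urls)
    storage = Fog::Storage.new(
      :provider           => 'Rackspace',
      :rackspace_username => ENV['RACKSPACE_USERNAME'],
      :rackspace_api_key  => ENV['RACKSPACE_API_KEY']
    )
    container = storage.directories.get('some-container')

    urls.each do |url|
      data = open(url).read                             # download the file
      file = container.files.create(                    # upload it to Cloudfiles via Fog
        :key  => File.basename(url),
        :body => data
      )
      SourceFile.create!(:url => url, :key => file.key) # SourceFile is a stand-in ActiveRecord model backed by MySQL
    end
  end
end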
We usually start seeing these problems appear about 10-15 minutes into the job run. After that, we start seeing this on every subsequent attempt to write to the database...
ActiveRecord::JDBCError: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.: SELECT `bills`.* FROM `bills` WHERE (`bills`.state_session_id = 59)
ActiveRecord::StatementInvalid: ActiveRecord::JDBCError: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.: SELECT `bills`.* FROM `bills` WHERE (`bills`.state_session_id = 59)
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract_adapter.rb:207:in `log'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract_adapter.rb:200:in `log'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-jdbc-adapter-1.1.1/lib/arjdbc/jdbc/adapter.rb:183:in `execute'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-jdbc-adapter-1.1.1/lib/arjdbc/jdbc/adapter.rb:275:in `select'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract/database_statements.rb:7:in `select_all'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract/query_cache.rb:56:in `select_all'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/base.rb:473:in `find_by_sql'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/relation.rb:64:in `to_a'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/relation/finder_methods.rb:143:in `all'
I've tried using the ruby-cloudfiles gem instead of Fog, but eventually we seem to run into exactly the same errors with that combination too. If I disable the file download/Cloudfiles upload piece of this, these errors never appear, and I've been able to leave this particular job running for a number of days.
Any theories on what could be happening here?
Comments (1)
For what it's worth, months later, I can tell you that
"ActiveRecord::JDBCError: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost..."
has typically been an indicator for us that the database connection has been broken (because of a network problem, the DB going down, etc.) and the underlying socket closed. For us at least, the default behavior of JRuby/ActiveRecord/ConnectionPool is to keep blowing up with this error on every command instead of reconnecting.
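Our workaround has been to catch the error and force a reconnect before retrying, roughly like this (a sketch of the idea, not our exact code):

attempts = 0
begin
  bills = Bill.where(:state_session_id => 59).all
rescue ActiveRecord::StatementInvalid => e
  # Only handle the "connection was unexpectedly lost" case; re-raise anything else.
  raise unless e.message =~ /connection was unexpectedly lost/
  ActiveRecord::Base.connection.reconnect!   # re-establish the dead JDBC connection
  attempts += 1
  retry if attempts <= 1
  raise
end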