JRuby/Resque: network connections for a resque job start failing repeatedly
I'm having a bizarre issue with a resque job, and I'm wondering if anyone else has run into it.
We're running resque under JRuby 1.6.2.
We have a long-running task that downloads a bunch of files from various URLs, uploads those files to Rackspace Cloudfiles using Fog, and then stores some information about those files in MySQL. After this has been going on for a while, it seems like our application's networking stack falls over. In one instance, the first sign of failure was a timeout from here:
org/jruby/ext/openssl/SSLSocket.java:512:in `sysread'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/jruby-openssl-0.7.4/lib/openssl/buffering.rb:35:in `fill_rbuff'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/jruby-openssl-0.7.4/lib/openssl/buffering.rb:158:in `eof?'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/jruby-openssl-0.7.4/lib/openssl/buffering.rb:133:in `readline'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/excon-0.6.5/lib/excon/response.rb:22:in `parse'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/excon-0.6.5/lib/excon/connection.rb:174:in `request'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/fog-0.11.0/lib/fog/core/connection.rb:20:in `request'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/fog-0.11.0/lib/fog/storage/rackspace.rb:107:in `request'
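For reference, the job is roughly shaped like this (a minimal sketch with made-up class, queue, and credential names, not our actual code):

require 'open-uri'
require 'fog'

class ImportFilesJob
  @queue = :file_imports

  def self.perform(urls)
    storage = Fog::Storage.new(
      :provider           => 'Rackspace',
      :rackspace_username => ENV['RACKSPACE_USERNAME'],
      :rackspace_api_key  => ENV['RACKSPACE_API_KEY']
    )
    container = storage.directories.get('some-container')

    urls.each do |url|
      data = open(url).read                             # download the file
      file = container.files.create(                    # upload it to Cloudfiles via Fog
        :key  => File.basename(url),
        :body => data
      )
      SourceFile.create!(:url => url, :key => file.key) # SourceFile is a stand-in ActiveRecord model backed by MySQL
    end
  end
end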
We usually start seeing these problems appear about 10-15 minutes into the job run. After that, we start seeing this on every subsequent attempt to write to the database...
ActiveRecord::JDBCError: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.: SELECT `bills`.* FROM `bills` WHERE (`bills`.state_session_id = 59)
ActiveRecord::StatementInvalid: ActiveRecord::JDBCError: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.: SELECT `bills`.* FROM `bills` WHERE (`bills`.state_session_id = 59)
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract_adapter.rb:207:in `log'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract_adapter.rb:200:in `log'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-jdbc-adapter-1.1.1/lib/arjdbc/jdbc/adapter.rb:183:in `execute'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-jdbc-adapter-1.1.1/lib/arjdbc/jdbc/adapter.rb:275:in `select'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract/database_statements.rb:7:in `select_all'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/connection_adapters/abstract/query_cache.rb:56:in `select_all'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/base.rb:473:in `find_by_sql'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/relation.rb:64:in `to_a'
/var/www/lisausa/shared/bundle/jruby/1.9/gems/activerecord-3.0.10/lib/active_record/relation/finder_methods.rb:143:in `all'
I've tried using the ruby-cloudfiles gem instead of Fog, but eventually we seem to run into exactly the same errors with that combination too. If I disable the file download/Cloudfiles upload piece of this, these errors never appear, and I've been able to leave this particular job running for a number of days.
Any theories on what could be happening here?
Comments (1)
For what it's worth, months later, I can tell you that
"ActiveRecord::JDBCError: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost..."
has typically been an indicator for us that the database connection has been broken (because of a network problem, the DB going down, etc.) and the underlying socket closed. For us at least, the default behavior of JRuby/ActiveRecord/ConnectionPool is to keep blowing up with this error on every command instead of reconnecting.
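Our workaround has been to catch the error and force a reconnect before retrying, roughly like this (a sketch of the idea, not our exact code):

attempts = 0
begin
  bills = Bill.where(:state_session_id => 59).all
rescue ActiveRecord::StatementInvalid => e
  # Only handle the "connection was unexpectedly lost" case; re-raise anything else.
  raise unless e.message =~ /connection was unexpectedly lost/
  ActiveRecord::Base.connection.reconnect!   # re-establish the dead JDBC connection
  attempts += 1
  retry if attempts <= 1
  raise
end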