Rails app throws errors after a migration that shouldn't affect it

Posted 2024-12-27 20:16:59

I'm trying to debug an issue that has started coming up recently in my Rails (3.0.9) app: instances of the app have to be restarted (not redeployed) whenever a non-breaking migration is performed on the database. This never used to happen, and I can't find a reason it should have started recently.

My production environment has a few servers, each of which is using forked Unicorn instances, all sharing a single MySQL instance. Each instance is using a connection pool of 1. I also have one server that's used as a final staging environment to test against live data, after testing against a staging database.
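Unicorn runs the app in workers forked from a master process. Assuming the app is preloaded in the master (the answer below implies it is), any class-level cache filled during boot is copied into every worker as-is, even into workers forked long after the database has changed. The plain-Ruby sketch below shows only that copy-on-fork behaviour, without Rails or Unicorn; `SchemaCache`, `live_schema` and the column names are hypothetical, and it needs a platform that supports `fork` (e.g. Linux or macOS).

```ruby
# Hypothetical stand-in for per-class metadata memoized at boot
# (such as a model's description of its table). Not Rails code.
class SchemaCache
  class << self
    # Memoize a copy of the column list the first time it is requested.
    def columns(live_schema)
      @columns ||= live_schema.dup
    end
  end
end

live_schema = %w[id title]        # columns in the database when the master boots
SchemaCache.columns(live_schema)  # the master warms the cache while booting
live_schema << "subtitle"         # later, a migration adds a column

reader, writer = IO.pipe
pid = fork do
  reader.close
  # A worker forked after the migration still sees the master's memoized copy.
  writer.puts(SchemaCache.columns(live_schema).join(","))
  writer.close
  exit!(0) # skip any at_exit handlers inherited from the parent
end
writer.close
worker_view = reader.read.chomp
reader.close
Process.wait(pid)

puts "live schema: #{live_schema.join(',')}"
puts "worker sees: #{worker_view}"
```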

In the past, we've been able to run migrations on the production DB without redeploying or even restarting the production Unicorn servers, as long as they were migrations that the old code never needed to see. For example, when adding a new, optional field to an existing table, the code deployed to the staging server would write data to it, but the existing servers could blissfully ignore it and keep using the DB the same way they always had until we were ready to push the new code to production.

Recently, though, we've been seeing an exception thrown by the production servers after running the migration, but only when they try to insert a new row into the modified table. All other parts of the app, and all other tables, are unaffected, but as soon as there's an insert, we get this:

NoMethodError: undefined method `name' for nil:NilClass

[GEM_ROOT]/gems/arel-2.0.10/lib/arel/visitors/to_sql.rb:56:in `visit_Arel_Nodes_InsertStatement'
[GEM_ROOT]/gems/arel-2.0.10/lib/arel/visitors/to_sql.rb:55:in `map'
[GEM_ROOT]/gems/arel-2.0.10/lib/arel/visitors/to_sql.rb:55:in `visit_Arel_Nodes_InsertStatement'
[GEM_ROOT]/gems/arel-2.0.10/lib/arel/visitors/visitor.rb:15:in `send'
[GEM_ROOT]/gems/arel-2.0.10/lib/arel/visitors/visitor.rb:15:in `visit'
[GEM_ROOT]/gems/arel-2.0.10/lib/arel/visitors/visitor.rb:5:in `accept'
[GEM_ROOT]/gems/arel-2.0.10/lib/arel/visitors/to_sql.rb:18:in `accept'
[GEM_ROOT]/bundler/gems/rails-83fb5552b6ab/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb:111:in `with_connection'
[GEM_ROOT]/gems/arel-2.0.10/lib/arel/visitors/to_sql.rb:16:in `accept'
[GEM_ROOT]/gems/arel-2.0.10/lib/arel/tree_manager.rb:20:in `to_sql'
[GEM_ROOT]/gems/arel-2.0.10/lib/arel/select_manager.rb:217:in `insert'

The issue goes away if we then restart the production servers, even though they're still running the old code. So it seems that something gets corrupted in the connection when the table is modified, even though the app, had it not hit this error, would be fine continuing to do what it was doing before the migration.

I did some digging into ActiveRecord/ARel, but the most I can do is theorize that at some point the cached model of this table loses knowledge of which columns it has. I can't see why it would do that, or why this would start happening all of a sudden.
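Judging from the trace, the failing code in Arel's `visit_Arel_Nodes_InsertStatement` maps over the INSERT's columns and calls `name` on each, and one of them is `nil`. The plain-Ruby sketch below reproduces that shape under one assumption: looking up a column that the cached table has no record of returns `nil` instead of raising. `Column`, `StaleTable` and `insert_head` are hypothetical and are not Arel's real classes.

```ruby
# Hypothetical column object; like an Arel attribute, it responds to #name.
Column = Struct.new(:name)

# Hypothetical table that snapshots its column list once and never refreshes it.
class StaleTable
  def initialize(column_names)
    @columns = column_names.map { |n| Column.new(n) }
  end

  # Returns nil, rather than raising, for a column missing from the snapshot.
  def [](name)
    @columns.find { |c| c.name == name.to_s }
  end
end

# Builds the head of an INSERT by resolving each attribute against the table
# and calling #name on the result, the step that fails in the trace.
def insert_head(table_name, table, attribute_names)
  columns = attribute_names.map { |attr| table[attr] }
  "INSERT INTO #{table_name} (#{columns.map { |col| col.name }.join(', ')})"
end

table = StaleTable.new(%w[id title]) # snapshot taken before the migration

# Attributes that all appear in the snapshot resolve fine.
ok_sql = insert_head("posts", table, %w[id title])
puts ok_sql

# An attribute list loaded after the migration includes a column the snapshot lacks.
error = nil
begin
  insert_head("posts", table, %w[id title subtitle])
rescue NoMethodError => e
  error = e
  puts "#{e.class}: #{e.message}"
end
```

The exact wording of the message depends on the Ruby version, but the error is a `NoMethodError` for `name` called on `nil`, as in the trace.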

Comments (1)

清旖 2025-01-03 20:16:59

I'm a coworker of Jon's, going to answer this for posterity.

The problem was a consequence of the caching in https://github.com/rails/rails/blob/v3.0.9/activerecord/lib/active_record/base.rb. For certain model classes, @columns and @arel_table were out of sync: @arel_table had been cached during app initialization and was therefore inherited from the Unicorn master by every newly forked worker, while @columns was only loaded later, inside the worker, so after a migration it could include columns the inherited @arel_table knew nothing about.
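A plain-Ruby model of that sequence, under stated assumptions: `FakeModel` and `SCHEMA` are hypothetical, `columns` stands in for the lazily loaded `@columns`, `arel_columns` stands in for the column snapshot held by an `@arel_table` built when `::scoped`/`::unscoped` is first called, and everything runs in one process to keep it short. The table is cached at boot, a migration runs, and `columns` is loaded for the first time afterwards, so the two caches disagree.

```ruby
# Simulated database schema for this sketch.
SCHEMA = { "posts" => %w[id title] }

# Hypothetical model with two independent lazy caches, loosely mirroring
# @columns and @arel_table on an ActiveRecord 3.0 model class.
class FakeModel
  class << self
    # Loaded on first use, from whatever the schema is at that moment.
    def columns
      @columns ||= SCHEMA["posts"].dup
    end

    # Built once (e.g. by an early call to scoped) with its own column snapshot.
    def arel_columns
      @arel_columns ||= SCHEMA["posts"].dup
    end

    # Columns an INSERT would try, and fail, to resolve against the cached table.
    def unresolvable_columns
      columns - arel_columns
    end
  end
end

FakeModel.arel_columns        # 1. boot: something calls scoped, the table is cached
SCHEMA["posts"] << "subtitle" # 2. a non-breaking migration adds a column
# 3. a worker that inherited step 1's cache loads columns for the first time now
puts "columns:      #{FakeModel.columns.inspect}"
puts "arel table:   #{FakeModel.arel_columns.inspect}"
puts "no match for: #{FakeModel.unresolvable_columns.inspect}"
```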

We fixed it by preventing calls to any model's ::scoped or ::unscoped during initialization where we could, and calling ::reset_column_information afterwards where we couldn't.
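Both remedies make the two caches get filled from the same schema; they only need to agree with each other, not with the live database, because old code that knows neither about the new column simply omits it from its INSERTs. A plain-Ruby sketch of the reset approach, again with a hypothetical `FakeModel`: its `reset_column_information` is a stand-in that clears both caches, which is the behaviour the fix above relies on. The real Rails 3.0.9 method may clear more or different state, so check the linked source before depending on the details.

```ruby
# Simulated database schema for this sketch.
SCHEMA = { "posts" => %w[id title] }

class FakeModel
  class << self
    def columns
      @columns ||= SCHEMA["posts"].dup
    end

    def arel_columns
      @arel_columns ||= SCHEMA["posts"].dup
    end

    # Hypothetical stand-in for ::reset_column_information: drop both caches
    # so that they are rebuilt together on next use.
    def reset_column_information
      @columns = nil
      @arel_columns = nil
    end

    def in_sync?
      columns == arel_columns
    end
  end
end

FakeModel.arel_columns             # boot code touched the table (e.g. via scoped)
FakeModel.reset_column_information # so reset it once initialization is done
SCHEMA["posts"] << "subtitle"      # a migration runs while the app is up
puts FakeModel.in_sync?            # both caches are now filled from the same schema
```

Avoiding `::scoped`/`::unscoped` during boot has the same effect without the reset, since nothing is cached in the master for the workers to inherit.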
