复杂分支系统中的数据库迁移
在我们当前的开发工作流程中,我们引入了数据库迁移(使用 Ruckusing)来保留开发人员的数据库架构同步。它工作得很好,使用起来非常简单,但现在我们已经切换到 git 作为 VCS,我们面临着数据库版本控制系统中的下一个问题。
当检查已经开发了一段时间的分支时,可能会发生数据库架构与我来自的分支中的架构有很大差异的情况。在某些情况下这会导致数据库冲突。从逻辑上讲,我们似乎需要根据之前所在的分支来运行迁移,但这可能会很快变得复杂,并且肯定会遇到一些人的问题。据我所知,没有一个支持分支的数据库迁移系统?
当切换到功能分支时,会增加复杂性,我们可能需要运行一些迁移,而另一些则关闭……从技术上讲,使用我们当前的 dbmigration 脚本似乎不可能,是否有任何明智的替代方案?在非常活跃的分支开发系统中是否有任何首选的数据库迁移方法?
In our current development workflow we have introduced database migrations (using Ruckusing) to keep our developers' db schema's in sync. It works great, is pretty straightforward in use but now we have switched to git as VCS we are facing the next problem in our database versioning system.
When checking out a branch that has been in development for some time it might happen that the database schema has diverged alot from the schema in the branch i'm coming from. This causes database conflicts in some cases. Logically it seems that we need to run migrations depending on the branch we were on previously but that can get complex really fast and will run into problems with some people for sure. And as far as i know there isn't a db migration system that is branch-aware??
Added complexity comes when switching to a feature branch we might need to run some migrations up while other down ... technically this seems impossible using our current dbmigration scripts, are there any sane alternatives? Are there any preferred ways of working with database migrations in a very active and branched development system?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
我真的不同意增量迁移已经腐烂。在我看来,拥有一套自行开发的脚本比拥有一个真正的工具来完成这样的工作更糟糕,这样可以更容易地跟踪这些变化。我自己也遇到过类似的情况,所以希望能分享一些见解。
根据我的经验,RDBMS 模式和分支不能很好地混合。根据您的分支,模式可能至少应该有些相似,在这种情况下,迁移不应有太大差异。或者我可能只是误解了问题的全部范围。例如,如果您试图将客户特定的代码保留在分支上,那么也许您应该考虑一种将其模块化的方法。我们做了类似的事情,制定了规定客户特定架构更改的规则,并且代码只能依赖于公共代码库,而不是相反。我们还根据模块和日期设置模块变更集之间的优先级,因此我们在大多数情况下都知道应用更改的顺序。 YMMV,当然,但在不知道您当前设置的情况下很难给出具体细节。
在我以前的公司,我们成功地使用了一个名为 Liquibase 的工具,这听起来与您正在使用的类似。基本上,它是一种用于获取数据库模式以及将所有数据从一种已知状态转移到另一种已知状态的工具。相同的变更集仅应用一次,因为 liquibase 维护带有校验和的变更日志。变更日志以特定的 XML 格式编写。如果您需要替代方案,我强烈建议您尝试一下。
不管怎样,我们处理客户代码和分支的方式是为给定的分支拥有特定的数据库/模式。这样,您就可以获得分支点的架构和数据,并且仅将差异迁移到当前情况。即使 liquibase 理论上可以支持这一点,我们也没有撤消更改,因为我们认为这太麻烦且容易出错。鉴于 liquibase 保留其自己的状态,迁移总是像在给定分支上获取当前状态并应用所有状态一样简单。仅应用新的变更集,使模式保持良好状态。
我们使用 mercurial,它是分布式的,就像 git 一样,所以设置非常相似。我们还在开发笔记本电脑上设置了开发人员特定的本地数据库,以及许多环境,分别针对不同的客户和阶段(开发、集成、生产),因此该模型经过了真正的测试,并且效果出奇地好。我们在变更集中遇到了一些冲突,但我们在问题出现后很快就能够解决这些冲突。本地开发环境确实是最难的部分,因为在开发过程中可能会引入一些模式更改,这些更改并不总是与后来的更改集兼容,但更改的结构化性质以及具有已知的状态要恢复导致很少真正的问题。
此方法有一些注意事项:
但是,重点是,您对数据库的结构和控制越多,迁移就越容易。因此,像 Liquibase 这样的工具可能是帮助您跟踪这些更改的非常有价值的资产。与简单模型相比,这甚至在更大程度上适用于更复杂的模型,因此请至少不要考虑放弃您已经拥有的所有工具。并花一些时间探索其他替代工具。
一些结构和控制总比没有好,甚至更糟,认为您可以控制一大堆手动脚本。
I wouldn't really agree with incremental migrations being rotten. Having a set of homegrown scripts would, in my opinion, be a worse approach than having a real tool for such a job will make tracking those changes easier. I've had to deal with a similar situation myself, before, so hopefully I can share some of the insights.
To my experience RDBMS-schemas and branches don't mix very well. Depending on your branching the schemas should probably be at least somewhat similar, in which case the migrations should not differ too much. Or I might just have misunderstand the full extent of the problem. If you're e.g. trying to keep customer specific code on a branch, then maybe you should consider a way to modularize it instead. We did something like this, having rules that stated that customer specific schema changes, and code could only ever depend on the common code base, not the other way around. We also set the precedence between module changesets based on module and date, so we for most parts knew the order in which the changes were to be applied. YMMV, of course, but it's hard to give specifics, without knowing your current setup.
At my old company we successfully used a tool called Liquibase, which sounds similar to what you're using. Basically it is a tool for taking a DB schema, and all the data from one known state to another known state. The same changeset is applied only once, since liquibase maintains a changelog, with checksums. The changelogs are written in a specific XML format. I can strongly recommend to try it out, if you need alternatives.
Anyway, the way we handled customer code and branches, was to have a specific DB/schema for a given branch. This way you could have the schema and data from the branching point, and only migrate the diff to the current situation. We did not undo changes, even if liquibase in theory could support this, as we felt it was way too cumbersome and error prone. Given that liquibase keeps it's own state, the migration was always as easy as taking the current state on a given branch, and apply all. Only new changesets were applied, leaving the schema in a good state.
We used mercurial, which is distributed, like git, so the setup was quite similar. We also had developer specific local DBs on the dev laptops, and a number of environments, both for different customers and phases (development, integration, production), so the model was put to a real test, and it worked surprisingly well. We had some conflicts in the changesets, but we were mostly able to resolve those soon after the problem was introduced. The local development envs were really the hardest part, since during development some schema changes might have been introduced, which were not always compatible with later changesets, but the structured nature of the changes, and having a known state to revert to lead to very few real problems.
There are a few caveats with this approach:
The point is, however, that the more structure and control you have over the database, the easier migrations will be. Therefore tools like Liquibase could be a really valuable asset to help you track those changes. This applies to more complex models even to a greater extent, than to simple ones, so please at least don't consider dumping all tools you already have in place. And take some time to explore other alternative tools.
Some structure and control is better than none, or even worse, thinking you are in control with a big bunch of manual scripts.
我认为增量迁移的整个想法真的很糟糕。在像你这样复杂的环境下,确实行不通。您可以使其适用于简单的分支模式,但对于任何复杂的情况,这将是一场噩梦。
我现在使用的系统采用了不同的方法:我们无法进行增量迁移,而只能从基线重建数据库。在最初的开发过程中,该基线是一个空数据库,在维护过程中,它是实时数据库的副本(从转储中恢复)。我们只有一堆 SQL 和 XML 脚本,我们将它们应用于基线以获得当前系统(本质上是迁移,但不是设计为增量运行)。更新或切换分支非常简单:核对数据库,加载转储以建立基线,运行脚本。
这个过程不像只运行几次迁移那么快,但也足够快了。你需要足够的时间去喝杯咖啡,但不足以去吃午饭。
巨大的优势在于,从破坏数据库开始意味着该过程完全独立于历史,因此不需要知道或关心交叉分支、回到过去或其他任何事情。
当您进行实时发布时,显然您的做法略有不同:您不会破坏数据库或加载转储,因为系统已经处于基线(基线被定义为实时系统的状态!)。您只需运行脚本即可。之后,创建一个新的转储以用作新的开发基线。
I think the whole idea of incremental migrations is pretty rotten, really. In a complex environment like yours, it really doesn't work. You could make it work for simple branch patterns, but for anything complicated, it will be a nightmare.
The system i'm working with now takes a different approach: we have no ability to make incremental migrations, but only to rebuild the database from a baseline. During initial development, that baseline was an empty database, and during maintenance, it's a copy of the live database (restored from a dump). We just have a pile of SQL and XML scripts that we apply to the baseline to get a current system (migrations, essentially, but not designed to be run incrementally). Updating or switching branches is then very simple: nuke the database, load a dump to establish the baseline, run the scripts.
This process is not as quick as just running a few migrations, but it's quick enough. It takes long enough that you can go and get a cup of coffee, but not long enough to get lunch.
The huge advantage is that starting by nuking the database means the process is completely history-independent, so it doesn't need to know or care about crossing branches, going back in time, or anything else.
When you take a release live, you obviously do things slightly differently: you don't nuke the database or load a dump, because the system is already at the baseline (the baseline is defined as the state of the live system!). You just run the scripts. And after that, make a fresh dump to be used as a new baseline for development.
我处于类似的情况,我在一个实时网站和几个开发分支上工作,需要更改数据库架构。
我通过编写一个可以与 git 很好地配合使用的 post-checkout 和 post-merge hook 解决了这个问题。我将所有迁移以 SQL 文件的形式存储在单独的目录中,并将它们与更改的 PHP 代码一起提交。每次我执行 a
或 a
git 都会自动调用适当的向上和向下迁移。请参阅 Github 上的实现。
作为一个特殊请求(对于那些不想点击 github 链接的人)一些更多的解释:
考虑以下场景。你有两个分支:
为了使新功能正常工作,它需要更改数据库架构。
工作流程如下:
当您在功能分支中更改代码时,需要更改
数据库模式,您还可以在迁移目录中提交两个新的 SQL 文件,
说:
20151120130200-extra-field-up.sql
(包含要迁移的所有 SQL 查询向上)
20151120130200-extra-field-down.sql
(包含要迁移的所有 SQL 查询向下)
..
的提交中查找所有 *-up.sql 脚本master..feature
的提交中查找所有 *-up.sql 脚本安装
只需将结帐后和/或合并后文件复制到.git/钩子
您自己的 git 存储库的目录。您可以编辑这些的配置部分
文件。请参阅文件本身以获取解释。
用法
迁移 SQL 文件的命名至关重要。他们应该以
up.sql
或down.sql
。名称的其余部分完全取决于您。但是,如果您有一个包含多个向上迁移和/或多个
向下迁移的执行顺序取决于
字典顺序。不同提交中的迁移文件将
始终以与提交相同(相反)的顺序调用。
并不要求您同时进行向上迁移和向下升级,
也不要求向上和向下迁移的名称相似。
I'm in a similar situation where I work on a live website and several development branches in which I need to change the database schema.
I solved it by writing a post-checkout and a post-merge hook that can be used nicely with git. I store all my migrations in the form of SQL files in a separate directory and commit them alongside the changed PHP code. Each time I perform a
or a
git will automatically call the appropriate up- and down-migrations. See my implementation on Github.
As a special request (for those of you who don't want to follow the github link) some more explanation:
Consider the following scenario. You have two branches:
For the new feature to work properly, it needs to change the database schema.
The workflow is as follows:
When, in you feature branch, you change your code which needs a change of the
database schema, you also commit two new SQL files in the migrations directory,
say:
20151120130200-extra-field-up.sql
(containing all the SQL queries to migrateupwards)
20151120130200-extra-field-down.sql
(containg all the SQL queries to migratedownwards)
<new HEAD>..<old HEAD>
<old HEAD>..<new HEAD>
master..feature
Install
Simply copy the post-checkout and/or post-merge file to the .git/hooks
directory of your own git repository. You can edit the config section of those
files. See the files themselves for an explanation.
Usage
The naming of the migration SQL files is crucial. They should end with
up.sql
ordown.sql
. The rest of the name is completely up to you.However if you have a single commit with multiple up-migrations and/or multiple
down-migrations the order in which they are performed depends on the
lexicographical order. Migration files that are within different commits, will
always be called in the same (reverse) order as the commits.
It's not a requirement that you have both an up-migration and a down-upgrade,
nor is it a requirement that up- and down-migrations are named similar.
我正在考虑在当前项目中测试的一种方法是创建一个分支“迁移”,并且所有(且仅)迁移都致力于此分支。开发人员必须在创建迁移之前从该分支合并到当前分支,以便始终在最新迁移之上创建他们的迁移。所有项目都从这个分支合并,这样每个分支都有一个线性迁移历史的概念。这使得每个分支都能够在数据库版本之间来回移动。当切换到依赖于不同数据库版本的分支时,开发人员会应用合适的迁移。
烦恼(除了将迁移提交到特殊分支的额外工作和勤奋之外)是记住哪个迁移对应于特定分支。我想这样做的一种方法是不是将迁移直接提交到迁移分支,而是将迁移(并且仅迁移)提交到当前分支,然后选择提交到迁移分支。然后,您可以查看上次将当前分支挑选到迁移分支上的时间,并知道该 diff 包含必要的迁移。我认为这是可能的。此外,开发人员可能只是为了查看需要进行哪些更改而创建迁移,然后尝试推断哪种迁移适合使用。
很抱歉提出了模糊的建议;如果我们最终尝试这种方法,我将用更具体的建议来编辑此建议。
An approach I'm thinking of testing in our current project is to create a branch 'migrations' and all (and only) migrations are committed to this branch. Developers must merge from this branch into their current branch before creating a migration so that their migration is always created on top of the latest migration. All projects merge from this branch, so that every branch has a concept of a linear migration history. This gives every branch the ability to move back and forth between database versions. When switching to a branch that depends on a different version of the database, the developer applies whichever migration is appropriate.
The annoyance (besides the extra work and diligence of committing migrations to the special branch) is remembering which migration corresponds to a particular branch. I guess one way of doing this is instead of committing migrations directly onto the migrations branch, commit the migration (and only the migration) onto the current branch and then cherry pick that commit onto the migrations branch. Then you can just look at the last time the current branch cherry-picked onto the migrations branch and know that that diff contains the necessary migration. I think that would be possible. Also, the developer might create a migration just to see what changes would be necessary, and then try to infer which migration would be appropriate to use.
Sorry for the vague suggestion; if we do end up trying this approach I'll edit this suggestion with more concrete recommendations.
这是我最近一直在做的事情。对我来说,问题并不是数据库模式本身存在分歧,而是 git 无法将它们合并在一起。涉及数据库模式的功能分支总是令人恐惧。
我一直在考虑的解决方案是,不要进行线性迁移,而是进行依赖于其他迁移的迁移。您将获得一个很好的迁移依赖图,该图很容易线性化(拓扑排序)。只需跟踪数据库中的命名迁移,并以正确的顺序执行尚未更新的更新。
例如,
addCustomerSalt
依赖于initialSchema
,separateAddress
依赖于person
。这没有解决的一个问题是,如果分支 A 依赖于在分支 B 中创建的更新 Z,但也许在这种情况下,您应该重新建立一个共同的祖先?
This is something I've been working on lately. For me the problem haven't been that database schema has diverged per se, but rather, git can't merge them together. Feature branches which touch the database schema are always scary.
The solution I've been thinking on is that instead of having linear migrations, have migrations that depend on other migrations. You get a nice dependency graph of your migrations which is easy enough to linearize (topological sort). Just keep track of the named migrations in your database and in the correct order execute the updates that aren't already updated.
For example,
addCustomerSalt
depends oninitialSchema
, andseparateAddress
depends onperson
.The one problem this does not solve is that if branch A depends on update Z which has been created in branch B, but maybe in that case, you should rebase to a common ancestor?