更改表架构时的错误处理和数据完整性
我们有一些拥有大型数据集的客户,在升级过程中,我们需要修改各种表的架构(添加一些列,重命名其他列,偶尔更改数据类型,但这种情况很少见)。
以前,我们一直在使用具有新架构的临时表,然后删除原始表并重命名临时表,但我希望通过使用 ALTER table ...
来显着加快速度。
我的问题是我需要考虑哪些数据完整性和错误处理问题? 我是否应该将对表的所有更改包含在事务中(如果是,如何?),或者 DBMS 是否会保证 ALTER 操作的原子性和完整性?
我们已经强烈建议客户在开始升级之前备份其数据,因此这应该始终是一个后备选项。
我们需要针对 SQLServer 2005 和 Oracle,但显然如果它们需要不同的方法,我可以添加条件代码。
We have a few customers with large data sets and during our upgrade procedure we need to modify the schema of various tables (adding some columns, renaming others, occasionally changing data types, but that's rare).
Previously we've been going via a temporary table with the new schema, and then dropping the original and renaming the temp table but I'm hoping to speed that up dramatically by using ALTER table ...
instead.
My question is what data integrity and error handling issues do I need to consider? Should I enclose all changes to a table in a transaction (and if so, how?) or will the DBMS guarantee atomicity and integrity over an ALTER operation?
We already heavily recommend customers backup their data before starting the upgrade so that should always be a fall back option.
We need to target SQLServer 2005 and Oracle, but obviously I can add conditional code if they require different approaches.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
仅适用于 Oracle 的注释:
表更改属于 DDL,因此事务的概念不适用 - 每个 DDL 语句都会在操作期间锁定表,并且要么成功,要么失败。
添加(可为空!)列或重命名现有列是一个相对轻量级的过程,如果可以获得表锁,则不会出现任何问题。
添加(可为空!
如果您要添加/修改约束(NOT NULL 或其他更复杂的检查约束),Oracle 将检查现有数据以验证约束,除非您将 ENABLE NOVALIDATE 子句添加到约束 DDL。 对于大型表,现有数据的验证可能是一个漫长的过程。
如果您正在编写升级脚本以作为 SQL*Plus 脚本运行,请使用“whenever sqlerror exit sql.sqlcode”指令在第一次审核失败时中止脚本,这样可以省去很多麻烦。部分实现的升级更容易。
如果必须在无法控制事务或无法错过事务的实时系统上执行升级,请考虑使用 Oracle DBMS_REDEFINITION 包,该包会自动创建临时表和触发器的临时配置,以捕获正在进行的事务,同时在“背景”中重新定义表格。 警告 - 此选项需要大量工作和陡峭的学习曲线。
Comments for Oracle only:
Table alterations are DDL, so the concept of a transaction doesn't apply - every DDL statement locks the table for the duration of the operation and either succeeds or fails.
Adding (nullable!) columns or renaming existing columns is a relatively lightweight process and shouldn't present any problems if the table lock can be acquired.
If you're adding/modifying constraints (either NOT NULL or other more complex check constraints) Oracle will check existing data to validate the constraints unless you add the ENABLE NOVALIDATE clause to the constraint DDL. The validation of existing data can be a lengthy process for large tables.
If you're scripting the upgrade to be run as a SQL*Plus script, save yourself a lot of headaches by using the "whenever sqlerror exit sql.sqlcode" directive to abort the script on the first failure to make the review of partially implemented upgrades easier.
If the upgrade must be performed on a live system where you can neither control transactions or afford to miss them, consider using the Oracle DBMS_REDEFINITION package, which automatically creates a temporary configuration of temp tables and triggers to capture in-flight transactions while redefining the table in the "background". Warning - lots of work and a steep learning curve for this option.
如果您使用的是 SQL Server,则 ddl 语句是事务性的,因此请包装在事务中(但我认为这不适用于 Oracle)。
我们将升级分为具有特定功能的单独补丁。 应用了哪些补丁位于
database_patch_history
表中,可以轻松查看应用了哪些补丁以及如何回滚它们。正如您所说,开始之前进行备份很重要。
If you're using SQL Server then ddl statements are transactional, so wrap in a transaction (I don't think this applies to Oracle though).
We split upgrades into individual patches that go with a particular feature. Which patches are applied go in a
database_patch_history
table, and it's easy to see which patches were applied and how to roll them back.As you say, taking a backup before you start is important.
我过去曾不得不进行这样的更改,并且一直对数据丢失非常偏执。 为了帮助减轻这种风险,我总是对“沙箱”数据库进行大量测试,这些数据库尽可能接近地镜像模式和数据中的目标数据库。 在推出该流程之前尽可能多地对其进行测试,就像对应用程序的任何其他区域进行测试一样。
I have had to do changes like this in the past and have always been very paranoid about data loss. To help mitigate that risk I have always done tons of testing against "sandbox" databases that mirrored the target databases in schema and data as closely as possible. Test out the process as much as possible before rolling it out, just like you would any other area of the application.
如果您大幅更改列的任何数据类型,例如将 VARCHAR 更改为 INT,DBMS 将出现紧急情况,您可能会丢失该数据。 幸运的是,现在的 DBMS 足够智能,可以在不丢失数据的情况下进行一些数据类型转换,但您不想在进行更改时冒损坏任何数据的风险。
您不应该通过重命名列来丢失任何数据,并且绝对不会通过添加新列来丢失任何数据,这是当您移动数据时您必须担心的。
首先,备份整个表,包括架构和数据,这样您就可以立即回滚到以前的架构。 其次,看看你试图做出的改变,看看它们有多大——试着找出到底需要改变什么。 如果您要进行数据类型转换,请先将该数据推送到包含 3 列的中间表、外键(id 或任何可以定位行的内容)、旧数据和新列。 然后,要么直接将旧数据推送到新列,要么在应用程序级别进行转换。
当所有内容都是正确的类型并且一切都成功时,运行 ALTER 语句并重新填充数据库! 做起来很简单,只需要逻辑思维过程。
If you dramatically change any data types of columns, for instance change a VARCHAR to an INT, the DBMS will panic and you will probably loose that data. Luckily, nowadays DBMSs are intelligent enough to do some data type conversions without loosing the data, but you don't want to run the risk of damaging any of it when making the alterations.
You shouldn't loose any data by renaming columns and definitely won't by adding new columns, it's when you move the data about that you have to be concerned.
Firstly, backup the entire table, both the schema and data, so at a second's notice you can roll back to the previous schema. Secondly, look at the alterations you are trying to make, see how drastic they are - try to figure out exactly what needs to change. If you're making datatype conversions push that data to an intermediatery table first with 3 columns, the foreign key (id or whatever so you can locate the row), the old data and the new column. Then either push the old data to the new column directly, or convert it at the application-level.
When it's all in the correct types and everything's been successful, run the ALTER statements and repopulate the database! It's simple enough to do, just needs a logical thought process.