Updating values with a trigger
I'm trying to enhance the performance of a SQL Server 2000 job. Here's the scenario:
Table A has a maximum of 300,000 rows. If I update or delete the 100th row (based on insertion time), all the rows added after it should update their values. Row no. 101 should update its value based on row no. 100, and row no. 102 should update its value based on row no. 101's updated value. For example:
Old Table:
ID...........Value
100..........220
101..........(220/2) = 110
102..........(110/2) = 55
......................
Row no. 100 is updated with a new value: 300.
New Table:
ID...........Value
100..........300
101..........(300/2) = 150
102..........(150/2) = 75
......................
The actual value calculation is more complex; the formula above is simplified for illustration.
Right now, a trigger is defined for update/delete statements. When a row is updated or deleted, the trigger adds the row's data to a log table. Also, after each update/delete, a SQL job is created in the code-behind that fires a stored procedure, which iterates through all the following rows of table A and updates their values. The process takes about 10 days to complete for 300,000 rows.
When the SP fires, it updates the following rows' values. I think this causes the trigger to run again for each update the SP makes, adding those rows to the log table too. Also, the task has to be done on the database side, as requested by the customer.
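A minimal sketch of the kind of logging trigger described (names such as TableA, LogTable, Value, and trg_TableA_Log are placeholders):

    CREATE TRIGGER trg_TableA_Log ON TableA
    FOR UPDATE, DELETE
    AS
        -- Record the pre-change state of every affected row.
        INSERT INTO LogTable (ID, OldValue, ChangedAt)
        SELECT d.ID, d.Value, GETDATE()
        FROM deleted d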
To solve the problem:
Modify the stored procedure and call it directly from the trigger. The stored procedure then drops the trigger, updates the following rows' values, and creates the trigger again.
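A rough sketch of that idea (placeholder names again; note that CREATE TRIGGER must be the first statement in its batch, so recreating the trigger from inside a procedure needs dynamic SQL):

    CREATE PROCEDURE UpdateFollowingRows
        @id int
    AS
    BEGIN
        -- Drop the trigger so our own updates are neither logged nor re-cascaded.
        DROP TRIGGER trg_TableA_Log

        -- Walk through the following rows one by one (simplified halving formula).
        DECLARE @val money
        SELECT @val = Value FROM TableA WHERE ID = @id
        WHILE EXISTS (SELECT * FROM TableA WHERE ID > @id)
        BEGIN
            SELECT @id = MIN(ID) FROM TableA WHERE ID > @id
            SET @val = @val / 2
            UPDATE TableA SET Value = @val WHERE ID = @id
        END

        -- CREATE TRIGGER must start a batch, hence dynamic SQL via EXEC.
        EXEC('CREATE TRIGGER trg_TableA_Log ON TableA FOR UPDATE, DELETE AS
              INSERT INTO LogTable (ID, OldValue, ChangedAt)
              SELECT d.ID, d.Value, GETDATE() FROM deleted d')
    END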
- There will be multiple instances of the program running simultaneously. If another user modifies a row while the SP is being executed, the system will not fire the trigger and I'll be in trouble! Is there any workaround for this?
- What's your opinion about this solution? Is there any better way to achieve this?
Thank you.
2 Answers
First, about the update process. I understand your procedure is simply calling itself when it comes to updating the next row. With 300K rows this is certainly not going to be very fast, even without logging (though it would most probably take far fewer days to accomplish). But what is absolutely beyond me is how it is possible to update more than 32 rows that way without reaching the maximum nesting level. Maybe I've got the sequence of actions wrong.
Anyway, I would probably do that differently, with just one instruction:
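For instance, a sketch built on the simplified halving formula (a money-typed Value column and the variables @id/@newval, standing for the modified row and its new value, are assumptions):

    DECLARE @id int, @newval money, @val money
    SELECT @id = 100, @newval = 300     -- the modified row and its new value

    -- A single pass over the rows in clustered-index (ID) order: the variable
    -- @val carries each computed value into the next row's calculation.
    UPDATE TableA
    SET @val = Value = CASE WHEN ID = @id THEN @newval ELSE @val / 2 END
    WHERE ID >= @id
    OPTION (MAXDOP 1)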
The OPTION (MAXDOP 1) bit of the statement limits the degree of parallelism for the statement to 1, thus making sure the rows are updated sequentially and every value is based on the previous one, i.e. on the value from the row with the preceding ID value. Also, the ID column should be made a clustered index, which it typically is by default when it's made the primary key.
The other functionality of the update procedure, i.e. dropping and recreating the trigger, should probably be replaced by disabling and re-enabling it:
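On SQL Server 2000 there is no standalone DISABLE TRIGGER statement (that syntax arrived in 2005); it is done through ALTER TABLE, for example (trigger and table names are placeholders):

    ALTER TABLE TableA DISABLE TRIGGER trg_TableA_Log

    -- ... perform the cascading update here ...

    ALTER TABLE TableA ENABLE TRIGGER trg_TableA_Log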
But then, you are saying the trigger shouldn't actually be dropped/disabled, because several users might update the table at the same time.
All right then, we are not touching the trigger.
Instead, I would suggest adding a special column to the table, one the users shouldn't be aware of, or at least shouldn't care much about and should somehow be made sure never to touch. That column should only ever be updated by your 'cascading update' process. By checking whether that column is being updated, you would know whether you should invoke the update procedure and the logging.
So, in your trigger there could be something like this:
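A sketch, with CascadeFlag as the (hypothetical) special column, assuming single-row user updates and the FOR UPDATE trigger from the question:

    CREATE TRIGGER trg_TableA_Log ON TableA
    FOR UPDATE
    AS
    BEGIN
        -- CascadeFlag is only ever touched by the cascading update, so if it
        -- was updated we are inside our own cascade: skip logging and return.
        IF UPDATE(CascadeFlag)
            RETURN

        -- A genuine user update: log the old value ...
        INSERT INTO LogTable (ID, OldValue, ChangedAt)
        SELECT d.ID, d.Value, GETDATE()
        FROM deleted d

        -- ... and recalculate all the following rows.
        DECLARE @id int
        SELECT @id = MIN(ID) FROM inserted
        EXEC UpdateProc @id
    END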
In UpdateProc:
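Again a sketch; setting CascadeFlag inside the UPDATE is what lets the trigger recognize the cascade:

    CREATE PROCEDURE UpdateProc
        @id int
    AS
    BEGIN
        DECLARE @val money

        -- The @id row has already been changed by the user; read its new value.
        SELECT @val = Value FROM TableA WHERE ID = @id

        -- Start from the very next row; the expression is now just the formula.
        UPDATE TableA
        SET @val = Value = @val / 2,
            CascadeFlag = 1             -- mark these as cascade updates
        WHERE ID > @id
        OPTION (MAXDOP 1)
    END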
You may have noticed that the UPDATE statement is slightly different this time. I understand your trigger is FOR UPDATE (= AFTER UPDATE), which means the @id row has already been updated by the user. So the procedure should skip it and start from the very next row, and the update expression can now be just the formula.
In conclusion, I'd like to say that my test update involved 299,995 of my table's 300,000 rows and took approximately 3 seconds on my not-so-very-fast system. No logging, of course, but I think that should give you a basic picture of how fast it can be.
Big theoretical problem here. It is always extremely suspicious when updating one row REQUIRES updating 299,900 other rows. It suggests a deep flaw in the data model. Not that it is never appropriate, just that it is required far far less often than people think. When things like this are absolutely necessary, they are usually done as a batch operation.
The best you can hope for, in some miraculous situation, is to turn those 10 days into 10 minutes, but never into 10 seconds. I would suggest explaining thoroughly WHY this seems necessary, so that another approach can be explored.