MySQL 规范化存储过程性能
我在 MySQL 中编写了一个存储过程来获取表中当前的值并对它们进行“标准化”。这意味着对于传递给存储过程的每个值,它都会检查该值是否已在表中。如果是,则将该行的 id 存储在变量中。如果该值不在表中,则它存储新插入的值的 id。然后,存储过程获取 id 并将它们插入到与原始非规范化表等效的表中,但该表已完全规范化并且主要由外键组成。
我对这种设计的问题是,存储过程大约需要 10 毫秒才能返回,当您尝试处理大约 1000 万条记录时,这个时间太长了。我怀疑性能与我插入的方式有关。即
INSERT INTO TableA
(first_value)
VALUES
(argument_from_sp) ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id);
SET @TableAId = LAST_INSERT_ID();
“ON DUPLICATE KEY UPDATE”有点像黑客,因为在重复键上我不想更新任何内容,而只是返回行的 id 值。如果您错过了这一步,那么当您尝试运行“SET ...”语句时,LAST_INSERT_ID() 函数将返回错误的值。
有谁知道在 MySQL 中执行此操作的更好方法吗?
I've written a stored procedure in MySQL to take values currently in a table and to "Normalize" them. This means that for each value passed to the stored procedure, it checks whether the value is already in the table. If it is, then it stores the id of that row in a variable. If the value is not in the table, it stores the newly inserted value's id. The stored procedure then takes the id's and inserts them into a table which is equivalent to the original de-normailized table, but this table is fully normalized and consists of mainly foreign keys.
My problem with this design is that the stored procedure takes approximately 10ms or so to return, which is too long when you're trying to work through some 10million records. My suspicion is that the performance is to do with the way in which I'm doing the inserts. i.e.
INSERT INTO TableA
(first_value)
VALUES
(argument_from_sp) ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id);
SET @TableAId = LAST_INSERT_ID();
The "ON DUPLICATE KEY UPDATE" is a bit of a hack, due to the fact that on a duplicate key I don't want to update anything but rather just return the id value of the row. If you miss this step though, the LAST_INSERT_ID() function returns the wrong value when you're trying to run the "SET ..." statement.
Does anyone know of a better way to do this in MySQL?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我返回并创建了一个函数来处理这种情况:
前面提到的存储过程调用这些函数而不是执行 INSERT 语句本身。就性能而言,上述函数在我的设置中速度更快(使用 ndb 表类型)。另外,在对应用程序的所有部分进行基准测试后,我发现这导致的性能问题只是整体性能瓶颈的一小部分。
I've gone back and created a function to handle this case instead:
The stored procedure mentioned earlier calls these functions instead of doing the INSERT statements itself. Performance-wise, the above function is faster in my setup (using ndb table type). Also, after benchmarking all of the parts of my application I've found that the performance issues this was causing were only a minor part of the overall performance bottleneck.
如果已经有了唯一标识符,还需要自增主键吗?
If you already have a unique identifier, is there any need to have an auto-incrementing primary key?