MySQL 规范化存储过程性能

发布于 2024-09-06 04:30:51 字数 592 浏览 6 评论 0原文

我在 MySQL 中编写了一个存储过程来获取表中当前的值并对它们进行“标准化”。这意味着对于传递给存储过程的每个值，它都会检查该值是否已在表中。如果是，则将该行的 id 存储在变量中。如果该值不在表中，则它存储新插入的值的 id。然后，存储过程获取 id 并将它们插入到与原始非规范化表等效的表中，但该表已完全规范化并且主要由外键组成。

我对这种设计的问题是，存储过程大约需要 10 毫秒才能返回，当您尝试处理大约 1000 万条记录时，这个时间太长了。我怀疑性能与我插入的方式有关。即

INSERT INTO TableA 
 (first_value) 
VALUES 
 (argument_from_sp) ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id);

SET @TableAId = LAST_INSERT_ID();

“ON DUPLICATE KEY UPDATE”有点像黑客，因为在重复键上我不想更新任何内容，而只是返回行的 id 值。如果您错过了这一步，那么当您尝试运行“SET ...”语句时，LAST_INSERT_ID() 函数将返回错误的值。

有谁知道在 MySQL 中执行此操作的更好方法吗？

原文

I've written a stored procedure in MySQL to take values currently in a table and to "Normalize" them. This means that for each value passed to the stored procedure, it checks whether the value is already in the table. If it is, then it stores the id of that row in a variable. If the value is not in the table, it stores the newly inserted value's id. The stored procedure then takes the id's and inserts them into a table which is equivalent to the original de-normailized table, but this table is fully normalized and consists of mainly foreign keys.

My problem with this design is that the stored procedure takes approximately 10ms or so to return, which is too long when you're trying to work through some 10million records. My suspicion is that the performance is to do with the way in which I'm doing the inserts. i.e.

INSERT INTO TableA 
 (first_value) 
VALUES 
 (argument_from_sp) ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id);

SET @TableAId = LAST_INSERT_ID();

The "ON DUPLICATE KEY UPDATE" is a bit of a hack, due to the fact that on a duplicate key I don't want to update anything but rather just return the id value of the row. If you miss this step though, the LAST_INSERT_ID() function returns the wrong value when you're trying to run the "SET ..." statement.

Does anyone know of a better way to do this in MySQL?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

云仙小弟 2024-09-13 04:30:51

我返回并创建了一个函数来处理这种情况：

CREATE DEFINER=`root`@`%` FUNCTION `value_update`(inValue VARCHAR(255)) RETURNS int(11)
BEGIN
        DECLARE outId INT;
        SELECT valueId INTO outId FROM ValuesTable WHERE value = inValue;

        IF outId IS NULL THEN
                INSERT INTO ValuesTable (value) VALUES (inValue);
                SELECT LAST_INSERT_ID() INTO outId;
        END IF;

        RETURN outId;
END

前面提到的存储过程调用这些函数而不是执行 INSERT 语句本身。就性能而言，上述函数在我的设置中速度更快（使用 ndb 表类型）。另外，在对应用程序的所有部分进行基准测试后，我发现这导致的性能问题只是整体性能瓶颈的一小部分。

I've gone back and created a function to handle this case instead:

CREATE DEFINER=`root`@`%` FUNCTION `value_update`(inValue VARCHAR(255)) RETURNS int(11)
BEGIN
        DECLARE outId INT;
        SELECT valueId INTO outId FROM ValuesTable WHERE value = inValue;

        IF outId IS NULL THEN
                INSERT INTO ValuesTable (value) VALUES (inValue);
                SELECT LAST_INSERT_ID() INTO outId;
        END IF;

        RETURN outId;
END

The stored procedure mentioned earlier calls these functions instead of doing the INSERT statements itself. Performance-wise, the above function is faster in my setup (using ndb table type). Also, after benchmarking all of the parts of my application I've found that the performance issues this was causing were only a minor part of the overall performance bottleneck.

回复收藏 0 原文