In MySQL, is NOT EXISTS more costly than UNION in terms of performance?
I want to fill a table with data that might overlap. I do this a few times in the code. The first time it is
INSERT INTO A (SELECT * FROM B)
and then the second time it is
INSERT INTO A
SELECT * FROM C
WHERE NOT EXISTS (SELECT * FROM A WHERE A.field = C.field)
The re-insertion is done a total of 3 times. I could change the code to a single insertion using a UNION on the 4 sub-tables, but that would require a lot of redesign, so I wanted to know how much of a performance penalty I am paying right now.
Thanks!!
Comments (2)
A union is probably less costly. But as usual with databases, it depends.
Why?
Because what you are doing right now is scanning tables B, C and D, and scanning table A twice, plus the overhead of three separate queries. With a union you would scan tables B, C and D once, sort the rows (to get unique values) and then insert them into table A. At first glance it seems that the union would be faster, because you are doing two fewer scans and only one insert (and thus less locking).
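For illustration, the single-statement version could look like the sketch below. It assumes that A starts out empty, that B, C and D have the same column layout as A, and that D is the third source table (the question only shows B and C). Note that UNION removes rows that are identical in every column, whereas the NOT EXISTS check in the question only compares field, so the two approaches give the same result only when rows sharing a field value are identical in all columns.

```sql
-- Sketch only: assumes A is empty and B, C, D have A's column layout.
-- UNION (without ALL) de-duplicates whole rows before the insert.
INSERT INTO A
SELECT * FROM B
UNION
SELECT * FROM C
UNION
SELECT * FROM D;
```

If rows can share a field value but differ in other columns, this statement would insert both of them, so it is not a drop-in replacement in that case.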
What I mean by it depends is:
Indexes: Correctly indexed, seeks might be faster than sorting the data from B, C and D.
Amount of data: If you have little data compared to the hardware you are running the database on then this discussion might be pointless. I.e. there's no point in optimizing parts that are not bottlenecks.
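To see which side of the index trade-off applies, one option is to index the lookup column and inspect the query plan. A minimal sketch, assuming field is the column compared in the NOT EXISTS subquery; the index name idx_a_field is made up for the example:

```sql
-- Hypothetical index name; lets the NOT EXISTS check seek into A
-- instead of scanning it for every candidate row.
CREATE INDEX idx_a_field ON A (field);

-- Inspect the plan of the SELECT part of the second insert.
EXPLAIN
SELECT * FROM C
WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.field = C.field);
```

If the key column of the EXPLAIN output lists idx_a_field for table A, the check is resolved through the index rather than a full scan of A.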
You might also want to look at INSERT IGNORE ... if you have a suitable UNIQUE KEY constraint in the destination table.
Just another idea... I have no idea about the performance differences though; it probably depends on your table structures, indexes and data. I would just benchmark the different methods myself and choose the most suitable one.
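A rough sketch of the INSERT IGNORE route, assuming field alone identifies a duplicate; the key name uq_a_field is made up for the example:

```sql
-- Hypothetical key name. This ALTER fails if A already holds
-- duplicate values in field, so clean those up first.
ALTER TABLE A ADD UNIQUE KEY uq_a_field (field);

-- Rows whose field value already exists in A are skipped
-- (reported as warnings) instead of aborting the statement.
INSERT IGNORE INTO A SELECT * FROM B;
INSERT IGNORE INTO A SELECT * FROM C;
```

Keep in mind that IGNORE downgrades other errors to warnings as well, for example invalid values that get adjusted on insert, so it is worth checking SHOW WARNINGS while testing.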