SQL performance: using UNION and subqueries
Hi Stack Overflow (my first question!),
We're building something like an SNS and have a question about optimizing a query.
We're using MySQL 5.1, and the current table was created with:
CREATE TABLE friends(
user_id BIGINT NOT NULL,
friend_id BIGINT NOT NULL,
PRIMARY KEY (user_id, friend_id)
) ENGINE INNODB;
Sample data is populated like this:
INSERT INTO friends VALUES
(1,2),
(1,3),
(1,4),
(1,5),
(2,1),
(2,3),
(2,4),
(3,1),
(3,2),
(4,1),
(4,2),
(5,1),
(5,6),
(6,5),
(7,8),
(8,7);
The business logic: we need to figure out which users are friends or friends of friends for a given user.
The current query for this for a user with user_id=1 is:
SELECT friend_id FROM friends WHERE user_id = 1
UNION
SELECT DISTINCT friend_id FROM friends WHERE user_id IN (
SELECT friend_id FROM friends WHERE user_id = 1
);
The expected result is (order doesn't matter):
2
3
4
5
1
6
As you can see, the above query performs the subquery "SELECT friend_id FROM friends WHERE user_id = 1" twice.
So, here is the question. If performance is your primary concern, how would you change the above query or schema?
Thanks in advance.
Comments (2)
In this particular case, you can use a JOIN:
Examining each query suggests the JOIN will be about as performant as the UNION in a big-O sense, though perhaps faster by a constant factor. Jasie's query looks like it might be big-O faster.
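The JOIN query this answer refers to is not shown here. As a rough sketch of what a single self-join could look like (an assumption, not necessarily the answerer's exact query), using the friends table and user_id = 1 from the question; the aliases f1 and f2 are illustrative names:

```sql
-- Sketch only, not the answerer's original query.
-- f1: the rows for user 1, i.e. the direct friends.
-- f2: rows whose user_id is one of those friends (friends of friends)
--     or user 1 itself (so the direct friends are returned too).
SELECT DISTINCT f2.friend_id
FROM friends AS f1
JOIN friends AS f2
  ON f2.user_id = f1.friend_id
  OR f2.user_id = 1
WHERE f1.user_id = 1;
```

With the sample data in the question this returns 1 through 6, the same set as the UNION query. Like the UNION version, it returns nothing when the user has no friends, because f1 is then empty.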
No need for the UNION. Just include an OR with the user_id of the beginning user:
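The query that accompanied this answer is not shown here. One way to write what it describes, as a minimal sketch assuming the friends table and user_id = 1 from the question:

```sql
-- Sketch only: fold the first branch of the UNION into the WHERE clause.
-- Rows qualify if they belong to user 1 (direct friends)
-- or to any of user 1's friends (friends of friends).
SELECT DISTINCT friend_id
FROM friends
WHERE user_id = 1
   OR user_id IN (
        SELECT friend_id FROM friends WHERE user_id = 1
   );
```

DISTINCT takes over the de-duplication that UNION was doing. Older MySQL releases are known to sometimes execute IN (subquery) as a dependent subquery that is re-evaluated per row, so it is probably worth running EXPLAIN on this form under 5.1 before assuming it is faster.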