在完整外连接查询中按日期过滤行 ->缺少一些结果
背景
我在 MySQL 中有两个表,其中包含不同类型的反馈项。我构建了一个查询,通过 FULL OUTER JOIN(实际上在 MySQL 中编写为两个联接和一个并集)来组合这些表,并计算一些平均成绩。这个查询似乎工作得很好:(
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
LEFT JOIN feedback_service AS s USING(name)
GROUP BY name)
UNION ALL
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
RIGHT JOIN feedback_service AS s USING(name)
WHERE l.id IS NULL
GROUP BY name)
ORDER BY name;
这在某种程度上简化了可读性,但在这里没有什么区别)
问题
接下来我尝试添加按日期过滤(即仅考虑在特定日期之后创建的反馈项目)。凭借我的 SQL 技能和所做的研究,我能够得出以下结论:
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
LEFT JOIN feedback_service AS s USING(name)
WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL)
GROUP BY name)
UNION ALL
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
RIGHT JOIN feedback_service AS s USING(name)
WHERE l.id IS NULL
AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
GROUP BY name)
ORDER BY name;
这几乎有效:我得到的结果看起来是正确的。然而,缺少一些反馈项目。例如,设置一个月前的日期,我在数据库中统计了 21 个不同人的反馈,但这个查询只返回 19 个人。最糟糕的是我似乎找不到丢失的物品之间的任何相似之处。
我在这个查询中做错了什么吗?我认为 WHERE
子句在 JOIN
之后执行日期过滤,理想情况下我可能会在之前执行此操作。话又说回来,我不知道这是否会导致我的问题,而且我也不知道如何以不同的方式编写这个查询。
Background
I've got two tables with different types of feedback items in MySQL. I've built a query to combine these tables by FULL OUTER JOIN
(which is actually written as two joins and an union in MySQL) and to count some average grades. This query seems to work perfectly:
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
LEFT JOIN feedback_service AS s USING(name)
GROUP BY name)
UNION ALL
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
RIGHT JOIN feedback_service AS s USING(name)
WHERE l.id IS NULL
GROUP BY name)
ORDER BY name;
(This is somewhat simplified for readability but it doesn't make a difference here)
Problem
Next I tried adding filtering by date (i.e. only feedback items created after a certain date are taken in account). With my SQL skills and the research I did, I was able to come up with this:
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
LEFT JOIN feedback_service AS s USING(name)
WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL)
GROUP BY name)
UNION ALL
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
RIGHT JOIN feedback_service AS s USING(name)
WHERE l.id IS NULL
AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
GROUP BY name)
ORDER BY name;
This almost works: the results I get look about right. However, a couple of feedback items are missing. For example, setting the date one month ago, I counted feedback for 21 different people in the database, but this query only returns 19 people. The worst thing is that I can't seem to find any similarities between the missing items.
Am I doing something wrong in this query? I think that the WHERE
clause does the date filtering after the JOIN
and ideally I would probably be doing it before. Then again, I don't know if this causes my problem and I also have no idea how to write this query differently.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我接受了约翰的回答,因为他很好地向我解释了这些东西,即使在更通用的意义上,这个答案也是有用的。但是,我想我也应该发布我到达的第一个解决方案。它使用子查询:
此查询的结果是正确的。然而,该解决方案看起来并不是最佳的,因为根据我的经验,子查询有时很慢。话又说回来,我没有做过任何性能分析,所以也许这里使用子查询不是瓶颈。无论如何,它在我的应用程序中运行得足够快。
I accepted Johans answer as he did a good job explaining this stuff to me and the answer is useful even in a more generic sense. However, I thought I'd also post the first solution I arrived to. It was using subqueries:
The results are correct with this query. However, the solution doesn't really look optimal, as subqueries are sometimes slow in my experience. Then again, I haven't done any performance analysis, so maybe using subqueries here is not a bottleneck. In any case it worked fast enough in my application.
完整外部联接是 3 个联接的组合:
1- A 和 B 之间的内部联接
2- A 和 B 之间的左排除连接
3- A 和 B 之间的右排除联接
请注意,内部联接和左排除联接的组合是左外联接,因此您通常将查询重写为
左外联接
+右排除连接
。然而,出于调试目的,
union
所有 3 个连接并添加一些标记来确定哪个连接执行什么操作可能会很有用:如果要先进行过滤,则将其放在 join 子句中。
A full outer join is a combination of 3 joins:
1- inner join between A and B
2- left exclusion join between A and B
3- right exclusion join between A and B
Note that the combination of an inner and a left exclusion join is a left outer join, so you normally rewrite the query as a
left outer join
+right exclusion join
.However for debugging purposes it can be useful to
union
all 3 joins and to add some marker as to which join does what:If you want to do the filtering before, then put it in the join clause.