在完整外连接查询中按日期过滤行 ->缺少一些结果

发布于 2024-12-12 14:34:45 字数 1670 浏览 0 评论 0原文

背景

我在 MySQL 中有两个表,其中包含不同类型的反馈项。我构建了一个查询,通过 FULL OUTER JOIN(实际上在 MySQL 中编写为两个联接和一个并集)来组合这些表,并计算一些平均成绩。这个查询似乎工作得很好:(

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  LEFT JOIN feedback_service AS s USING(name)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  RIGHT JOIN feedback_service AS s USING(name)
  WHERE l.id IS NULL
  GROUP BY name)
ORDER BY name;

这在某种程度上简化了可读性,但在这里没有什么区别)

问题

接下来我尝试添加按日期过滤(即仅考虑在特定日期之后创建的反馈项目)。凭借我的 SQL 技能和所做的研究,我能够得出以下结论:

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  LEFT JOIN feedback_service AS s USING(name)
  WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  RIGHT JOIN feedback_service AS s USING(name)
  WHERE l.id IS NULL
    AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
  GROUP BY name)
ORDER BY name;

几乎有效:我得到的结果看起来是正确的。然而,缺少一些反馈项目。例如,设置一个月前的日期,我在数据库中统计了 21 个不同人的反馈,但这个查询只返回 19 个人。最糟糕的是我似乎找不到丢失的物品之间的任何相似之处。

我在这个查询中做错了什么吗?我认为 WHERE 子句在 JOIN 之后执行日期过滤,理想情况下我可能会在之前执行此操作。话又说回来,我不知道这是否会导致我的问题,而且我也不知道如何以不同的方式编写这个查询。

Background

I've got two tables with different types of feedback items in MySQL. I've built a query to combine these tables by FULL OUTER JOIN (which is actually written as two joins and an union in MySQL) and to count some average grades. This query seems to work perfectly:

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  LEFT JOIN feedback_service AS s USING(name)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  RIGHT JOIN feedback_service AS s USING(name)
  WHERE l.id IS NULL
  GROUP BY name)
ORDER BY name;

(This is somewhat simplified for readability but it doesn't make a difference here)

Problem

Next I tried adding filtering by date (i.e. only feedback items created after a certain date are taken in account). With my SQL skills and the research I did, I was able to come up with this:

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  LEFT JOIN feedback_service AS s USING(name)
  WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM feedback_linguistic AS l
  RIGHT JOIN feedback_service AS s USING(name)
  WHERE l.id IS NULL
    AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
  GROUP BY name)
ORDER BY name;

This almost works: the results I get look about right. However, a couple of feedback items are missing. For example, setting the date one month ago, I counted feedback for 21 different people in the database, but this query only returns 19 people. The worst thing is that I can't seem to find any similarities between the missing items.

Am I doing something wrong in this query? I think that the WHERE clause does the date filtering after the JOIN and ideally I would probably be doing it before. Then again, I don't know if this causes my problem and I also have no idea how to write this query differently.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

删除会话 2024-12-19 14:34:45

我接受了约翰的回答,因为他很好地向我解释了这些东西,即使在更通用的意义上,这个答案也是有用的。但是,我想我也应该发布我到达的第一个解决方案。它使用子查询:

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM (
    SELECT * FROM feedback_linguistic WHERE createdTime >= '" & date & "'
  ) AS l
  LEFT JOIN (
    SELECT * FROM feedback_service WHERE createdTime >= '" & date & "'
  ) AS s USING(name)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM (
    SELECT * FROM feedback_linguistic WHERE createdTime >= '" & date & "'
  ) AS l
  RIGHT JOIN (
    SELECT * FROM feedback_service WHERE createdTime >= '" & date & "'
  ) AS s USING(name)
  WHERE l.id IS NULL
  GROUP BY name)
ORDER BY name;

此查询的结果是正确的。然而,该解决方案看起来并不是最佳的,因为根据我的经验,子查询有时很慢。话又说回来,我没有做过任何性能分析,所以也许这里使用子查询不是瓶颈。无论如何,它在我的应用程序中运行得足够快。

I accepted Johans answer as he did a good job explaining this stuff to me and the answer is useful even in a more generic sense. However, I thought I'd also post the first solution I arrived to. It was using subqueries:

  (SELECT name, AVG(l.overallQuality) AS avgLingQual,
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM (
    SELECT * FROM feedback_linguistic WHERE createdTime >= '" & date & "'
  ) AS l
  LEFT JOIN (
    SELECT * FROM feedback_service WHERE createdTime >= '" & date & "'
  ) AS s USING(name)
  GROUP BY name)
UNION ALL
  (SELECT name, AVG(l.overallQuality) AS avgLingQual, 
    AVG(s.overallSatisfaction) AS avgSvcQual
  FROM (
    SELECT * FROM feedback_linguistic WHERE createdTime >= '" & date & "'
  ) AS l
  RIGHT JOIN (
    SELECT * FROM feedback_service WHERE createdTime >= '" & date & "'
  ) AS s USING(name)
  WHERE l.id IS NULL
  GROUP BY name)
ORDER BY name;

The results are correct with this query. However, the solution doesn't really look optimal, as subqueries are sometimes slow in my experience. Then again, I haven't done any performance analysis, so maybe using subqueries here is not a bottleneck. In any case it worked fast enough in my application.

别在捏我脸啦 2024-12-19 14:34:45

完整外部联接是 3 个联接的组合:

1- A 和 B 之间的内部联接
2- A 和 B 之间的左排除连接
3- A 和 B 之间的右排除联接

请注意,内部联接和左排除联接的组合是左外联接,因此您通常将查询重写为 左外联接 + 右排除连接
然而,出于调试目的,union 所有 3 个连接并添加一些标记来确定哪个连接执行什么操作可能会很有用:

  /*inner join*/
  (SELECT
     'inner' as join_type 
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  INNER JOIN feedback_service s ON (l.name = s.name) 
  WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) 
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) 
  GROUP BY l.name) 
UNION ALL
  (SELECT
     'left exclusion' as join_type 
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  LEFT JOIN feedback_service s ON (l.name = s.name) 
  WHERE s.id IS NULL
    /*AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) */
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) 
  GROUP BY l.name) 
UNION ALL
  (SELECT 
     'right exclusion' as join_type
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual 
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  RIGHT JOIN feedback_service s ON (s.name = l.name) 
  WHERE l.id IS NULL
    AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) 
    /*AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) */
  GROUP BY s.name) 
ORDER BY listname; 

我认为 WHERE 子句在 JOIN 之后执行日期过滤,理想情况下我可能会在 JOIN 之前执行此操作。

如果要先进行过滤,则将其放在 join 子句中。

A full outer join is a combination of 3 joins:

1- inner join between A and B
2- left exclusion join between A and B
3- right exclusion join between A and B

Note that the combination of an inner and a left exclusion join is a left outer join, so you normally rewrite the query as a left outer join + right exclusion join.
However for debugging purposes it can be useful to union all 3 joins and to add some marker as to which join does what:

  /*inner join*/
  (SELECT
     'inner' as join_type 
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  INNER JOIN feedback_service s ON (l.name = s.name) 
  WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) 
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) 
  GROUP BY l.name) 
UNION ALL
  (SELECT
     'left exclusion' as join_type 
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  LEFT JOIN feedback_service s ON (l.name = s.name) 
  WHERE s.id IS NULL
    /*AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) */
    AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) 
  GROUP BY l.name) 
UNION ALL
  (SELECT 
     'right exclusion' as join_type
     , COALESCE(s.name, l.name) as listname
     , AVG(l.overallQuality) AS avgLingQual 
     , AVG(s.overallSatisfaction) AS avgSvcQual 
  FROM feedback_linguistic l 
  RIGHT JOIN feedback_service s ON (s.name = l.name) 
  WHERE l.id IS NULL
    AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) 
    /*AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) */
  GROUP BY s.name) 
ORDER BY listname; 

I think that the WHERE clause does the date filtering after the JOIN and ideally I would probably be doing it before.

If you want to do the filtering before, then put it in the join clause.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文