Using multiple left joins to calculate averages and counts
I am trying to figure out how to use multiple left outer joins to calculate the average score and the number of cards for each deck. I have the following schema and test data. Each deck has 0 or more scores and 0 or more cards. I'm using MySQL for convenience, but I eventually want this to run on SQLite on an Android phone.
mysql> select * from deck;
+----+-------+
| id | name  |
+----+-------+
|  1 | one   |
|  2 | two   |
|  3 | three |
+----+-------+
mysql> select * from score;
+---------+-------+---------------------+--------+
| scoreId | value | date                | deckId |
+---------+-------+---------------------+--------+
|       1 |  6.58 | 2009-10-05 20:54:52 |      1 |
|       2 |     7 | 2009-10-05 20:54:58 |      1 |
|       3 |  4.67 | 2009-10-05 20:55:04 |      1 |
|       4 |     7 | 2009-10-05 20:57:38 |      2 |
|       5 |     7 | 2009-10-05 20:57:41 |      2 |
+---------+-------+---------------------+--------+
mysql> select * from card;
+--------+-------+------+--------+
| cardId | front | back | deckId |
+--------+-------+------+--------+
|      1 | fron  | back |      2 |
|      2 | fron  | back |      1 |
|      3 | f1    | b2   |      1 |
+--------+-------+------+--------+
I run the following query...
mysql> select deck.name, sum(score.value)/count(score.value) "Ave",
    ->        count(card.front) "Count"
    -> from deck
    -> left outer join score on deck.id=score.deckId
    -> left outer join card on deck.id=card.deckId
    -> group by deck.id;
+-------+-----------------+-------+
| name  | Ave             | Count |
+-------+-----------------+-------+
| one   | 6.0833333333333 |     6 |
| two   |               7 |     2 |
| three |            NULL |     0 |
+-------+-----------------+-------+
... and I get the right answer for the average, but the wrong answer for the number of cards. Can someone tell me what I am doing wrong before I pull my hair out?
Thanks!
John
Comments (4)
It's doing exactly what you asked: it joins cards 2 and 3 to scores 1, 2, and 3, which produces a count of 6 (2 * 3). In card 1's case, it joins to scores 4 and 5, which produces a count of 2 (1 * 2).
If you just want a count of cards, as in your current query, use COUNT(DISTINCT card.cardId).
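As a rough check of that suggestion, here is a minimal sketch that loads the question's test data into an in-memory SQLite database through Python's sqlite3 module and reruns the query with COUNT(DISTINCT card.cardId). The column types and the alias names ave and card_count are assumptions, since the question shows no CREATE TABLE statements.

```python
import sqlite3

# In-memory database with the question's test data.
# Column types are assumed; the question does not show the DDL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deck  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE score (scoreId INTEGER PRIMARY KEY, value REAL,
                        date TEXT, deckId INTEGER);
    CREATE TABLE card  (cardId INTEGER PRIMARY KEY, front TEXT,
                        back TEXT, deckId INTEGER);
    INSERT INTO deck VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO score VALUES
        (1, 6.58, '2009-10-05 20:54:52', 1),
        (2, 7,    '2009-10-05 20:54:58', 1),
        (3, 4.67, '2009-10-05 20:55:04', 1),
        (4, 7,    '2009-10-05 20:57:38', 2),
        (5, 7,    '2009-10-05 20:57:41', 2);
    INSERT INTO card VALUES
        (1, 'fron', 'back', 2),
        (2, 'fron', 'back', 1),
        (3, 'f1',   'b2',   1);
""")

# Same joins as the question; only the card count changes.
# DISTINCT collapses the card rows that were repeated once per score.
fixed = conn.execute("""
    SELECT deck.name,
           SUM(score.value) / COUNT(score.value) AS ave,
           COUNT(DISTINCT card.cardId)           AS card_count
    FROM deck
    LEFT OUTER JOIN score ON deck.id = score.deckId
    LEFT OUTER JOIN card  ON deck.id = card.deckId
    GROUP BY deck.id
    ORDER BY deck.id
""").fetchall()

for name, ave, card_count in fixed:
    print(name, ave, card_count)
```

With this data the card counts come out as 2, 1, and 0 for decks one, two, and three, while the averages stay the same as in the original query.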
[EDIT]
SQL has a built-in average function, AVG(), so you can write AVG(score.value) instead of sum(score.value)/count(score.value).
What's going wrong is that you're creating a Cartesian product between the score and card tables.
Here's how it works: when you join deck to score, multiple rows may match. Each of those rows is then joined to all of the matching rows in card. No condition prevents that from happening, and when no condition restricts a join, the default behavior is to join all rows in one table to all rows in the other.
To see it in action, run the same joins without the GROUP BY and without the aggregate functions.
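The un-grouped join described above can be sketched as follows, again using Python's sqlite3 module with an in-memory database. The column types and the choice of selected columns are assumptions made for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory database with the question's test data (column types assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deck  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE score (scoreId INTEGER PRIMARY KEY, value REAL,
                        date TEXT, deckId INTEGER);
    CREATE TABLE card  (cardId INTEGER PRIMARY KEY, front TEXT,
                        back TEXT, deckId INTEGER);
    INSERT INTO deck VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO score VALUES
        (1, 6.58, '2009-10-05 20:54:52', 1),
        (2, 7,    '2009-10-05 20:54:58', 1),
        (3, 4.67, '2009-10-05 20:55:04', 1),
        (4, 7,    '2009-10-05 20:57:38', 2),
        (5, 7,    '2009-10-05 20:57:41', 2);
    INSERT INTO card VALUES
        (1, 'fron', 'back', 2),
        (2, 'fron', 'back', 1),
        (3, 'f1',   'b2',   1);
""")

# The question's joins with no GROUP BY and no aggregates,
# so every score/card pairing within a deck is visible.
rows = conn.execute("""
    SELECT deck.name, score.scoreId, score.value, card.cardId
    FROM deck
    LEFT OUTER JOIN score ON deck.id = score.deckId
    LEFT OUTER JOIN card  ON deck.id = card.deckId
    ORDER BY deck.id, score.scoreId, card.cardId
""").fetchall()

for row in rows:
    print(row)
```

Deck one has 3 scores and 2 cards, so it contributes 3 * 2 = 6 rows; deck two contributes 2 * 1 = 2 rows; deck three contributes a single row whose score and card columns are NULL.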
You'll see a lot of repeated data in the columns that come from score and card. When you calculate AVG() over data that has repeats in it, the redundant values cancel out (as long as every value is repeated the same number of times). But when you COUNT() or SUM() them, the totals are way off.
There may be remedies for inadvertent Cartesian products. In your case, you can compensate by counting distinct card IDs with COUNT(DISTINCT card.cardId).
That fix doesn't cover all cases of inadvertent Cartesian products. The more general-purpose solution is to break the work up into two separate queries: one for the average score per deck and one for the card count per deck.
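A minimal sketch of the two-query approach, assuming the question's schema loaded into an in-memory SQLite database via Python's sqlite3 module. The column types, the variable names, and the exact shape of both queries are assumptions for illustration, not the answer's original code.

```python
import sqlite3

# In-memory database with the question's test data (column types assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deck  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE score (scoreId INTEGER PRIMARY KEY, value REAL,
                        date TEXT, deckId INTEGER);
    CREATE TABLE card  (cardId INTEGER PRIMARY KEY, front TEXT,
                        back TEXT, deckId INTEGER);
    INSERT INTO deck VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO score VALUES
        (1, 6.58, '2009-10-05 20:54:52', 1),
        (2, 7,    '2009-10-05 20:54:58', 1),
        (3, 4.67, '2009-10-05 20:55:04', 1),
        (4, 7,    '2009-10-05 20:57:38', 2),
        (5, 7,    '2009-10-05 20:57:41', 2);
    INSERT INTO card VALUES
        (1, 'fron', 'back', 2),
        (2, 'fron', 'back', 1),
        (3, 'f1',   'b2',   1);
""")

# Query 1: average score per deck. Only score is joined,
# so no card rows can multiply the score rows.
averages = dict(conn.execute("""
    SELECT deck.name, AVG(score.value)
    FROM deck
    LEFT OUTER JOIN score ON deck.id = score.deckId
    GROUP BY deck.id
""").fetchall())

# Query 2: card count per deck. Only card is joined,
# so a plain COUNT is correct without DISTINCT.
card_counts = dict(conn.execute("""
    SELECT deck.name, COUNT(card.cardId)
    FROM deck
    LEFT OUTER JOIN card ON deck.id = card.deckId
    GROUP BY deck.id
""").fetchall())

for name in averages:
    print(name, averages[name], card_counts[name])
```

Because each query joins deck to a single child table, neither result depends on how many rows the other table holds, which is why this form also stays correct for SUM().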
Not every task in database programming must be done in a single query. It can even be more efficient (as well as simpler, easier to modify, and less error-prone) to use individual queries when you need multiple statistics.
Using left joins isn't a good approach, in my opinion. Here's a standard SQL query for the result you want.
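The query that accompanied this answer is not preserved here, so the following is only a guess at one join-free form it could take: correlated scalar subqueries, one per statistic. It is run through Python's sqlite3 module against the question's test data; the column types and the aliases d, s, c, ave, and card_count are assumptions.

```python
import sqlite3

# In-memory database with the question's test data (column types assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deck  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE score (scoreId INTEGER PRIMARY KEY, value REAL,
                        date TEXT, deckId INTEGER);
    CREATE TABLE card  (cardId INTEGER PRIMARY KEY, front TEXT,
                        back TEXT, deckId INTEGER);
    INSERT INTO deck VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO score VALUES
        (1, 6.58, '2009-10-05 20:54:52', 1),
        (2, 7,    '2009-10-05 20:54:58', 1),
        (3, 4.67, '2009-10-05 20:55:04', 1),
        (4, 7,    '2009-10-05 20:57:38', 2),
        (5, 7,    '2009-10-05 20:57:41', 2);
    INSERT INTO card VALUES
        (1, 'fron', 'back', 2),
        (2, 'fron', 'back', 1),
        (3, 'f1',   'b2',   1);
""")

# One row per deck; each statistic is computed by its own subquery,
# so score rows and card rows are never combined with each other.
result = conn.execute("""
    SELECT d.name,
           (SELECT AVG(s.value) FROM score s WHERE s.deckId = d.id) AS ave,
           (SELECT COUNT(*)     FROM card  c WHERE c.deckId = d.id) AS card_count
    FROM deck d
    ORDER BY d.id
""").fetchall()

for name, ave, card_count in result:
    print(name, ave, card_count)
```

A deck with no scores gets a NULL average and a deck with no cards gets a count of 0, matching what the left joins produced for deck three.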