查找不存在满足特定条件的多对多子级的行
这是我正在尝试执行的操作的通用版本:
表 recipes
具有字段 id
和 name
。表 ingredients
具有字段 id
、name
和 sweetness
,描述了该成分的甜度,范围为1-10。食谱有很多成分,而且成分也存在于很多食谱中,因此两者在 ingredients_recipes
表中相关,其中包含字段 ingredient_id
和 recipe_id
。
找到含有甜度为 10 的成分的食谱很容易。
SELECT DISTINCT recipes.* FROM recipes
INNER JOIN recipes_ingredients ri ON ri.recipe_id = recipes.id
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness = 10
但是,我很难否定该查询来查找不含甜度为 10 的成分的食谱。我的第一个想法是:
SELECT DISTINCT recipes.* FROM recipes
INNER JOIN recipes_ingredients ri ON ri.recipe_id = recipes.id
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness != 10
但是,它会查找包含任何非甜味度 10 成分的食谱。
我的下一次尝试如下,这似乎有效:
SELECT * FROM recipes WHERE
(
SELECT count(*) FROM ingredients INNER JOIN recipes_ingredients ri ON
ri.ingredient_id = ingredients.id WHERE ingredients.sweetness = 10 AND
ri.recipe_id = recipes.id
) = 0
但是,我的一般经验是,与等效的、精心设计的 JOIN 相比,依赖子查询运行缓慢。我尝试过连接、分组等,但无法完全理解它,特别是因为,尽管 LEFT JOIN
和 IS NULL
似乎是正确的工具,有两个连接已经让事情变得很糟糕。很棒的 SQL 向导,我可以运行什么查询来获得最佳结果?谢谢!
Here's a generic version of what I'm trying to do:
The table recipes
has fields id
and name
. The table ingredients
has fields id
, name
, and sweetness
, describing how sweet that ingredient is on a scale of 1-10. Recipes have many ingredients and ingredients are in many recipes, so the two are related in a ingredients_recipes
table, with fields ingredient_id
and recipe_id
.
It's easy to find recipes that contain an ingredient with sweetness of 10.
SELECT DISTINCT recipes.* FROM recipes
INNER JOIN recipes_ingredients ri ON ri.recipe_id = recipes.id
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness = 10
However, I'm having trouble with negating that query to find recipes with no ingredients with sweetness 10. My first thought was this:
SELECT DISTINCT recipes.* FROM recipes
INNER JOIN recipes_ingredients ri ON ri.recipe_id = recipes.id
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness != 10
However, that finds recipes that contain any non-sweetness-10 ingredients.
My next attempt was the following, which seems to work:
SELECT * FROM recipes WHERE
(
SELECT count(*) FROM ingredients INNER JOIN recipes_ingredients ri ON
ri.ingredient_id = ingredients.id WHERE ingredients.sweetness = 10 AND
ri.recipe_id = recipes.id
) = 0
However, my general experience is that dependent subqueries run slowly compared to equivalent, well-crafted JOINs. I played around with joining, grouping, etc. but couldn't quite wrap my head around it, especially since, though it seems like LEFT JOIN
and IS NULL
were the proper tools, having two joins already made things nasty. Great SQL wizards, what query can I run to get the best results? Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
试试这个:
Try this:
尝试:
它仍然是一个相关子查询,但
exists
和not Exists
有一些优化,应该使它们比原始查询执行得更好。对于直接连接解决方案,这应该可以工作:
根据索引,
不存在
解决方案可能会更快,因为不存在
在弄清楚后立即返回如果任何行满足给定条件,而无需查看表中不必要的内容。例如,如果它找到单行甜度为 10,它将停止查看表并返回 false。Try:
It's still a correlated subquery, but
exists
andnot exists
have some optimizations that should make them perform better than your original query.For a direct join solution, this should work:
Depending on indexing, the
not exists
solution could be faster asnot exists
returns immediately upon figuring out if any rows satisfy the given conditions without looking at any more of the table than necessary. For example, if it finds a single row of sweetness 10, it stops looking at the table and returns false.我玩弄了这里给我的答案(我已经投票了),并且从他们的灵感中,提出了一个查询,该查询似乎以令人惊讶的出色性能完成了这项工作:
与内部条件的连接(受到启发@Donnie)提出配方-成分组合,如果成分的甜度不是 10,则显示 NULL 行。然后我们按配方 ID 进行分组,并选择“最大”成分 ID。 (当且仅当没有实际 ID 可供选择时,
MAX
函数才会返回 null,即绝对没有与此配方关联的非甜度 10 项可供选择。)如果“ max”成分 ID 为 null,则 MAX 函数没有可供选择的 sweetness-10 项,因此,行HAVING
为 nullMAX(i.id)
被选中。我在禁用查询缓存器的情况下多次运行了
NOT EXISTS
版本的查询和上述版本的查询。对于大约 400 个配方,NOT EXISTS
查询始终需要大约 1.0 秒才能完成,而此查询的运行时间通常约为 0.1 秒。针对大约 5000 个食谱,NOT EXISTS
查询花费了大约 30 秒,而上述查询通常仍花费 0.1 秒,并且几乎总是低于 1.0。值得注意的是,检查每个表的解释后,这里列出的查询几乎能够完全在我给这些表的索引上运行,这可能解释了为什么它能够毫不费力地进行各种连接和分组。另一方面,NOT EXISTS 查询必须执行依赖子查询。如果这些索引没有到位,两者的性能可能会更加平等,但是当有机会使用原始连接时,查询优化器似乎非常强大。
这个故事的寓意是:格式良好的 JOIN 非常强大:) 谢谢大家!
I played around with the answers given me here (which I've since upvoted), and, from their inspiration, have come up with a query that seems to do the job with surprisingly outstanding performance:
The joins with the condition inside (inspired by @Donnie) bring out recipe-ingredient combinations, with NULL rows if the ingredient is not of sweetness 10. We then group by recipe ID, and select the "max" ingredient ID. (The
MAX
function will return null if and only if there are no actual IDs to select, i.e., there are absolutely no non-sweetness-10 items associated with this recipe to choose instead.) If that "max" ingredient ID is null, then there were no sweetness-10 items for the MAX function to select, and, therefore, rowsHAVING
a nullMAX(i.id)
are selected.I ran both the
NOT EXISTS
version of the query and the above version of the query a number of times with the query cacher disabled. Against about 400 recipes, theNOT EXISTS
query would consistently take about 1.0 seconds to complete, whereas this query's runtime was usually around 0.1 seconds. Against about 5000 recipes, theNOT EXISTS
query took about 30 seconds, whereas the above query usually still took 0.1 seconds, and was almost always under 1.0.It's worth noting that, checking EXPLAINs on each, the query listed here is able to run almost entirely on the indices I've given these tables, which probably explains why it is able to do all sorts of joining and grouping without batting an eye. The
NOT EXISTS
query, on the other hand, has to do dependent subqueries. The two might perform more equally if these indices weren't in place, but that query optimizer is pretty darn powerful when given the chance to use raw joins, it would seem.Moral of the story: well-formed JOINs are super-duper powerful :) Thanks, all!