查找不存在满足特定条件的多对多子级的行

发布于 2024-10-03 05:38:34 字数 1336 浏览 6 评论 0原文

这是我正在尝试执行的操作的通用版本：

表 recipes 具有字段 id 和 name。表 ingredients 具有字段 id、name 和 sweetness，描述了该成分的甜度，范围为1-10。食谱有很多成分，而且成分也存在于很多食谱中，因此两者在 ingredients_recipes 表中相关，其中包含字段 ingredient_id 和 recipe_id。

找到含有甜度为 10 的成分的食谱很容易。

SELECT DISTINCT recipes.* FROM recipes
INNER JOIN recipes_ingredients ri ON ri.recipe_id = recipes.id
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness = 10

但是，我很难否定该查询来查找不含甜度为 10 的成分的食谱。我的第一个想法是：

SELECT DISTINCT recipes.* FROM recipes
INNER JOIN recipes_ingredients ri ON ri.recipe_id = recipes.id
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness != 10

但是，它会查找包含任何非甜味度 10 成分的食谱。

我的下一次尝试如下，这似乎有效：

SELECT * FROM recipes WHERE
(
  SELECT count(*) FROM ingredients INNER JOIN recipes_ingredients ri ON
  ri.ingredient_id = ingredients.id WHERE ingredients.sweetness = 10 AND
  ri.recipe_id = recipes.id
) = 0

但是，我的一般经验是，与等效的、精心设计的 JOIN 相比，依赖子查询运行缓慢。我尝试过连接、分组等，但无法完全理解它，特别是因为，尽管 LEFT JOIN 和 IS NULL 似乎是正确的工具，有两个连接已经让事情变得很糟糕。很棒的 SQL 向导，我可以运行什么查询来获得最佳结果？谢谢！

原文

Here's a generic version of what I'm trying to do:

The table recipes has fields id and name. The table ingredients has fields id, name, and sweetness, describing how sweet that ingredient is on a scale of 1-10. Recipes have many ingredients and ingredients are in many recipes, so the two are related in a ingredients_recipes table, with fields ingredient_id and recipe_id.

It's easy to find recipes that contain an ingredient with sweetness of 10.

SELECT DISTINCT recipes.* FROM recipes
INNER JOIN recipes_ingredients ri ON ri.recipe_id = recipes.id
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness = 10

However, I'm having trouble with negating that query to find recipes with no ingredients with sweetness 10. My first thought was this:

SELECT DISTINCT recipes.* FROM recipes
INNER JOIN recipes_ingredients ri ON ri.recipe_id = recipes.id
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness != 10

However, that finds recipes that contain any non-sweetness-10 ingredients.

My next attempt was the following, which seems to work:

SELECT * FROM recipes WHERE
(
  SELECT count(*) FROM ingredients INNER JOIN recipes_ingredients ri ON
  ri.ingredient_id = ingredients.id WHERE ingredients.sweetness = 10 AND
  ri.recipe_id = recipes.id
) = 0

However, my general experience is that dependent subqueries run slowly compared to equivalent, well-crafted JOINs. I played around with joining, grouping, etc. but couldn't quite wrap my head around it, especially since, though it seems like LEFT JOIN and IS NULL were the proper tools, having two joins already made things nasty. Great SQL wizards, what query can I run to get the best results? Thanks!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

静水深流 2024-10-10 05:38:34

试试这个：

SELECT DISTINCT recipes.* 
FROM recipes r LEFT JOIN
(SELECT ri.recipe_id
FROM recipes_ingredients ri 
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness = 10) i on i.recipe_id=r.recipe_id
WHERE i.recipe_id is null

Try this:

SELECT DISTINCT recipes.* 
FROM recipes r LEFT JOIN
(SELECT ri.recipe_id
FROM recipes_ingredients ri 
INNER JOIN ingredients ON ingredients.id = ri.ingredient_id
WHERE ingredients.sweetness = 10) i on i.recipe_id=r.recipe_id
WHERE i.recipe_id is null

回复收藏 0 原文

雨落星ぅ辰 2024-10-10 05:38:34

尝试：

select
  r.*
from
  recipes r
where
  not exists (
    select
      1
    from
      recipe_ingredients ri
      join ingredients i on ri.ingredient_id = ri.ingredient_id
    where
      ri.recipie_id = r.recipe_id
      and i.sweetness = 10
  )

它仍然是一个相关子查询，但 exists 和 not Exists 有一些优化，应该使它们比原始查询执行得更好。

对于直接连接解决方案，这应该可以工作：

select distinct
  r.*
from
  recipes r
  join recipe_ingredients ri on ri.recipe_id = r.recipe_id
  left join ingredents i on i.ingredient_id = ri.ingredient_id and i.sweetness = 10
where
  i.ingredient_id is null

根据索引，不存在解决方案可能会更快，因为不存在在弄清楚后立即返回如果任何行满足给定条件，而无需查看表中不必要的内容。例如，如果它找到单行甜度为 10，它将停止查看表并返回 false。

Try:

select
  r.*
from
  recipes r
where
  not exists (
    select
      1
    from
      recipe_ingredients ri
      join ingredients i on ri.ingredient_id = ri.ingredient_id
    where
      ri.recipie_id = r.recipe_id
      and i.sweetness = 10
  )

It's still a correlated subquery, but exists and not exists have some optimizations that should make them perform better than your original query.

For a direct join solution, this should work:

select distinct
  r.*
from
  recipes r
  join recipe_ingredients ri on ri.recipe_id = r.recipe_id
  left join ingredents i on i.ingredient_id = ri.ingredient_id and i.sweetness = 10
where
  i.ingredient_id is null

Depending on indexing, the not exists solution could be faster as not exists returns immediately upon figuring out if any rows satisfy the given conditions without looking at any more of the table than necessary. For example, if it finds a single row of sweetness 10, it stops looking at the table and returns false.

回复收藏 0 原文

岁吢 2024-10-10 05:38:34

我玩弄了这里给我的答案（我已经投票了），并且从他们的灵感中，提出了一个查询，该查询似乎以令人惊讶的出色性能完成了这项工作：

SELECT r.* FROM recipes r
LEFT JOIN recipes_ingredients ri ON ri.parent_id = r.id
LEFT JOIN ingredients i ON i.id = ri.ingredient_id AND i.sweetness = 10
GROUP BY r.id HAVING MAX(i.id) IS NULL

与内部条件的连接（受到启发@Donnie）提出配方-成分组合，如果成分的甜度不是 10，则显示 NULL 行。然后我们按配方 ID 进行分组，并选择“最大”成分 ID。（当且仅当没有实际 ID 可供选择时，MAX 函数才会返回 null，即绝对没有与此配方关联的非甜度 10 项可供选择。）如果“ max”成分 ID 为 null，则 MAX 函数没有可供选择的 sweetness-10 项，因此，行 HAVING 为 null MAX(i.id)被选中。

我在禁用查询缓存器的情况下多次运行了 NOT EXISTS 版本的查询和上述版本的查询。对于大约 400 个配方，NOT EXISTS 查询始终需要大约 1.0 秒才能完成，而此查询的运行时间通常约为 0.1 秒。针对大约 5000 个食谱，NOT EXISTS 查询花费了大约 30 秒，而上述查询通常仍花费 0.1 秒，并且几乎总是低于 1.0。

值得注意的是，检查每个表的解释后，这里列出的查询几乎能够完全在我给这些表的索引上运行，这可能解释了为什么它能够毫不费力地进行各种连接和分组。另一方面，NOT EXISTS 查询必须执行依赖子查询。如果这些索引没有到位，两者的性能可能会更加平等，但是当有机会使用原始连接时，查询优化器似乎非常强大。

这个故事的寓意是：格式良好的 JOIN 非常强大:) 谢谢大家！

I played around with the answers given me here (which I've since upvoted), and, from their inspiration, have come up with a query that seems to do the job with surprisingly outstanding performance:

SELECT r.* FROM recipes r
LEFT JOIN recipes_ingredients ri ON ri.parent_id = r.id
LEFT JOIN ingredients i ON i.id = ri.ingredient_id AND i.sweetness = 10
GROUP BY r.id HAVING MAX(i.id) IS NULL

The joins with the condition inside (inspired by @Donnie) bring out recipe-ingredient combinations, with NULL rows if the ingredient is not of sweetness 10. We then group by recipe ID, and select the "max" ingredient ID. (The MAX function will return null if and only if there are no actual IDs to select, i.e., there are absolutely no non-sweetness-10 items associated with this recipe to choose instead.) If that "max" ingredient ID is null, then there were no sweetness-10 items for the MAX function to select, and, therefore, rows HAVING a null MAX(i.id) are selected.

I ran both the NOT EXISTS version of the query and the above version of the query a number of times with the query cacher disabled. Against about 400 recipes, the NOT EXISTS query would consistently take about 1.0 seconds to complete, whereas this query's runtime was usually around 0.1 seconds. Against about 5000 recipes, the NOT EXISTS query took about 30 seconds, whereas the above query usually still took 0.1 seconds, and was almost always under 1.0.

It's worth noting that, checking EXPLAINs on each, the query listed here is able to run almost entirely on the indices I've given these tables, which probably explains why it is able to do all sorts of joining and grouping without batting an eye. The NOT EXISTS query, on the other hand, has to do dependent subqueries. The two might perform more equally if these indices weren't in place, but that query optimizer is pretty darn powerful when given the chance to use raw joins, it would seem.

Moral of the story: well-formed JOINs are super-duper powerful :) Thanks, all!

回复收藏 0 原文

~没有更多了~