SQL 选择其中一列的值在另一个条件列中相同的行
我有一个如下所示的交叉引用表:
id document_id subject_id
1 8 21
2 5 17
3 5 76
4 7 88
5 9 17
6 9 76
7 2 76
它将文档与主题相匹配。文档可以是多个主题的成员。我想从该表中返回给定文档与给定集中的所有主题匹配的行。例如,给定主题集:
(17,76)
我只想返回与交叉引用表中某处(至少)该集中的所有主题相匹配的文档的行。给定上述集合的所需输出集将是:
id document_id subject_id
2 5 17
3 5 76
5 9 17
6 9 76
请注意,不会返回表的最后一行,因为该文档仅与所需主题之一匹配。
在 SQL 中查询此内容最简单、最有效的方法是什么?
I have a cross reference table that looks like this:
id document_id subject_id
1 8 21
2 5 17
3 5 76
4 7 88
5 9 17
6 9 76
7 2 76
It matches documents to subjects. Documents can be members of more than one subject. I want to return rows from this table where a given document matches all the subjects in a given set. For example, given the set of subjects:
(17,76)
I want to return only rows for documents which match all the subjects in that set (at least) somewhere in the cross reference table. The desired output set given the above set would be:
id document_id subject_id
2 5 17
3 5 76
5 9 17
6 9 76
Notice that the last row of the table is not returned because that document only matches one of the required subjects.
What is the simplest and most efficient way to query for this in SQL?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
我假设该表的自然键是 document_id + subject_id,并且该 id 是代理项; IOW、document_id 和 subject_id 是唯一的。因此,我只是假装它不存在,并且自然键有一个唯一的约束。
让我们从显而易见的事情开始。
这会给你带来你想要的一切加上你不想要的东西。所以我们需要做的就是过滤掉其他的东西。 “其他内容”是具有不等于所需主题的计数的计数的行组。
请注意,subject_id 已被删除,因为它不参与分组。更进一步,我将添加一个名为 subject_i_want 的虚构表,其中包含您想要的 N 行主题。
显然,subject_i_want 可以换成另一个子查询、临时表或其他任何东西。但是,一旦有了这个 document_id 列表,您就可以在更大查询的子选择中使用它。
或者无论如何。
I assume that the natrual key of this table is document_id + subject_id, and that id is a surrogate; IOW, document_id and subject_id are unique. As such, I'm just going to pretend it doesn't exist and that a unique constraint is on the natural key.
Let's start with the obvious.
That gets you everything you want plus stuff you don't want. So all we need to do is filter out the other stuff. The "other stuff" is groups of rows having a count that is not equal to the count of the desired subjects.
Note that subject_id is removed because it doesn't participate in grouping. Taking this one step further, i'm going to add an imaginary table called subjects_i_want that contains N rows of subjects you want.
Obviously subjects_i_want could be swapped out for another subquery, temporary table, or whatever. But, once you have this list of document_id, you can use it within a subselect of a bigger query.
Or whatever.
使用 Oracle(或任何允许使用 with 子句的数据库)。这允许仅定义一次 subject_id 值。
Using Oracle (or any database that allows the with clause). This allows definition of the subject_id values exactly once.
这是一个非常有趣的问题。
我假设您想要一个更通用的查询,但这就是在您始终具有相同数量的主题(例如两个)的情况下我会做的:
当然,您可以添加另一个 INNER JOIN 来添加另一个主题ID..但我承认这不是一个很好的通用解决方案。
That's a very interesting question.
I'm assuming you would like a more generalized query, but this is what I would do in the case where you always have the same number of subjects (say two):
Of course, you could add yet another INNER JOIN to add another subject ID.. But I admit it's not a very good general solution.