How do I avoid a Cartesian product in an INNER JOIN query?
I have 6 tables; let's call them a, b, c, d, e, f. Now I want to search all the columns (except the ID columns) of all tables for a certain word, let's say 'Joe'. What I did was INNER JOIN all the tables and then use LIKE-style matching (the ~* operator below) to search the columns.
    INNER JOIN ... ON ...
    INNER JOIN ... ON ... etc.
    WHERE a.firstname ~* 'Joe'
       OR a.lastname ~* 'Joe'
       OR b.favorite_food ~* 'Joe'
       OR c.job ~* 'Joe' ... etc.
The results are correct; I get all the columns I was looking for. But I also get some kind of Cartesian product: 2 or more rows with almost the same results.
How can I avoid this? I want to have each row only once, since the results should appear in a web search.
UPDATE
I first tried to figure out whether SELECT DISTINCT would work by using this statement: pastie.org/970959. But it still gives me a Cartesian product. What's wrong with this?
Try SELECT DISTINCT?
On what condition do you JOIN these tables? Do you have foreign keys or something? Maybe you should search for the word in each table separately?
What kind of server are you using? Microsoft SQL Server has a full-text index feature (I think others have something like this too) which lets you search for keywords in a much less resource-intensive way.
Also consider using UNION instead of joining the tables.
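To make the UNION suggestion concrete, here is a minimal sketch. Everything about the schema is an assumption, since the question never shows it: the `a_id` foreign keys, the sample rows, and the use of SQLite with LIKE in place of PostgreSQL's ~* are all made up for illustration. Each table is searched on its own, and UNION (without ALL) collapses duplicate IDs.

```python
import sqlite3

# Hypothetical schema: b and c are assumed to reference a through a_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id),
                    favorite_food TEXT);
    CREATE TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id),
                    job TEXT);
    INSERT INTO a VALUES (1, 'Joe', 'Smith'), (2, 'Ann', 'Lee');
    INSERT INTO b VALUES (1, 1, 'pizza'), (2, 1, 'sloppy joe'), (3, 2, 'salad');
    INSERT INTO c VALUES (1, 1, 'cook'), (2, 2, 'Joe''s assistant');
""")

# SQLite's LIKE is case-insensitive for ASCII, standing in for ~* here.
pattern = "%joe%"

# One SELECT per table; UNION removes IDs that match in several tables.
union_sql = """
    SELECT id AS a_id FROM a WHERE firstname LIKE :p OR lastname LIKE :p
    UNION
    SELECT a_id FROM b WHERE favorite_food LIKE :p
    UNION
    SELECT a_id FROM c WHERE job LIKE :p
    ORDER BY a_id
"""
matching_ids = [row[0] for row in conn.execute(union_sql, {"p": pattern})]
print(matching_ids)
```

Row 1 matches in both `a` and `b`, but shows up only once in the result because UNION deduplicates.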
Without seeing your tables, I can only really assume what's going on here is you have a one-to-many relationship somewhere. You probably want to do everything in a subquery, select out the distinct IDs, then get the data you want to display by ID. Something like:
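The query from the original answer is not available here, so the following is only a sketch of the idea under assumptions: a hypothetical schema where `b` and `c` each hold several rows per row in `a` (linked by an assumed `a_id` column), run on SQLite with LIKE instead of ~*. The inner query does the joined search and picks out distinct IDs; the outer query fetches the display columns once per ID.

```python
import sqlite3

# Hypothetical one-to-many schema: several b and c rows per a row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, favorite_food TEXT);
    CREATE TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER, job TEXT);
    INSERT INTO a VALUES (1, 'Joe', 'Smith'), (2, 'Ann', 'Lee');
    INSERT INTO b VALUES (1, 1, 'pizza'), (2, 1, 'pasta'), (3, 2, 'salad');
    INSERT INTO c VALUES (1, 1, 'cook'), (2, 1, 'waiter'), (3, 2, 'driver');
""")

search = """
    FROM a AS a2
    INNER JOIN b ON b.a_id = a2.id
    INNER JOIN c ON c.a_id = a2.id
    WHERE a2.firstname LIKE :p OR a2.lastname LIKE :p
       OR b.favorite_food LIKE :p OR c.job LIKE :p
"""
params = {"p": "%joe%"}

# The plain join repeats Joe once per (b row, c row) combination.
joined = conn.execute("SELECT a2.id, b.favorite_food, c.job" + search,
                      params).fetchall()

# Subquery: collect the distinct matching IDs, then fetch display data by ID.
deduped = conn.execute(
    "SELECT id, firstname, lastname FROM a "
    "WHERE id IN (SELECT DISTINCT a2.id" + search + ") ORDER BY id",
    params,
).fetchall()

print(len(joined), deduped)
```

With two `b` rows and two `c` rows for Joe, the plain join returns four near-identical rows, while the subquery version returns him once.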
A couple of things to note, however:
This is going to be sloooow and you probably want to use full-text search instead (if your RDBMS supports it).
It may be faster to search each table separately rather than to join everything in a Cartesian product first and then filter with ORs.
If your tables are entity type tables, for example a being persons and b being companies, I don't think you can avoid a Cartesian product if you search for the results in this way (single query). You say you want to search all the tables for a certain word, but you probably want to separate the results into the corresponding types. Right? Otherwise a web search would not make much sense.
So if you search for 'Joe', you want to see persons containing the name 'Joe' and, for example, the company named 'Joe's gym'. Since you are searching for different entities, you should split the search into different queries.
If you really want to do this in one query, you will have to change your database structure to accommodate. You will need some form of 'search table' containing an entity ID (PK) and entity type, and a list of keywords you want that entity to be found with. For example:
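One possible shape for such a search table, sketched under assumptions: the table and column names (`search_index`, `entity_id`, `entity_type`, `keyword`), the sample data, and the choice of one row per keyword are all hypothetical, and SQLite is used only so the sketch runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);

    -- Hypothetical search table: one row per (entity, keyword) pair.
    CREATE TABLE search_index (
        entity_id   INTEGER NOT NULL,
        entity_type TEXT    NOT NULL,
        keyword     TEXT    NOT NULL,
        PRIMARY KEY (entity_id, entity_type, keyword)
    );

    INSERT INTO person  VALUES (1, 'Joe Smith'), (2, 'Ann Lee');
    INSERT INTO company VALUES (1, 'Joe''s gym');
    INSERT INTO search_index VALUES
        (1, 'person',  'joe'), (1, 'person',  'smith'),
        (2, 'person',  'ann'), (2, 'person',  'lee'),
        (1, 'company', 'joe'), (1, 'company', 'gym');
""")

# A single query over the search table returns each entity once, tagged
# with its type, so the caller knows which table to load details from.
hits = conn.execute(
    "SELECT DISTINCT entity_type, entity_id FROM search_index "
    "WHERE keyword = ? ORDER BY entity_type, entity_id",
    ("joe",),
).fetchall()
print(hits)
```

The keywords would have to be kept in sync with the entity tables (by triggers or application code), which is the cost of this design.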
Something like that?
However, it's different when your search returns only one type of entity, say a Person, and you want to return the Persons for which you get a hit on that keyword (in any table related to that Person). Then you will need to select all the fields you want to show and group by them, leaving out the fields in which you are searching. Including them inevitably leads to a Cartesian product.
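A minimal sketch of that grouping approach, again on an assumed schema (the `a_id` foreign keys and the sample rows are hypothetical, and SQLite's LIKE replaces ~*). The related columns appear only in the WHERE clause; the SELECT list and GROUP BY contain just the person fields to display.

```python
import sqlite3

# Hypothetical schema: a holds persons, b and c hold related rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, favorite_food TEXT);
    CREATE TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER, job TEXT);
    INSERT INTO a VALUES (1, 'Joe', 'Smith'), (2, 'Ann', 'Lee'),
                         (3, 'Bob', 'Ray');
    INSERT INTO b VALUES (1, 1, 'pizza'), (2, 1, 'pasta'),
                         (3, 2, 'sloppy joe'), (4, 2, 'joe fries'),
                         (5, 3, 'salad');
    INSERT INTO c VALUES (1, 1, 'cook'), (2, 1, 'waiter'),
                         (3, 2, 'driver'), (4, 3, 'clerk');
""")

# Search every column, but select and group by the display fields only.
grouped_sql = """
    SELECT a.id, a.firstname, a.lastname
    FROM a
    INNER JOIN b ON b.a_id = a.id
    INNER JOIN c ON c.a_id = a.id
    WHERE a.firstname LIKE :p OR a.lastname LIKE :p
       OR b.favorite_food LIKE :p OR c.job LIKE :p
    GROUP BY a.id, a.firstname, a.lastname
    ORDER BY a.id
"""
persons = conn.execute(grouped_sql, {"p": "%joe%"}).fetchall()
print(persons)
```

Ann is found through two different `b` rows and Joe through his own first name, yet each appears once. Adding `b.favorite_food` or `c.job` to the SELECT list and GROUP BY would bring the repeated rows back.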
I'm just brainstorming here, by the way. I hope it's helpful.