当前位置：文江博客话题详情

theta 连接、等值连接和自然连接之间的区别

发布于 2024-12-11 15:36:09 字数 93 浏览 6 评论 0原文

当涉及到 theta 连接、等值连接和自然连接时，我无法理解关系代数。有人可以帮助我更好地理解它吗？如果我在 theta 连接上使用 = 符号，它与使用自然连接完全相同吗？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

删除→记忆 2024-12-18 15:36:09

theta 连接允许任意比较关系（例如≥）。

等连接是使用相等运算符的 theta 连接。

自然连接是每个关系中具有相同名称的属性的等值连接。

此外，自然连接会删除相等比较中涉及的重复列，因此每个比较列中仅保留 1 个；用粗略的关系代数术语来说：
<代码>
⋈ = π_{R,S-a_s} ○ ⋈_{a_R=a_S}

回复收藏 0 原文

微凉徒眸意 2024-12-18 15:36:09

虽然解释确切差异的答案很好，但我想展示如何将关系代数转换为 SQL 以及这 3 个概念的实际价值是什么。

您问题中的关键概念是联接的概念。要理解联接，您需要了解笛卡尔积（该示例基于 SQL，其中等效项称为交叉联接，正如 onedaywhen 指出的那样）；

这在实践中不是很有用。考虑这个例子。

Product(PName, Price)
====================
Laptop,   1500
Car,      20000
Airplane, 3000000


Component(PName, CName, Cost)
=============================
Laptop, CPU,    500
Laptop, hdd,    300
Laptop, case,   700
Car,    wheels, 1000

笛卡尔积产品 x 组件将是 - 波纹管或 sql fiddle。您可以看到有 12 行 = 3 x 4。显然，像“笔记本电脑”和“轮子”这样的行没有任何意义，这就是为什么在实践中很少使用笛卡尔积。

|    PNAME |   PRICE |  CNAME | COST |
--------------------------------------
|   Laptop |    1500 |    CPU |  500 |
|   Laptop |    1500 |    hdd |  300 |
|   Laptop |    1500 |   case |  700 |
|   Laptop |    1500 | wheels | 1000 |
|      Car |   20000 |    CPU |  500 |
|      Car |   20000 |    hdd |  300 |
|      Car |   20000 |   case |  700 |
|      Car |   20000 | wheels | 1000 |
| Airplane | 3000000 |    CPU |  500 |
| Airplane | 3000000 |    hdd |  300 |
| Airplane | 3000000 |   case |  700 |
| Airplane | 3000000 | wheels | 1000 |

JOIN 的目的是为这些产品增加更多价值。我们真正想要的是“加入”产品及其相关组件，因为每个组件都属于一个产品。执行此操作的方法是使用连接：

产品 JOIN 组件 ON Pname

关联的 SQL 查询将如下所示（您可以使用此处的所有示例）

SELECT *
FROM Product
JOIN Component
  ON Product.Pname = Component.Pname

并且结果：

|  PNAME | PRICE |  CNAME | COST |
----------------------------------
| Laptop |  1500 |    CPU |  500 |
| Laptop |  1500 |    hdd |  300 |
| Laptop |  1500 |   case |  700 |
|    Car | 20000 | wheels | 1000 |

请注意，结果只有 4 行，因为笔记本电脑有 3 个组件，汽车有 1 个组件，飞机没有。这更有用。

回到你的问题，你问的所有连接都是我刚刚展示的 JOIN 的变体：

自然连接 = 连接（ON 子句）是在所有具有相同名称的列上进行的；与所有其他连接不同，它从结果中删除重复的列；大多数 DBMS（由不同供应商创建的数据库系统，例如 Microsoft 的 SQL Server、Oracle 的 MySQL 等）甚至懒得支持这一点，这只是不好的做法（或者故意选择不实现它）。想象一下，开发人员将“产品”中第二列的名称从“价格”更改为“成本”。然后，所有自然连接都将在 PName 和 Cost 上完成，由于没有数字匹配，因此导致 0 行。

Theta Join = 这是每个人都使用的通用联接，因为它允许您指定条件（SQL 中的 ON 子句）。您几乎可以按照您喜欢的任何条件加入，例如前 2 个字母相似或价格不同的产品。在实践中，这种情况很少出现 - 在 95% 的情况下，您将在相等条件下进行连接，这导致我们：

等值连接 = 实践中最常用的一种。上面的例子是等值连接。数据库针对这种类型的连接进行了优化！等值连接的反义词是非等值连接，即当您按照“=”以外的条件进行连接时。数据库没有为此优化！它们都是通用 theta 连接的子集。自然连接也是 theta 连接，但条件（theta）是隐式的。

信息来源：大学+经过认证的SQL Server开发人员+最近完成了斯坦福大学的MOO“数据库导论”，所以我敢说我对关系代数有了新的认识。

While the answers explaining the exact differences are fine, I want to show how the relational algebra is transformed to SQL and what the actual value of the 3 concepts is.

The key concept in your question is the idea of a join. To understand a join you need to understand a Cartesian Product (the example is based on SQL where the equivalent is called a cross join as onedaywhen points out);

This isn't very useful in practice. Consider this example.

Product(PName, Price)
====================
Laptop,   1500
Car,      20000
Airplane, 3000000


Component(PName, CName, Cost)
=============================
Laptop, CPU,    500
Laptop, hdd,    300
Laptop, case,   700
Car,    wheels, 1000

The Cartesian product Product x Component will be - bellow or sql fiddle. You can see there are 12 rows = 3 x 4. Obviously, rows like "Laptop" with "wheels" have no meaning, this is why in practice the Cartesian product is rarely used.

|    PNAME |   PRICE |  CNAME | COST |
--------------------------------------
|   Laptop |    1500 |    CPU |  500 |
|   Laptop |    1500 |    hdd |  300 |
|   Laptop |    1500 |   case |  700 |
|   Laptop |    1500 | wheels | 1000 |
|      Car |   20000 |    CPU |  500 |
|      Car |   20000 |    hdd |  300 |
|      Car |   20000 |   case |  700 |
|      Car |   20000 | wheels | 1000 |
| Airplane | 3000000 |    CPU |  500 |
| Airplane | 3000000 |    hdd |  300 |
| Airplane | 3000000 |   case |  700 |
| Airplane | 3000000 | wheels | 1000 |

JOINs are here to add more value to these products. What we really want is to "join" the product with its associated components, because each component belongs to a product. The way to do this is with a join:

Product JOIN Component ON Pname

The associated SQL query would be like this (you can play with all the examples here)

SELECT *
FROM Product
JOIN Component
  ON Product.Pname = Component.Pname

and the result:

|  PNAME | PRICE |  CNAME | COST |
----------------------------------
| Laptop |  1500 |    CPU |  500 |
| Laptop |  1500 |    hdd |  300 |
| Laptop |  1500 |   case |  700 |
|    Car | 20000 | wheels | 1000 |

Notice that the result has only 4 rows, because the Laptop has 3 components, the Car has 1 and the Airplane none. This is much more useful.

Getting back to your questions, all the joins you ask about are variations of the JOIN I just showed:

Natural Join = the join (the ON clause) is made on all columns with the same name; it removes duplicate columns from the result, as opposed to all other joins; most DBMS (database systems created by various vendors such as Microsoft's SQL Server, Oracle's MySQL etc. ) don't even bother supporting this, it is just bad practice (or purposely chose not to implement it). Imagine that a developer comes and changes the name of the second column in Product from Price to Cost. Then all the natural joins would be done on PName AND on Cost, resulting in 0 rows since no numbers match.

Theta Join = this is the general join everybody uses because it allows you to specify the condition (the ON clause in SQL). You can join on pretty much any condition you like, for example on Products that have the first 2 letters similar, or that have a different price. In practice, this is rarely the case - in 95% of the cases you will join on an equality condition, which leads us to:

Equi Join = the most common one used in practice. The example above is an equi join. Databases are optimized for this type of joins! The oposite of an equi join is a non-equi join, i.e. when you join on a condition other than "=". Databases are not optimized for this! Both of them are subsets of the general theta join. The natural join is also a theta join but the condition (the theta) is implicit.

Source of information: university + certified SQL Server developer + recently completed the MOO "Introduction to databases" from Stanford so I dare say I have relational algebra fresh in mind.

回复收藏 0 原文

英雄似剑 2024-12-18 15:36:09

@outis的答案很好：关于关系简洁且正确。

然而，对于 SQL 来说，情况稍微复杂一些。

考虑通常的供应商和零件数据库，但在 SQL 中实现：

SELECT * FROM S NATURAL JOIN SP;

将返回一个结果集**，其中包含

< code>SNO, SNAME, STATUS, CITY, PNO, QTY

对两个表中具有相同名称的列执行联接，SNO。请注意，结果集有六列，并且仅包含一列 SNO。

现在考虑 theta eqijoin，其中必须显式指定连接的列名称（还需要范围变量 S 和 SP）：

SELECT * FROM S JOIN SP ON S.SNO = SP.SNO;

结果集将有七列，包括SNO 的两列。结果集的名称是 SQL 标准所称的“依赖于实现”，但可能如下所示：

SNO、SNAME、STATUS、CITY、SNO、PNO、QTY

或者可能是这样的

S .SNO, SNAME, STATUS, CITY, SP.SNO, PNO, QTY

换句话说，SQL中的NATURAL JOIN可以考虑从数据库中删除名称重复的列。结果集（但可惜不会删除重复的行 - 您必须记住自己将 SELECT 更改为 SELECT DISTINCT）。

** 我不太知道 SELECT * FROM table_expression; 的结果是什么。我知道这不是一种关系，因为除其他原因外，它可以包含具有重复名称的列或没有名称的列。我知道它不是一个集合，因为除其他原因外，列顺序很重要。它甚至不是 SQL 表或 SQL 表表达式。我称之为结果集。

@outis's answer is good: concise and correct as regards relations.

However, the situation is slightly more complicated as regards SQL.

Consider the usual suppliers and parts database but implemented in SQL:

SELECT * FROM S NATURAL JOIN SP;

would return a resultset** with columns

SNO, SNAME, STATUS, CITY, PNO, QTY

The join is performed on the column with the same name in both tables, SNO. Note that the resultset has six columns and only contains one column for SNO.

Now consider a theta eqijoin, where the column names for the join must be explicitly specified (plus range variables S and SP are required):

SELECT * FROM S JOIN SP ON S.SNO = SP.SNO;

The resultset will have seven columns, including two columns for SNO. The names of the resultset are what the SQL Standard refers to as "implementation dependent" but could look like this:

SNO, SNAME, STATUS, CITY, SNO, PNO, QTY

or perhaps this

S.SNO, SNAME, STATUS, CITY, SP.SNO, PNO, QTY

In other words, NATURAL JOIN in SQL can be considered to remove columns with duplicated names from the resultset (but alas will not remove duplicate rows - you must remember to change SELECT to SELECT DISTINCT yourself).

** I don't quite know what the result of SELECT * FROM table_expression; is. I know it is not a relation because, among other reasons, it can have columns with duplicate names or a column with no name. I know it is not a set because, among other reasons, the column order is significant. It's not even a SQL table or SQL table expression. I call it a resultset.

回复收藏 0 原文

中二柚 2024-12-18 15:36:09

Natural 是 Equi 的子集，Equi 是 Theta 的子集。

如果我在 theta 连接上使用 = 符号，它与刚才的完全相同吗？
使用自然连接？？？

不一定，但它会是一个 Equi。 Natural 意味着您在所有类似命名的列上进行匹配，Equi 只是意味着您专门使用“=”（而不是“小于”等），

但这只是纯粹的学术界，您可能使用关系数据库多年而从未听说过任何人使用这些术语。

回复收藏 0 原文

夏花。依旧 2024-12-18 15:36:09

Theta 加入：
当您使用任何运算符（例如 =、<、>、>= 等）进行联接查询时，该联接查询将出现在 Theta join 下。

等值连接：
当您仅使用相等运算符进行联接查询时，该联接查询属于等值联接。

示例：

> SELECT * FROM Emp JOIN Dept ON Emp.DeptID = Dept.DeptID;
> SELECT * FROM Emp INNER JOIN Dept USING(DeptID)

This will show:
 _________________________________________________
| Emp.Name | Emp.DeptID | Dept.Name | Dept.DeptID |
|          |            |           |             |

注意：等连接也是 theta 连接！

自然加入：
一种等值连接，通过比较两个表中的所有同名列来隐式发生。

注意：这里，对于每对相同命名的列，连接结果只有一列。

 SELECT * FROM Emp NATURAL JOIN Dept

This will show:
 _______________________________
| DeptID | Emp.Name | Dept.Name |
|        |          |           |

Theta Join:
When you make a query for join using any operator,(e.g., =, <, >, >= etc.), then that join query comes under Theta join.

Equi Join:
When you make a query for join using equality operator only, then that join query comes under Equi join.

Example:

> SELECT * FROM Emp JOIN Dept ON Emp.DeptID = Dept.DeptID;
> SELECT * FROM Emp INNER JOIN Dept USING(DeptID)

This will show:
 _________________________________________________
| Emp.Name | Emp.DeptID | Dept.Name | Dept.DeptID |
|          |            |           |             |

Note: Equi join is also a theta join!

Natural Join:
a type of Equi Join which occurs implicitly by comparing all the same names columns in both tables.

Note: here, the join result has only one column for each pair of same named columns.

Example

 SELECT * FROM Emp NATURAL JOIN Dept

This will show:
 _______________________________
| DeptID | Emp.Name | Dept.Name |
|        |          |           |

回复收藏 0 原文

素食主义者 2024-12-18 15:36:09

两个表的笛卡尔积给出了元组的所有可能组合，就像数学中两个集合的叉积的例子一样。因为很多时候，一些垃圾值也占用了内存中不必要的空间，所以这里的连接来拯救，它只给出那些需要且有意义的属性值的组合。

内连接给出了表中的重复字段两次，而自然连接通过仅过滤重复列并仅显示一次来解决问题。否则，两者的工作原理相同。自然连接更有效，因为它保留了内存。此外，自然连接中删除了冗余。

两个表的等连接使得它们仅显示与其他表中的值匹配的那些元组。例如：
让 new1 和 new2 成为两个表。如果sql查询 select * from new1 join new2 on new1.id = new.id （id是两个表中的同一列），则从new2表开始并连接与第二个表中的id匹配。此外，非等值连接没有相等运算符，它们有 <、> 和 Between 运算符。

theta join 由所有比较运算符组成，包括相等和其他 < ，>比较运算符。当它使用 equal(=) 运算符时，它被称为 equal join 。

回复收藏 0 原文