theta 连接、等值连接和自然连接之间的区别
当涉及到 theta 连接、等值连接和自然连接时,我无法理解关系代数。有人可以帮助我更好地理解它吗?如果我在 theta 连接上使用 = 符号,它与使用自然连接完全相同吗?
I'm having trouble understanding relational algebra when it comes to theta joins, equijoins and natural joins. Could someone please help me better understand it? If I use the = sign on a theta join is it exactly the same as just using a natural join?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
theta 连接允许任意比较关系(例如≥)。
等连接是使用相等运算符的 theta 连接。
自然连接是每个关系中具有相同名称的属性的等值连接。
此外,自然连接会删除相等比较中涉及的重复列,因此每个比较列中仅保留 1 个;用粗略的关系代数术语来说:
<代码>
⋈ = πR,S-as ○ ⋈aR=aS
A theta join allows for arbitrary comparison relationships (such as ≥).
An equijoin is a theta join using the equality operator.
A natural join is an equijoin on attributes that have the same name in each relationship.
Additionally, a natural join removes the duplicate columns involved in the equality comparison so only 1 of each compared column remains; in rough relational algebraic terms:
⋈ = πR,S-as ○ ⋈aR=aS
虽然解释确切差异的答案很好,但我想展示如何将关系代数转换为 SQL 以及这 3 个概念的实际价值是什么。
您问题中的关键概念是联接的概念。要理解联接,您需要了解笛卡尔积(该示例基于 SQL,其中等效项称为交叉联接,正如 onedaywhen 指出的那样);
这在实践中不是很有用。考虑这个例子。
笛卡尔积产品 x 组件将是 - 波纹管或 sql fiddle。您可以看到有 12 行 = 3 x 4。显然,像“笔记本电脑”和“轮子”这样的行没有任何意义,这就是为什么在实践中很少使用笛卡尔积。
JOIN 的目的是为这些产品增加更多价值。我们真正想要的是“加入”产品及其相关组件,因为每个组件都属于一个产品。执行此操作的方法是使用连接:
关联的 SQL 查询将如下所示(您可以使用此处的所有示例)
并且结果:
请注意,结果只有 4 行,因为笔记本电脑有 3 个组件,汽车有 1 个组件,飞机没有。这更有用。
回到你的问题,你问的所有连接都是我刚刚展示的 JOIN 的变体:
自然连接 = 连接(ON 子句)是在所有具有相同名称的列上进行的;与所有其他连接不同,它从结果中删除重复的列;大多数 DBMS(由不同供应商创建的数据库系统,例如 Microsoft 的 SQL Server、Oracle 的 MySQL 等)甚至懒得支持这一点,这只是不好的做法(或者故意选择不实现它)。想象一下,开发人员将“产品”中第二列的名称从“价格”更改为“成本”。然后,所有自然连接都将在 PName 和 Cost 上完成,由于没有数字匹配,因此导致 0 行。
Theta Join = 这是每个人都使用的通用联接,因为它允许您指定条件(SQL 中的 ON 子句)。您几乎可以按照您喜欢的任何条件加入,例如前 2 个字母相似或价格不同的产品。在实践中,这种情况很少出现 - 在 95% 的情况下,您将在相等条件下进行连接,这导致我们:
等值连接 = 实践中最常用的一种。上面的例子是等值连接。数据库针对这种类型的连接进行了优化!等值连接的反义词是非等值连接,即当您按照“=”以外的条件进行连接时。数据库没有为此优化!它们都是通用 theta 连接的子集。自然连接也是 theta 连接,但条件(theta)是隐式的。
信息来源:大学+经过认证的SQL Server开发人员+最近完成了斯坦福大学的MOO“数据库导论”,所以我敢说我对关系代数有了新的认识。
While the answers explaining the exact differences are fine, I want to show how the relational algebra is transformed to SQL and what the actual value of the 3 concepts is.
The key concept in your question is the idea of a join. To understand a join you need to understand a Cartesian Product (the example is based on SQL where the equivalent is called a cross join as onedaywhen points out);
This isn't very useful in practice. Consider this example.
The Cartesian product Product x Component will be - bellow or sql fiddle. You can see there are 12 rows = 3 x 4. Obviously, rows like "Laptop" with "wheels" have no meaning, this is why in practice the Cartesian product is rarely used.
JOINs are here to add more value to these products. What we really want is to "join" the product with its associated components, because each component belongs to a product. The way to do this is with a join:
The associated SQL query would be like this (you can play with all the examples here)
and the result:
Notice that the result has only 4 rows, because the Laptop has 3 components, the Car has 1 and the Airplane none. This is much more useful.
Getting back to your questions, all the joins you ask about are variations of the JOIN I just showed:
Natural Join = the join (the ON clause) is made on all columns with the same name; it removes duplicate columns from the result, as opposed to all other joins; most DBMS (database systems created by various vendors such as Microsoft's SQL Server, Oracle's MySQL etc. ) don't even bother supporting this, it is just bad practice (or purposely chose not to implement it). Imagine that a developer comes and changes the name of the second column in Product from Price to Cost. Then all the natural joins would be done on PName AND on Cost, resulting in 0 rows since no numbers match.
Theta Join = this is the general join everybody uses because it allows you to specify the condition (the ON clause in SQL). You can join on pretty much any condition you like, for example on Products that have the first 2 letters similar, or that have a different price. In practice, this is rarely the case - in 95% of the cases you will join on an equality condition, which leads us to:
Equi Join = the most common one used in practice. The example above is an equi join. Databases are optimized for this type of joins! The oposite of an equi join is a non-equi join, i.e. when you join on a condition other than "=". Databases are not optimized for this! Both of them are subsets of the general theta join. The natural join is also a theta join but the condition (the theta) is implicit.
Source of information: university + certified SQL Server developer + recently completed the MOO "Introduction to databases" from Stanford so I dare say I have relational algebra fresh in mind.
@outis的答案很好:关于关系简洁且正确。
然而,对于 SQL 来说,情况稍微复杂一些。
考虑通常的供应商和零件数据库,但在 SQL 中实现:
将返回一个结果集**,其中包含
< code>SNO, SNAME, STATUS, CITY, PNO, QTY
对两个表中具有相同名称的列执行联接,
SNO
。请注意,结果集有六列,并且仅包含一列SNO
。现在考虑 theta eqijoin,其中必须显式指定连接的列名称(还需要范围变量
S
和SP
):结果集将有七列,包括
SNO
的两列。结果集的名称是 SQL 标准所称的“依赖于实现”,但可能如下所示:SNO、SNAME、STATUS、CITY、SNO、PNO、QTY
或者可能是这样的
S .SNO, SNAME, STATUS, CITY, SP.SNO, PNO, QTY
换句话说,SQL中的
NATURAL JOIN
可以考虑从数据库中删除名称重复的列。结果集(但可惜不会删除重复的行 - 您必须记住自己将 SELECT 更改为 SELECT DISTINCT)。** 我不太知道
SELECT * FROM table_expression;
的结果是什么。我知道这不是一种关系,因为除其他原因外,它可以包含具有重复名称的列或没有名称的列。我知道它不是一个集合,因为除其他原因外,列顺序很重要。它甚至不是 SQL 表或 SQL 表表达式。我称之为结果集。@outis's answer is good: concise and correct as regards relations.
However, the situation is slightly more complicated as regards SQL.
Consider the usual suppliers and parts database but implemented in SQL:
would return a resultset** with columns
SNO, SNAME, STATUS, CITY, PNO, QTY
The join is performed on the column with the same name in both tables,
SNO
. Note that the resultset has six columns and only contains one column forSNO
.Now consider a theta eqijoin, where the column names for the join must be explicitly specified (plus range variables
S
andSP
are required):The resultset will have seven columns, including two columns for
SNO
. The names of the resultset are what the SQL Standard refers to as "implementation dependent" but could look like this:SNO, SNAME, STATUS, CITY, SNO, PNO, QTY
or perhaps this
S.SNO, SNAME, STATUS, CITY, SP.SNO, PNO, QTY
In other words,
NATURAL JOIN
in SQL can be considered to remove columns with duplicated names from the resultset (but alas will not remove duplicate rows - you must remember to changeSELECT
toSELECT DISTINCT
yourself).** I don't quite know what the result of
SELECT * FROM table_expression;
is. I know it is not a relation because, among other reasons, it can have columns with duplicate names or a column with no name. I know it is not a set because, among other reasons, the column order is significant. It's not even a SQL table or SQL table expression. I call it a resultset.Natural 是 Equi 的子集,Equi 是 Theta 的子集。
不一定,但它会是一个 Equi。 Natural 意味着您在所有类似命名的列上进行匹配,Equi 只是意味着您专门使用“=”(而不是“小于”等),
但这只是纯粹的学术界,您可能使用关系数据库多年而从未听说过任何人使用这些术语。
Natural is a subset of Equi which is a subset of Theta.
Not necessarily, but it would be an Equi. Natural means you are matching on all similarly named columns, Equi just means you are using '=' exclusively (and not 'less than', like, etc)
This is pure academia though, you could work with relational databases for years and never hear anyone use these terms.
Theta 加入:
当您使用任何运算符(例如 =、<、>、>= 等)进行联接查询时,该联接查询将出现在 Theta join 下。
等值连接:
当您仅使用相等运算符进行联接查询时,该联接查询属于等值联接。
示例:
注意:等连接也是 theta 连接!
自然加入:
一种等值连接,通过比较两个表中的所有同名列来隐式发生。
注意:这里,对于每对相同命名的列,连接结果只有一列。
Theta Join:
When you make a query for join using any operator,(e.g., =, <, >, >= etc.), then that join query comes under Theta join.
Equi Join:
When you make a query for join using equality operator only, then that join query comes under Equi join.
Example:
Note: Equi join is also a theta join!
Natural Join:
a type of Equi Join which occurs implicitly by comparing all the same names columns in both tables.
Note: here, the join result has only one column for each pair of same named columns.
Example
两个表的笛卡尔积给出了元组的所有可能组合,就像数学中两个集合的叉积的例子一样。因为很多时候,一些垃圾值也占用了内存中不必要的空间,所以这里的连接来拯救,它只给出那些需要且有意义的属性值的组合。
内连接给出了表中的重复字段两次,而自然连接通过仅过滤重复列并仅显示一次来解决问题。否则,两者的工作原理相同。自然连接更有效,因为它保留了内存。此外,自然连接中删除了冗余。
两个表的等连接使得它们仅显示与其他表中的值匹配的那些元组。例如 :
让 new1 和 new2 成为两个表。如果sql查询 select * from new1 join new2 on new1.id = new.id (id是两个表中的同一列),则从new2表开始并连接与第二个表中的id匹配。此外,非等值连接没有相等运算符,它们有 <、> 和 Between 运算符。
theta join 由所有比较运算符组成,包括相等和其他 < ,>比较运算符。当它使用 equal(=) 运算符时,它被称为 equal join 。
Cartesian product of two tables gives all the possible combinations of tuples like the example in mathematics the cross product of two sets . since many a times there are some junk values which occupy unnecessary space in the memory too so here joins comes to rescue which give the combination of only those attribute values which are required and are meaningful.
inner join gives the repeated field in the table twice whereas natural join here solves the problem by just filtering the repeated columns and displaying it only once.else, both works the same. natural join is more efficient since it preserves the memory .Also , redundancies are removed in natural join .
equi join of two tables are such that they display only those tuples which matches the value in other table . for example :
let new1 and new2 be two tables . if sql query select * from new1 join new2 on new1.id = new.id (id is the same column in two tables) then start from new2 table and join which matches the id in second table . besides , non equi join do not have equality operator they have <,>,and between operator .
theta join consists of all the comparison operator including equality and others < , > comparison operator. when it uses equality(=) operator it is known as equi join .
自然连接:当两个关系中至少有一个共同属性时,自然连接是可能的。
Theta Join:当两个人在特定条件下进行操作时,Theta join 是可能的。
等值连接:当两个人按照相等的条件行事时,等值是可能的。它是 θ 连接的一种类型。
Natural Join: Natural join can be possible when there is at least one common attribute in two relations.
Theta Join: Theta join can be possible when two act on particular condition.
Equi Join: Equi can be possible when two act on equity condition. It is one type of theta join.