对“theta join”的清晰解释在关系代数中？

发布于 2024-12-06 09:45:05 字数 305 浏览 0 评论 0原文

我正在寻找关系代数中 theta join 概念的清晰、基本的解释，也许还有一个示例（可能使用 SQL）来说明其用法。

如果我理解正确的话，theta 连接是添加了条件的自然连接。因此，虽然自然连接强制同名属性之间相等（并删除重复项？），但 theta 连接执行相同的操作，但添加了在一种情况下。我有这个权利吗？任何简单明了的解释（对于非数学家）将不胜感激。

另外（抱歉只是把它放在最后，但它有点相关），有人可以解释笛卡尔积的重要性或想法吗？我认为我遗漏了一些关于基本概念的东西，因为对我来说，这似乎只是对一个基本事实的重申，即一组 13 X 一组 4 = 52...

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

生活了然无味 2024-12-13 09:45:05

暂时将 SQL 放在一边...

关系运算符将一个或多个关系作为参数并产生一个关系。由于关系根据定义不具有重复名称的属性，因此关系操作 θ 连接和自然连接都将“删除重复的属性”。 [按照您的要求，在 SQL 中发布示例来解释关系操作的一个大问题是，SQL 查询的结果不是关系，因为除其他问题外，它可能具有重复的行和/或列。]

关系笛卡尔乘积运算（关系中的结果）与集合笛卡尔积（一组对中的结果）不同。 “笛卡尔”这个词在这里并不是特别有用。事实上，科德称他的原始运算符为“产品”。

真正的关系语言教程 D 缺少一个乘积运算符，并且乘积不是一个原始运算符由教程 D 的合著者 Hugh Darwen** 提出的关系代数。这是因为没有共同属性名称的两个关系的自然连接会产生与相同两个关系的乘积相同的关系，即自然连接更通用，因此更有用。

考虑这些示例（教程 D）：

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { X 1 } , TUPLE { X 2 } } AS R2 :
R1 JOIN R2

返回关系的乘积，即 2 的度（即两个属性，X 和 Y）和 6 的基数 (2 x 3 = 6 个元组）。

然而，

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } } AS R2 :
R1 JOIN R2

返回关系的自然连接，即度为 1（即产生一个属性 Y 的属性的集合并集）和基数 2（即删除重复元组）。

我希望上面的例子能解释为什么你的说法“一组 13 X 一组 4 = 52”并不严格正确。

同样，教程 D 不包含 theta 连接运算符。这本质上是因为其他运算符（例如自然连接和限制）使其既不必要又不是很有用。相比之下，Codd 的原始运算符包括可用于执行 theta 连接的乘积和限制。

SQL 有一个名为 CROSS JOIN 的显式乘积运算符，它强制结果为乘积，即使它需要通过创建重复列（属性）来违反 1NF。考虑与上面后一个教程 D 示例等效的 SQL：

WITH R1 AS (SELECT * FROM (VALUES (1), (2), (3)) AS T (Y)), 
     R2 AS (SELECT * FROM (VALUES (1), (2)) AS T (Y))
SELECT * 
  FROM R1 CROSS JOIN R2;

这将返回一个表表达式，其中有两列（而不是一个属性），均称为 Y (!!) 和 6 行，即这个

SELECT c1 AS Y, c2 AS Y 
  FROM (VALUES (1, 1), 
               (2, 1), 
               (3, 1), 
               (1, 2), 
               (2, 2), 
               (3, 2)
       ) AS T (c1, c2);

** 即，虽然只有一种关系模型（即 Codd 模型），但可以有多个关系代数（即 Codd 模型只有一个）。

Leaving SQL aside for a moment...

A relational operator takes one or more relations as parameters and results in a relation. Because a relation has no attributes with duplicate names by definition, relational operations theta join and natural join will both "remove the duplicate attributes." [A big problem with posting examples in SQL to explain relation operations, as you requested, is that the result of a SQL query is not a relation because, among other sins, it can have duplicate rows and/or columns.]

The relational Cartesian product operation (results in a relation) differs from set Cartesian product (results in a set of pairs). The word 'Cartesian' isn't particularly helpful here. In fact, Codd called his primitive operator 'product'.

The truly relational language Tutorial D lacks a product operator and product is not a primitive operator in the relational algebra proposed by co-author of Tutorial D, Hugh Darwen**. This is because the natural join of two relations with no attribute names in common results in the same relation as the product of the same two relations i.e. natural join is more general and therefore more useful.

Consider these examples (Tutorial D):

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { X 1 } , TUPLE { X 2 } } AS R2 :
R1 JOIN R2

returns the product of the relations i.e. degree of two (i.e. two attributes, X and Y) and cardinality of 6 (2 x 3 = 6 tuples).

However,

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } } AS R2 :
R1 JOIN R2

returns the natural join of the relations i.e. degree of one (i.e. the set union of the attributes yielding one attribute Y) and cardinality of 2 (i.e. duplicate tuples removed).

I hope the above examples explain why your statement "that a set of 13 X a set of 4 = 52" is not strictly correct.

Similarly, Tutorial D does not include a theta join operator. This is essentially because other operators (e.g. natural join and restriction) make it both unnecessary and not terribly useful. In contrast, Codd's primitive operators included product and restriction which can be used to perform a theta join.

SQL has an explicit product operator named CROSS JOIN which forces the result to be the product even if it entails violating 1NF by creating duplicate columns (attributes). Consider the SQL equivalent to the latter Tutoral D exmaple above:

WITH R1 AS (SELECT * FROM (VALUES (1), (2), (3)) AS T (Y)), 
     R2 AS (SELECT * FROM (VALUES (1), (2)) AS T (Y))
SELECT * 
  FROM R1 CROSS JOIN R2;

This returns a table expression with two columns (rather than one attribute) both called Y (!!) and 6 rows i.e. this

SELECT c1 AS Y, c2 AS Y 
  FROM (VALUES (1, 1), 
               (2, 1), 
               (3, 1), 
               (1, 2), 
               (2, 2), 
               (3, 2)
       ) AS T (c1, c2);

** That is, although there is only one relational model (i.e. Codd's), there can be more than one relational algebra (i.e. Codd's is but one).

回复收藏 0 原文