对“theta join”的清晰解释在关系代数中?

发布于 2024-12-06 09:45:05 字数 305 浏览 0 评论 0原文

我正在寻找关系代数中 theta join 概念的清晰、基本的解释,也许还有一个示例(可能使用 SQL)来说明其用法。

如果我理解正确的话,theta 连接是添加了条件的自然连接。因此,虽然自然连接强制同名属性之间相等(并删除重复项?),但 theta 连接执行相同的操作,但添加了在一种情况下。我有这个权利吗?任何简单明了的解释(对于非数学家)将不胜感激。

另外(抱歉只是把它放在最后,但它有点相关),有人可以解释笛卡尔积的重要性或想法吗?我认为我遗漏了一些关于基本概念的东西,因为对我来说,这似乎只是对一个基本事实的重申,即一组 13 X 一组 4 = 52...

I'm looking for a clear, basic explanation of the concept of theta join in relational algebra and perhaps an example (using SQL perhaps) to illustrate its usage.

If I understand it correctly, the theta join is a natural join with a condition added in. So, whereas the natural join enforces equality between attributes of the same name (and removes the duplicate?), the theta join does the same thing but adds in a condition. Do I have this right? Any clear explanation, in simple terms (for a non-mathmetician) would be greatly appreciated.

Also (sorry to just throw this in at the end, but its sort of related), could someone explain the importance or idea of cartesian product? I think I'm missing something with regard to the basic concept, because to me it just seems like a restating of a basic fact, i.e that a set of 13 X a set of 4 = 52...

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

生活了然无味 2024-12-13 09:45:05

暂时将 SQL 放在一边...

关系运算符将一个或多个关系作为参数并产生一个关系。由于关系根据定义不具有重复名称的属性,因此关系操作 θ 连接和自然连接都将“删除重复的属性”。 [按照您的要求,在 SQL 中发布示例来解释关系操作的一个大问题是,SQL 查询的结果不是关系,因为除其他问题外,它可能具有重复的行和/或列。]

关系笛卡尔乘积运算(关系中的结果)与集合笛卡尔积(一组对中的结果)不同。 “笛卡尔”这个词在这里并不是特别有用。事实上,科德称他的原始运算符为“产品”。

真正的关系语言教程 D 缺少一个乘积运算符,并且乘积不是一个原始运算符由教程 D 的合著者 Hugh Darwen** 提出的关系代数。这是因为没有共同属性名称的两个关系的自然连接会产生与相同两个关系的乘积相同的关系,即自然连接更通用,因此更有用。

考虑这些示例(教程 D):

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { X 1 } , TUPLE { X 2 } } AS R2 :
R1 JOIN R2

返回关系的乘积,即 2 的度(即两个属性,XY)和 6 的基数 (2 x 3 = 6 个元组)。

然而,

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } } AS R2 :
R1 JOIN R2

返回关系的自然连接,即度为 1(即产生一个属性 Y 的属性的集合并集)和基数 2(即删除重复元组)。

我希望上面的例子能解释为什么你的说法“一组 13 X 一组 4 = 52”并不严格正确。

同样,教程 D 不包含 theta 连接运算符。这本质上是因为其他运算符(例如自然连接和限制)使其既不必要又不是很有用。相比之下,Codd 的原始运算符包括可用于执行 theta 连接的乘积和限制。


SQL 有一个名为 CROSS JOIN 的显式乘积运算符,它强制结果为乘积,即使它需要通过创建重复列(属性)来违反 1NF。考虑与上面后一个教程 D 示例等效的 SQL:

WITH R1 AS (SELECT * FROM (VALUES (1), (2), (3)) AS T (Y)), 
     R2 AS (SELECT * FROM (VALUES (1), (2)) AS T (Y))
SELECT * 
  FROM R1 CROSS JOIN R2;

这将返回一个表表达式,其中有两列(而不是一个属性),均称为 Y (!!) 和 6 行,即这个

SELECT c1 AS Y, c2 AS Y 
  FROM (VALUES (1, 1), 
               (2, 1), 
               (3, 1), 
               (1, 2), 
               (2, 2), 
               (3, 2)
       ) AS T (c1, c2);

** 即,虽然只有一种关系模型(即 Codd 模型),但可以有多个关系代数(即 Codd 模型只有一个)。

Leaving SQL aside for a moment...

A relational operator takes one or more relations as parameters and results in a relation. Because a relation has no attributes with duplicate names by definition, relational operations theta join and natural join will both "remove the duplicate attributes." [A big problem with posting examples in SQL to explain relation operations, as you requested, is that the result of a SQL query is not a relation because, among other sins, it can have duplicate rows and/or columns.]

The relational Cartesian product operation (results in a relation) differs from set Cartesian product (results in a set of pairs). The word 'Cartesian' isn't particularly helpful here. In fact, Codd called his primitive operator 'product'.

The truly relational language Tutorial D lacks a product operator and product is not a primitive operator in the relational algebra proposed by co-author of Tutorial D, Hugh Darwen**. This is because the natural join of two relations with no attribute names in common results in the same relation as the product of the same two relations i.e. natural join is more general and therefore more useful.

Consider these examples (Tutorial D):

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { X 1 } , TUPLE { X 2 } } AS R2 :
R1 JOIN R2

returns the product of the relations i.e. degree of two (i.e. two attributes, X and Y) and cardinality of 6 (2 x 3 = 6 tuples).

However,

WITH RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } , TUPLE { Y 3 } } AS R1 ,
     RELATION { TUPLE { Y 1 } , TUPLE { Y 2 } } AS R2 :
R1 JOIN R2

returns the natural join of the relations i.e. degree of one (i.e. the set union of the attributes yielding one attribute Y) and cardinality of 2 (i.e. duplicate tuples removed).

I hope the above examples explain why your statement "that a set of 13 X a set of 4 = 52" is not strictly correct.

Similarly, Tutorial D does not include a theta join operator. This is essentially because other operators (e.g. natural join and restriction) make it both unnecessary and not terribly useful. In contrast, Codd's primitive operators included product and restriction which can be used to perform a theta join.


SQL has an explicit product operator named CROSS JOIN which forces the result to be the product even if it entails violating 1NF by creating duplicate columns (attributes). Consider the SQL equivalent to the latter Tutoral D exmaple above:

WITH R1 AS (SELECT * FROM (VALUES (1), (2), (3)) AS T (Y)), 
     R2 AS (SELECT * FROM (VALUES (1), (2)) AS T (Y))
SELECT * 
  FROM R1 CROSS JOIN R2;

This returns a table expression with two columns (rather than one attribute) both called Y (!!) and 6 rows i.e. this

SELECT c1 AS Y, c2 AS Y 
  FROM (VALUES (1, 1), 
               (2, 1), 
               (3, 1), 
               (1, 2), 
               (2, 2), 
               (3, 2)
       ) AS T (c1, c2);

** That is, although there is only one relational model (i.e. Codd's), there can be more than one relational algebra (i.e. Codd's is but one).

浮萍、无处依 2024-12-13 09:45:05

你不太正确 - theta 连接是一种连接,可能包含除 = 以外的条件 - 在 SQL 中,通常为 <>= 等。请参阅 TechNet

作为对于笛卡尔积(或CROSS JOIN),它是一种操作而不是一种想法或概念。这很重要,因为有时您需要使用它! 13 组 x 4 组 = 52 是一个基本事实,笛卡尔积就是基于这个事实。

You're not quite right - a theta join is a join which may include a condition other than = - in SQL, typically < or >= etc. See TechNet

As for cartesian product (or CROSS JOIN), it is an operation rather than an idea or concept. It's important because sometimes you need to use it! It is a basic fact that set of 13 x set of 4 = 52, and cartesian product is based on this fact.

素染倾城色 2024-12-13 09:45:05

如果你了解等值连接,你就应该了解 theta 连接。如果将等连接中的符号=(等于)更改为>=,那么您就已经完成了theta连接。

然而,与等值连接相比,很难看出 theta 连接的实用性,因为我们通常使用的连接子句是 V.primarykey = C.foreignkey。如果要改成theta join,那么可能就要看这个值了;因此你正在进行选择。

自然连接类似于等值连接;不同之处在于它去掉了多余的属性。

If you understand equijoin, you should understand theta join. If you change the symbol = (equals) in equijoin to >=, then you have already done theta join.

However, it is difficult to see the practicality of theta join compared to equijoin since the join clause that we normally use is V.primarykey = C.foreignkey. If you want to change to theta join, then it may depend on the value; as such you are doing selection.

Natural join is similar to equijoin; the difference is that it gets rid of the redundant attributes.

内部联接可以被认为是从叉积开始,然后清除某些行。自然连接会清除被连接的两个表中同名列具有不同值的所有行。等值连接会清除指定列具有不同值的所有行。并且 theta-join 会清除指定列不属于指定关系(<、> 或其他关系;原则上它可以是 is_prefix_of 作为字符串之间的关系)的所有行。

外连接不能这样理解,因为它们无中生有地合成信息(即空值)。

Inner joins can be thought of as beginning with a cross product and then weeding out certain rows. A natural join weeds out all rows in which columns of the same name in the two tables being joine have different values. An equijoin weeds out all rows in which the specified columns have different values. And a theta-join weeds out all rows in which the specified columns do not stand in the specified relationship (<, >, or whatever; in principle it could be is_prefix_of as a relationship between strings).

Outer joins cannot be understood this way, because they synthesize information (that is, nulls) out of nothing.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文