The relational model and queries that naturally return duplicate rows
It's commonly understood that in the relational model:
- Every relational operation should yield a relation.
- Relations, being sets, cannot contain duplicate rows.
Imagine a 'USERS' relation that contains the following data.
ID FIRST_NAME LAST_NAME
1 Mark Stone
2 Jane Stone
3 Michael Stone
If someone runs the query select LAST_NAME from USERS, a typical database will return:
LAST_NAME
Stone
Stone
Stone
Since this is not a relation - because it contains duplicate rows - what should an ideal RDBMS return?
"But some information is lost - that there are 3 users with that last name."
If the count of users with that name is what you are interested in, then the query of your example is not the question you should be asking.
The query of your example will provide the answer to the question "What are all the last names such that there exists a user that has that last name?".
If the question you want to ask is "How many users are there that are named 'Stone'", then the query you should submit is
Select count(...) from users where last_name = 'Stone';
Projection always "loses" information: the information that is tied to the attributes that are projected away. I don't see how a known property of a useful relational operator can be construed as an argument against that operator.
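As a minimal sketch of the two different questions, the following runs both against an in-memory table using Python's built-in sqlite3 module. The argument to count is elided in the query above; COUNT(*) is used here as an assumption, and the table definition is illustrative rather than taken from the question.

```python
import sqlite3

# In-memory database holding the USERS data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Mark", "Stone"), (2, "Jane", "Stone"), (3, "Michael", "Stone")],
)

# "How many users are named 'Stone'?"
# COUNT(*) is an assumption: the original query leaves the argument unspecified.
stone_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE last_name = 'Stone'"
).fetchone()[0]

# "Which last names are held by at least one user?"
last_names = [
    row[0] for row in conn.execute("SELECT DISTINCT last_name FROM users")
]

print(stone_count)
print(last_names)
```

The count lives in the answer to the first question; the second answer deliberately carries no information about how many users share a name.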
In an RDBMS, a relational projection on the last name column alone would return a set of tuples with distinct last name values. There would be no duplicate tuples.
In SQL it is true that you would get duplicates unless you specified the DISTINCT keyword. That's because SQL is not a truly relational language, among other things because SQL tables and table expressions are not proper relations. An SQL DBMS is not an RDBMS.
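To make the set semantics concrete, here is a small sketch that models a relation as a Python set of tuples and implements projection over it. The names HEADING, USERS and project are hypothetical helpers for illustration, not part of any database API.

```python
# A relation modeled as a set of tuples; attribute order is fixed by HEADING.
HEADING = ("ID", "FIRST_NAME", "LAST_NAME")
USERS = {
    (1, "Mark", "Stone"),
    (2, "Jane", "Stone"),
    (3, "Michael", "Stone"),
}


def project(relation, heading, attributes):
    """Relational projection: keep only the named attributes.

    The result is built as a set, so tuples that become identical
    after projection collapse into a single tuple.
    """
    indexes = [heading.index(name) for name in attributes]
    return {tuple(row[i] for i in indexes) for row in relation}


last_names = project(USERS, HEADING, ["LAST_NAME"])
print(last_names)  # {('Stone',)}
```

Because the result is a set, the operation is closed over relations: projecting a relation always yields another relation.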
"what should an ideal RDBMS return?"
As David indicated, it should return (in your example) one single row.
An SQL DBMS is only a relational one if it treats every SELECT as if SELECT DISTINCT were requested. (But there are a few tiny additional conditions to be met too.)
The reason is that the "meaning" of that single row is as follows: "There exists some user such that he has a first_name, he has an ID, and his last_name is 'Stone'".
There is never any logical need to repeat that statement a second time. The authoritative reference that you asked for is Ted Codd himself: "If something is true, then saying it twice won't make it any truer."
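The difference between what a typical SQL DBMS returns and what the relational reading calls for can be seen in a short sketch. It uses Python's built-in sqlite3 module; the table definition is an assumption made for illustration.

```python
import sqlite3

# In-memory database holding the USERS data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "Mark", "Stone"), (2, "Jane", "Stone"), (3, "Michael", "Stone")],
)

# Plain SELECT: SQL returns a bag (multiset), one row per source row.
plain = conn.execute("SELECT last_name FROM users").fetchall()

# SELECT DISTINCT: the result is a set of rows, as a relation requires.
distinct = conn.execute("SELECT DISTINCT last_name FROM users").fetchall()

print(plain)
print(distinct)
```

The plain query yields three identical rows, while the DISTINCT query states the fact "some user has the last name Stone" exactly once.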
I'm not sure I see a problem with the returned values. There are three records that contain "Stone" for LAST_NAME. This would have been obvious if FIRST_NAME or ID had been included in the query, but they were not. Usually, the DISTINCT keyword is used to handle this and ensure that there will be no duplicates.
In fact, if my database started applying DISTINCT automatically (which it sounds like you think maybe it should), I'd be somewhat annoyed. Seeing duplicate rows when you don't expect them is often the break you need when debugging some weird data problem in a database.
I would argue that your original query did not return duplicate rows. It returned 3 separate rows of data from the database, of which you chose to include only the last name column. I would say that your question is not phrased correctly, and that this is why RDBMSs behave the way they do (which I also argue is the correct way).
To translate your query:
select LAST_NAME from USERS
into English, it would be:
"tell me the last name of all the users"
If I went into a high school gym class and asked the teacher "using your class list sheet, tell me the last name of all the students in your class", and there were twin brothers in the class, I would expect him to list their last name twice (or at least to ask you whether he should). He would just go down the list of people in the class and read off their last names.
If you wanted to ask "what are the different last names of students in the class", he would not list any name more than once. That is what the DISTINCT keyword exists for.
So the query would be:
select distinct LAST_NAME from USERS
And if you were actually interested in the number of unique last names, which in English is "How many different last names do the students in the class have", then, using your example:
select count(distinct LAST_NAME) from USERS
whereas:
select count(LAST_NAME) from USERS
would mean in English:
"How many people in the class have a last name?"
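The distinction between these counting queries can be checked with a short sketch using Python's built-in sqlite3 module. The fourth row, a user with no last name, is a hypothetical addition that is not in the original data; it is there only to show that count(LAST_NAME) skips NULLs while count(*) does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Mark", "Stone"),
        (2, "Jane", "Stone"),
        (3, "Michael", "Stone"),
        (4, "Alex", None),  # hypothetical row, not in the original data
    ],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# "How many different last names are there?"
distinct_names = scalar("SELECT COUNT(DISTINCT last_name) FROM users")

# "How many users have a last name?" (NULL last names are not counted)
with_last_name = scalar("SELECT COUNT(last_name) FROM users")

# "How many users are there?"
all_rows = scalar("SELECT COUNT(*) FROM users")

print(distinct_names, with_last_name, all_rows)  # 1 3 4
```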