SQL ERD新手问题
如果我有一个数据仓库 erd - 具有关系等,
我如何知道要使用什么类型的联接,内部,外部左,外部右,完整外部,
当然,如果数据库已正确创建,它们都将是内部联接?酒吧数据质量问题
If I have a datawarehouse erd - which has the relationsips etc
How will i know what type of join to use , inner, outer left, outer right, full outer,,
surely if the database has been created correctly they all would be inner joins ? bar data quality issues
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
每种类型的联接都会执行不同的操作,因此这完全取决于单个查询的目标。
您不想使用 INNER JOIN 的一个示例是,如果您有一个包含所有应用程序用户列表的用户表和一个记录每次有人登录的登录表。如果您想查找所有未登录的用户,一种方法是使用 OUTER JOIN。这并不意味着您的数据库不正确或存在数据质量问题。
Each type of join does something different, so it all depends on the aim of the individual query.
An example of when you wouldn't want to use an INNER JOIN is if you have a User table containing a list of all your application's users and a Logon table that logs each time someone logs on. If you want to find all users that have not logged on, one way would be to use an OUTER JOIN. This does not mean you have an incorrect database or data quality issues at all.
不,它们不会都是 INNER JOIN。
这仅取决于您想要通过查询完成的任务。每种 JOIN 都有其用途。我建议您阅读数据库供应商的手册以熟悉它们。
No, they won't be all INNER JOINs.
This is only edependent on what you want to accomplish with your query. Every kind of JOIN has its use. I suggest you to read the manual of your database vendor to become familiar with them.
您可能仍然需要在事实之间使用外连接。
不过,您绝对应该能够内部连接到所有维度。不幸的是,我们无法在工作中的数据仓库中做到这一点,因为我们在大多数维度表中都缺少表示空记录的记录。
编辑
事实是数字度量(例如金融交易中的美元),而维度是数据描述符(例如参与金融交易的客户、客户的出生日期)。您最好参考一本好的数据仓库书籍。我推荐 Ralph Kimball 的《数据仓库工具包》。
事实表包含事实和维度的外键。维度表包含维度 ID、文本描述(例如客户名称)以及可能的其他维度的外键(例如出生日期字段的日期维度 ID)。
有时,维度属性在逻辑上可能为空。这不一定存储为空,在我的工作场所,我们使用 0 来表示空维度。目前想到的是我们的客户维度表中的死亡验证方法(例如查看死亡证明)字段。当然,我们的大多数客户都没有死亡,所以我们将这个字段填充为0。但是我们的死亡验证方法维度表只存储实际的死亡验证方法,因此它没有0(不适用)的记录,并且因此我们被迫使用左连接。
You will probably still need to use outer joins between facts.
You should definitely be able to inner join to all dimensions though. Unfortunately, we can't do that with the data warehouse at work because we lack a record in most dimension tables to represent null records.
EDIT
Facts are numeric measures (eg. dollars in a financial transaction) while dimensions are descriptors of data (eg. client involved in financial transaction, date of birth of client). You're best off referencing a good data warehousing book. I recommend The Data Warehouse Toolkit by Ralph Kimball.
A fact table contains both facts and foreign keys to dimensions. A dimension table contains the dimension ID, textual descriptions (eg. client name) and possibly foreign keys to other dimensions (eg. date dimension ID for the date of birth field).
Sometimes, a dimension attribute may be logically null. This is not necessarily stored as null, and at my workplace, we use 0 to represent null dimensions. The one that comes to mind at the moment is a death verification method (eg. viewed death certificate) field that we have in our client dimension table. Of course, most of our clients aren't dead, so we populate this field with 0. Our death verification method dimension table, however, only stores actual death verification methods, so it does not have a record with 0 (not applicable) and we are therefore forced to use left joins to it.