数据仓库 - 如何链接维度
刚刚进入数据仓库,需要您的帮助来澄清一个困惑。 假设我有员工维度和部门维度。 如果我有一份报告要求我列出 dimEmployee 的字段(姓名、薪水、职位)和部门的字段(部门编号、描述、经理),我该怎么做。 我是否创建一个事实表(无事实)来作为这两个维度之间的连接表? 或者我是否需要以不同的方式设计这两个表。 每个人都在谈论事实和维度,但我们是否考虑过链接维度表?
感谢您的见解。
RK
Just got into datawarehousing and need your help to clarify a confusion. Lets say I have Employee dimension and Department Dimension. If I have a report that requires me to list fields from dimEmployee (Name, Salary, Position) and fields from department (DeptNo, Desc, Manager), how do I do that. Do I create a fact table(factless) that will be a joining table between these two dimensions ? Or do I need to design these two tables differently. Everyone talks about facts and dimension, but do we even considering linking dimension tables at all ?
Thanks for your insights.
RK
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
员工和部门之间必须存在关系。 这通常是通过向 Employee 表添加 DepartmentId 列来完成的。
There would have to be a relation between employee and department. This is typically done by adding a column DepartmentId to the Employee table.
谢谢大家的回复。
因此,如果需要查找某个部门在某个薪资范围内的员工总数,并且还需要列出某个部门的员工,设计可能如下所示
dimEmployee
员工 ID
部门
姓名
等等...
dimDept
部门编号
描述
经理
等等......
fctEmpDept
员工 ID
部门编号
薪资
当我阅读有关数据仓库的内容时,我只看到维度和事实表。 如果需要显示用于正常报告目的的列表,我真的很困惑如何构建表格。 显然,可能存在一些与外键链接的维度。
Thank you all for your responses.
So if there is a requirement to find total employees in a dept in a certain salary range and also a requirement to list employees in a dept., the design may look like this
dimEmployee
EmpId
Dept
Name
etc...
dimDept
DeptId
Desc
Manager
etc....
fctEmpDept
empId
DeptId
Salary
When I read about datawarehouse, I see only dimensions and fact tables. I was really confused about how the tables will be structured if there are requirements to show lists for normal reporting purposes. Apparantly, there could be some dimensions that may be linked with foreign keys.
如果保持维度彼此独立,人员维度中没有部门引用,部门维度中没有人员引用,则分配事实表可以充当两者之间的桥梁。
例如,
对人员分配的更改(例如直线经理的更改或晋升)不需要更改维度,而只需要更改事实表,该事实表将提供一个很好的简单的更改历史记录。 当您有多个共享这些维度的事实表时,保持它们简单将会带来好处。 尝试两者的简单实现并尝试一下,我相信您会发现这更自然。
If you keep your dimensions independent of each other no department reference in the person dimension, no person reference in the department dimension, an assignment fact table can act as the bridge between the two.
For example
So a change to a persons assignment such as a change in line manager or a promotion requires no change to the dimensions but only to the fact table which will provide a nice simple historical record of changes. When you have multiple fact tables sharing these dimensions keeping them simple will pay dividends. Try a simple implementation of both and give them a try and I'm sure you'll find this to be more natural.
选项A:如果每个员工只能属于一个部门。 我会将 DepartmentID 添加到事实表中。
选项 B:如果一名员工可以属于多个部门,我会采用第三种选择,设置指标表(或聚合事实表)。 然后,您可以按部门对员工进行聚合,并将聚合后的数字存储在这个新表中。
选项C:您可以将departmentID 添加到employee 表中,这样就可以完成工作,但随后您将设置一个层次结构。 您可以这样做,但随着数据集的增长,您的联接会变得困难且效率降低。
我会选择前两个选项之一,具体取决于您的员工/部门关系。
我承认还有其他可用的解决方案,但这些将是您最好的选择。
Option A:If each employee can belong to one and only one department. I would add the departmentID to the fact table.
Option B: If an employee can belong to multiple departments, I would go with a third alternative, setup metric tables (or aggregated fact tables). You would then do the aggregation of your employee's by department and store the aggregated number in this new table.
Option C: You could add the departmentID to the employee table and that would get the job done, but you would then be setting up a hierarchy. You can do this, but your joins get difficult and less efficient as your dataset grows.
I would go with one of the first 2 options, depending on your employee/department relationship.
I will admit there are other solutions available, but these would be your best bet.
当您谈论维度和事实表时,您通常指的是聚合数据。 您可以写一份报告,讨论每个部门的员工人数。
听起来您正在处理数据列表。 使用直接 SQL 可能会更好地实现这一点。 所以像
When you talk about Dimension and fact tables you usually refer to aggregated data. you may write a report that talks about count of employees per Department.
it sounds like you are dealing with lists of data. This may be better achieved using direct SQL. so something like
任何问题的典型维度包括时间(总是)、地理位置(例如,部门、地区、州等)等。在您的情况下,听起来公司阶梯中的薪资范围、职位等可能是相关的。
事实表通常包含可添加的内容(例如,给定薪资范围内的人数、职位等)。
我不确定 Employee 是否符合维度; 对我来说听起来更像是事实。
Typical dimensions for any problem include Time (always), geography (e.g., Department, Territory, State, etc.), etc. In your case it sounds like salary ranges in a company ladder, positions, etc. might be pertinent.
Fact tables usually contain things that are additive (e.g., number of individuals within a given salary range, position, etc.).
I'm not sure that Employee would qualify as a dimension; sounds more like a fact to me.