How to report on the sparse regions of a sparse fact table
A source system tracks student attendance for a school district by reporting absence events. Attendance on any particular day can be determined by examining three datasets: the school calendar, student enrollment, and absences.
On any given school day, the number of enrolled students in attendance is usually much larger than the number absent, so this approach significantly reduces the number of records that must be stored to track attendance.
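To make the derivation concrete, here is a minimal Python sketch of how attendance for one day could be reconstructed from the three datasets. The sample data, the field layout, and the function name `attendance_on` are all hypothetical; the real source system's schema is not described here.

```python
from datetime import date

# Hypothetical sample data standing in for the three source datasets.
school_days = [date(2011, 9, 5), date(2011, 9, 6)]

# student_id -> (enrollment start, enrollment end), both inclusive
enrollment = {
    "s1": (date(2011, 9, 1), date(2012, 6, 30)),
    "s2": (date(2011, 9, 6), date(2012, 6, 30)),
}

# (student_id, day) -> absence reason; only absences are stored
absences = {("s1", date(2011, 9, 6)): "illness"}


def attendance_on(day):
    """Return {student_id: status} for every student enrolled on `day`."""
    if day not in school_days:
        return {}  # not a school day, so there is nothing to report
    result = {}
    for student, (start, end) in enrollment.items():
        if start <= day <= end:
            # No absence record means the student is assumed present.
            result[student] = absences.get((student, day), "present")
    return result
```

A student counts as present whenever the day is a school day, the student is enrolled, and no absence record exists.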
I am trying to determine the proper way to represent daily attendance in a dimensional model. The most obvious way is to create a factless fact table with a grain of one row per student per school day, and an attendance dimension that has values for both attendance and absence reasons. This is quite straightforward to work with in OLAP, but the downside is the size of the fact table.
For example, 30,000 students and 188 school days yield approximately 5.6 million records per year (if this doesn't seem large enough to be an issue, consider a case in which attendance must be reported per period rather than per day). By contrast, a fact table that records only absences is considerably smaller. However, if I do this, I am not sure how to build cubes that aggregate daily attendance facts.
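As a quick check on the sizes involved, the sketch below compares the dense grain with an absences-only grain. The student and school-day counts come from the question; the 5% absence rate is purely an assumption for illustration.

```python
students = 30_000
school_days = 188
absence_pct = 5  # assumed absence rate in percent; not a figure from the source system

# Dense grain: one row per student per school day.
dense_rows = students * school_days

# Sparse grain: one row per absence only (integer arithmetic avoids rounding noise).
sparse_rows = dense_rows * absence_pct // 100

print(f"dense:  {dense_rows:,} rows per year")
print(f"sparse: {sparse_rows:,} rows per year")
```

Under that assumed rate, the absences-only table holds about one twentieth of the rows of the dense table.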
The specific OLAP technology being used is SQL Server Analysis Services 2008 R2. Any thoughts?
Comments (1)
What if you used two fact tables: one for the current period (the last month, for instance) and another for historical data that records only absences? I suppose users (teachers, for instance) do not need attendance information for a particular student in the second class of the day five months ago, but they might need it for the last week or month.
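A rough Python sketch of that split, assuming rows shaped as `(student_id, day, status)` and a 30-day window; the function name `split_facts`, the row shape, and the window length are all hypothetical:

```python
from datetime import date, timedelta


def split_facts(rows, today, window_days=30):
    """Route attendance rows into (current, historical) fact lists.

    Rows inside the window keep the full per-student-per-day grain;
    older rows are kept only when they record an absence.
    """
    cutoff = today - timedelta(days=window_days)
    current, historical = [], []
    for student, day, status in rows:
        if day >= cutoff:
            current.append((student, day, status))
        elif status != "present":
            historical.append((student, day, status))
    return current, historical


rows = [
    ("s1", date(2012, 2, 20), "present"),
    ("s1", date(2011, 10, 3), "present"),
    ("s2", date(2011, 10, 3), "illness"),
]
current, historical = split_facts(rows, today=date(2012, 3, 1))
```

Here the recent row stays in the current table, the old "present" row is dropped, and the old absence moves to the historical table.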