Is it possible to build an OLAP cube without knowing the details of the model in advance?
Pardon me for the woolly question - I'm not really that familiar with OLAP & cubes. Let me explain my situation...
I'd like to build a database to store questionnaire results, where there might be a few dozen questions per questionnaire. Having gathered a few thousand completed questionnaires, I'd like to analyze the results, and that sounds like a good candidate for OLAP type stuff (of which I know very little). I need to be able to run queries on "all male respondents age 20-30 who own a dog" - i.e. combining the answers to "how old are you", "do you own a dog", etc.
I also want to be able to store the results of next month's survey, and the month after that, etc., and run queries showing this month versus last, etc. So far, so good, I assume.
Here's the nub of my question: whereas this month my questionnaire might have questions about sex, age & dog ownership, next month's questionnaire might include a question about (say) eye color. It might (or might not) also drop some questions. Is that do-able in the OLAP world, or do you need to know all the "dimensions" (if I'm using the right term) in advance when you design your cube?
Also, if I'm running several different surveys with different-but-overlapping questions, can I store them all in the same cube and run queries across surveys? Each survey might have a few dozen questions, with a couple of dozen overlapping with other surveys. Do OLAP systems cater for this sort of thing? I just don't know how rigid they are, and whether they are in fact appropriate for this kind of usage.
Any help greatly appreciated.
PS. Before someone suggests it, I did just buy Kimball's Data Warehouse Toolkit but haven't had a chance to read it yet. (I suspect it may not directly answer this question anyway).
Comments (3)
There is a white paper here which has a section covering the modelling of survey data. This may be the sort of thing you are looking for.
I'll start by saying that I'm an OLAP newbie too, but I think I have a handle on what you are looking to achieve.
In effect, your questions become one of your dimensions, and the answer to each question is part of the fact table. That is, each fact row holds an answer and carries the dimensions associated with it: age, sex, locality (perhaps) and question. It may feel a bit back to front, but that's something I'm coming to terms with in OLAP.
You might also want another dimension, related to the question dimension, that groups questions into questionnaires. Alternatively, that might just be an attribute of the question dimension itself, e.g. Question { QuestionnaireID = 1, QuestionNumber = 4, QuestionText = "Do you own a dog?" }.
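A minimal sketch of the model described above, using Python's built-in `sqlite3` as a stand-in for a real OLAP engine. The table and column names (`dim_respondent`, `dim_question`, `fact_answer`) and the sample rows are hypothetical, and age and sex are assumed to be stored as respondent attributes rather than as answers. It shows how the "male respondents aged 20-30 who own a dog" query reduces to joins plus filters on the question dimension.

```python
import sqlite3

# Hypothetical star schema: one fact row per (respondent, question) answer.
SCHEMA = """
CREATE TABLE dim_respondent (
    respondent_id INTEGER PRIMARY KEY,
    sex TEXT,
    age INTEGER
);
CREATE TABLE dim_question (
    question_id INTEGER PRIMARY KEY,
    questionnaire_id INTEGER,   -- questionnaire kept as an attribute of the question
    question_number INTEGER,
    question_text TEXT
);
CREATE TABLE fact_answer (
    respondent_id INTEGER REFERENCES dim_respondent(respondent_id),
    question_id INTEGER REFERENCES dim_question(question_id),
    answer TEXT
);
"""


def build_sample_db():
    """Create an in-memory database populated with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_respondent VALUES (?, ?, ?)",
                     [(1, "M", 25), (2, "M", 34), (3, "F", 27), (4, "M", 22)])
    conn.execute("INSERT INTO dim_question VALUES (1, 1, 4, 'Do you own a dog?')")
    conn.executemany("INSERT INTO fact_answer VALUES (?, ?, ?)",
                     [(1, 1, "yes"), (2, 1, "yes"), (3, 1, "yes"), (4, 1, "no")])
    return conn


def count_male_dog_owners_20_30(conn):
    """Count distinct male respondents aged 20-30 who answered 'yes' to the dog question."""
    row = conn.execute("""
        SELECT COUNT(DISTINCT r.respondent_id)
        FROM fact_answer AS f
        JOIN dim_respondent AS r ON r.respondent_id = f.respondent_id
        JOIN dim_question AS q ON q.question_id = f.question_id
        WHERE r.sex = 'M'
          AND r.age BETWEEN 20 AND 30
          AND q.question_text = 'Do you own a dog?'
          AND f.answer = 'yes'
    """).fetchone()
    return row[0]


if __name__ == "__main__":
    print(count_male_dog_owners_20_30(build_sample_db()))  # 1
```

Only respondent 1 satisfies every filter: respondent 2 is outside the age range, respondent 3 is female, and respondent 4 answered "no".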
Not sure if that helps, but hopefully it will give you some ideas if nothing else.
Another OLAP newbie here as well...
1) I only have experience creating OLAP cubes with Mondrian (Pentaho), which does allow you to revise the cube's schema (it's just an XML file) and rebuild the cube (or, in Pentaho-speak, publish it). So for that platform, at least, there's no requirement to know all your dimensions ahead of time.
2) I agree with Lazurus' recommendation to create a dimension of questions. There's no requirement that each of your "facts" has a value for every member of every dimension, so if you were to slice on "Question n" in that dimension, I believe it should only give you data for the questionnaires in which "Question n" was actually asked.
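To illustrate, here is a hypothetical sketch (again plain `sqlite3`, not Mondrian; every table name and data row is an assumption) where the questionnaire is its own dimension and a question shared across surveys keeps a single `question_id`. An eye-color question introduced in the second month is just a new row in `dim_question`, with no schema change, and slicing on it returns only the month in which it was asked.

```python
import sqlite3


def build_two_month_db():
    """Build a hypothetical in-memory survey schema holding two monthly questionnaires."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_questionnaire (
            questionnaire_id INTEGER PRIMARY KEY,
            survey_month TEXT
        );
        -- One row per distinct question, shared by every questionnaire that asks it.
        CREATE TABLE dim_question (
            question_id INTEGER PRIMARY KEY,
            question_text TEXT
        );
        CREATE TABLE fact_answer (
            respondent_id INTEGER,
            questionnaire_id INTEGER REFERENCES dim_questionnaire(questionnaire_id),
            question_id INTEGER REFERENCES dim_question(question_id),
            answer TEXT
        );
    """)
    conn.executemany("INSERT INTO dim_questionnaire VALUES (?, ?)",
                     [(1, "month 1"), (2, "month 2")])
    # Month 1 asks only about dog ownership.
    conn.execute("INSERT INTO dim_question VALUES (1, 'Do you own a dog?')")
    conn.executemany("INSERT INTO fact_answer VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, "yes"), (2, 1, 1, "no")])
    # Month 2 adds an eye-color question: a new dimension row, no ALTER TABLE needed.
    conn.execute("INSERT INTO dim_question VALUES (2, 'What is your eye color?')")
    conn.executemany("INSERT INTO fact_answer VALUES (?, ?, ?, ?)",
                     [(3, 2, 1, "yes"), (3, 2, 2, "brown"),
                      (4, 2, 1, "yes"), (4, 2, 2, "blue")])
    return conn


def responses_by_month(conn, question_text):
    """Return (survey_month, answer_count) pairs for one question, ordered by month."""
    return conn.execute("""
        SELECT qn.survey_month, COUNT(*)
        FROM fact_answer AS f
        JOIN dim_question AS q ON q.question_id = f.question_id
        JOIN dim_questionnaire AS qn ON qn.questionnaire_id = f.questionnaire_id
        WHERE q.question_text = ?
        GROUP BY qn.survey_month
        ORDER BY qn.survey_month
    """, (question_text,)).fetchall()


if __name__ == "__main__":
    conn = build_two_month_db()
    print(responses_by_month(conn, "Do you own a dog?"))        # [('month 1', 2), ('month 2', 2)]
    print(responses_by_month(conn, "What is your eye color?"))  # [('month 2', 2)]
```

Reusing one `question_id` for a question that several surveys share is what makes month-versus-month and cross-survey comparisons a simple `GROUP BY`; if every questionnaire had its own copy of the question, you would need an extra mapping between them.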