星型架构:客户和非客户的单独维度还是服务员的共享维度?
我是星型模式建模新手,刚刚阅读数据仓库工具包 。
我有一个业务流程,其中有客户和非客户与我们的一些员工进行电话会议。
我的事实表(称为“受众”)将包含对参与人员连接到呼叫的时间以及此人连接到呼叫的成本的度量。要点是“与电话会议的个人连接”。
我是否应该使用符合的客户端维度并以这种方式创建非客户端维度(对于还不是客户端的调用者)(省略不属于此问题的维度):
或者以这种方式拥有与符合的客户维度相关的不符合的参与维度是否可以/更好:
或者是否有更好/标准的机制来建模这样的业务流程?
编辑:
使用上面的模型 2,但在客户维度表和参与维度之上创建一个视图,使其看起来只有一个维度怎么样?
这是达米尔下面的答案的可接受的替代方案吗?
I'm new to modeling star schemas, fresh from reading the Data Warehouse Toolkit.
I have a business process that has clients and non-clients calling into conference calls with some of our employees.
My fact table, call it "Audience", will contain a measure of how long an attending person was connected to the call, and the cost of this person's connection to the call. The grain is "individual connection to the conference call".
Should I use my conformed Client dimension and create a non-client dimension (for the callers that are not yet clients) this way (omitting dimensions that are not part of this questions):
Or would it be OK/better to have a non-conformed Attending dimension related to the conformed Client dimension in this manner:
Or is there a better/standard mechanism to model business processes like this one?
Edit:
What about using model 2 above, but creating a view on top of the client dimension table and the attending dimension to make it look like it is only one dimension?
Is that an acceptable alternative to Damir's answer below?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
无需将客户端拆分为两个表(维度)。只需将所有客户、活跃客户和潜在客户放入同一个维度表中即可。
然后,您可以引入 IsActive 属性(列)来区分付费客户和潜在客户。迟早你会使用数据挖掘工具来更多地了解客户,以及愿意为你的服务付费的人和不愿意付费的人的区别。为了使算法发挥作用,您必须提供两组人的数据——付费的人和未付费的人。总而言之,潜在客户与付费客户属于同一表。
这样,您就可以使用您的模型 1。确保事实表中的度量有意义。例如,如果 call_id =123 有 10 人参与,
则应返回通话的总成本,而不是无意义的内容,例如实际成本的 10 倍。
编辑
“付费客户”和“潜在客户”都是客户的一种类型,因此属于同一个维度表——dimClient。在 DW 的某个地方,有一个factSale(或类似的),FK 到dimSale。即使您在dimClient 中没有用于区分付费客户和潜在客户的列,您仍然可以通过加入factSale 和dimClient 来获得付费客户。
“谁是顾客?”在组织中引入 DW 时,这是一个常见的争论。
为了能够分析客户获取、保留、转化等,潜在客户与付费客户享有相同的待遇——至少在 DW 中是这样。请记住,对于(几乎)任何首席执行官来说,获取和创造新客户都是首要任务。
There is no need to split clients into two tables (dimensions). Simply put all clients, active and prospects into the same dimension table.
You could then introduce an IsActive attribute (column) to distinguish between paying clients and prospects. Sooner or later you will use a data mining tool to learn more about clients and what distinguishes people who are willing to pay for your service from those who are not. In order for the algorithm to work, you have to supply data for both groups of people -- those who are paying and those who are not paying. To summarize, prospects belong to the same table as paying clients.
With this, you can use your model No 1. Make sure that measures in the fact table make sense. For example if a call_id =123 had 10 people participating, then
should return the total cost of the call, not something meaningless -- like 10x the real cost.
EDIT
A "paying client" and a "prospect client" are both a type of a client, therefore belong to the same dimension table -- dimClient. Somewhere in the DW there is a factSale (or similar) with FK to the dimSale. Even if you do not have a column in dimClient to differentiate between paying and prospects -- you can still get paying clients by joining factSale and dimClient.
"Who is a customer?" is a common debate when introducing a DW in an organization.
In order to be able to analyze client acquisition, retention, conversion, etc., prospects have the same treatment as paying customers -- at least in the DW. Keep in mind that acquiring and creating new customers is on the top of the list for (almost) any CEO.
我会选择第二个:它在与会者自己的专用维度中建模,同时允许您通过该维度中的属性暴露他们的客户性(或其他方式),这可能是您想要深入了解的方式现实生活(“向我显示所有与会者”,然后是“现在哪些是客户”)。
在您的客户维度中,我将填充所有与会者的 client_id,匹配与会者不是客户的“未知”元素。
这里有一个关于此的很好的讨论:
http://crpit.com/confpapers/CRPITV75Riazati.pdf
I would go for the second: it models the attendees in their own, dedicated dimension, whilst allowing you to expose their client-ness (or otherwise) via an attribute in that dimension, which is probably the way you would want to drill down in real life ("show me all attendees" followed by "and now which of those are clients").
In your client dimension I would populate the client_id for all attendees, matching to an "unknown" element where the attendee is not a client.
There's a nice discussion about this here:
http://crpit.com/confpapers/CRPITV75Riazati.pdf
这没什么区别。第二个版本可能更正确,但是你的olap系统支持这个吗?
It makes little difference. The second version is possibly more correct, but does your olap system support this?
第二个对我来说看起来像“雪花模式”。从维基百科文章开始研究雪花模式。您会看到星星和雪花之间的一些比较。
The second one looks like "snowflake schema" to me. Look into snowflake schema, starting with the wikipedia article. You'll see several comparisons between star and snowflake.