Bruce Ullrey 所著的《实施数据仓库:有效的方法》一书记录了一个人构建数据仓库的旅程。它没有经过高度抛光,这使其更加真实。它读起来就像一本日记,里面有很多模型和其他视觉效果,很好地说明了他的努力。
Larissa Moss 的“商业智能路线图”。标准票价。引导您完成构建高级别 BI 实践的过程。
Steve Williams 的《商业智能的利润影响》提供了许多案例研究,显示了构建数据仓库的价值。
I'll see if I can do my best to answer your questions succinctly.
1.At what point is building a data warehouse an option worth considering? In other words, what telltale signs, metrics, or other criteria should I be looking out for that might indicate that a standard transactional environment is no longer sufficient?
a. If you find that reporting and monitoring are impairing the performance of your production system and/or an offline data store.
b. If you find that getting answers to your business questions requires building a lot of complex SQL each time.
c. If you find that every time you make a change to your transactional schema, you have to go back and rework all of your reporting queries.
d. If you want to bring together data from multiple sources.
2.What are the alternatives to a full-on data warehouse? Denormalization in the transactional database and the bog-standard replicated "report server" are two that come to mind; are there any others I should explore before committing to the DW?
3.Why is a data warehouse better than said alternatives? If the answer is, "it depends", then what does it depend on?
I'll answer these together. I wouldn't think of a data warehouse as an all or nothing venture. It's simply a concise phrase that means "storing your data in a way that allows you to more easily and quickly answer business questions."
Transactional databases are designed to efficiently interface with applications. Data warehouses, data marts, operational data stores and reporting tables are built to efficiently interface with people, if that makes sense.
4.When shouldn't I attempt to build a data warehouse? I'm skeptical of anything declared as a "best practice" irrespective of context. Surely there must be some scenarios where a DW is the wrong choice - what are they?
Good question. If your transactional system provides you with sufficient insight into your business, you probably do not have a need for warehousing.
If you only have one source of data and performance is not a problem, you can probably gain insight from creation of simple reporting tables.
5.Are there any practical examples I could look at of systems that were improved by introducing a data warehouse? Something that would explain to me, end-to-end, what sorts of decisions or analysis they needed the warehouse for, how they decided what to put in it, and how the warehouse ended up fitting into the larger environment? I don't want a contrived "let's make a cube out of the AdventureWorks database" - the implementation is irrelevant to me, I'm interested in the specifications and designs and overall thought process that were involved.
That's a big question that would take far more space than I'm allotted here.
On this one, I can point you to a few places that might provide the insight you seek.
"Implementing A Data Warehouse: A Methodology that worked" by Bruce Ullrey is a book documenting one man's journey to building a data warehouse. It's not highly polished, which gives it more realism. It reads like a journal with lots of models and other visuals that illustrate his efforts pretty well.
"Business Intelligence Roadmap" by Larissa Moss. Standard fare. Walks you through the process of building a BI practice at a high level.
"The Profit Impact of Business Intelligence" by Steve Williams gives a number of case studies that show the value of building data warehouses.
The main purpose of a DW is to speed-up (simplify) reporting and analytic. It enables slicing and dicing of data in any way a business user can think of.
For a first step DW, you can simply implement a Kimball star schema and run SQL queries against it. If this proves to be still too slow, start thinking about pre-calculated aggregations (cubes).
The slicing and dicing of information against a DW is way simpler, than against a normalized DB. Replicated report server will improve performance, but will not simplify slicing and dicing. Also keep in mind that the DW belongs to business users, so it is up to them to come up with various slice/dice ideas at any time -- IT people should simply provide environment in which something like this is possible.
If you just run few reports from time-to-time on your operational system and are satisfied with performance, there is no need for DW.
All my experience is with systems where business users endlessly complain about slow reports and inability to write "complicated queries", while production people complain that the database gets bogged down due to reporting. In all cases a simple Kimball star and a report server with cache and snapshots were good enough.
You should consider building a data warehouse, when two of the following criteria match:
Huge amount of data
Many big complex selects (possibly compared to few inserts, updates, and deletes) that just take too long to execute (and are complicated to write)
Data from different systems needs to get combined
It's really the question what you consider a data warehouse. In many cases you can move gradually from OLTPs Systems with some reports to a full blown data warehouse, as long as you can stick to a relational database management system. First could be to build a first fact table, and keep using the normalized tables for dimension. Then adding more facts, more fact tables or dedicated dimension tables to the game. First in the same database (or one of the databases of the involved systems), possibly moving to a separate database later.
A full data warehouse (separate database, star schema) offers the best options for tuning select statements, apart from going to a specialized system. It is also cleanly decoupled from the OLTP system(s). Think schema design, but also resources like CPU, I/O and memory and organizational, like scheduling of new releases. Of course it is a lot of work which you possibly don't need.
It's in the answers above: just because you have a handfull of complex queries, doesn't mean you should build a DWH, same holds for the other criteria, if they come in isolation.
Can't offer much here, but the advice: go agile. The requirements for a DWH depend extremly on the possibilities the users see. There for requirements are likely to change. Automating tests with databases is a pain, but fooling around in a production system with no proper tests is worse.
At what point is building a data warehouse an option worth considering? In other words, what telltale signs, metrics, or other criteria should I be looking out for that might indicate that a standard transactional environment is no longer sufficient?
I'd recommend a data warehouse when you observed that performing reporting and analysis activities on the in the transactional data store was harmful to both.
What are the alternatives to a full-on data warehouse? Denormalization in the transactional database and the bog-standard replicated "report server" are two that come to mind; are there any others I should explore before committing to the DW?
I have nothing to offer here. I'd say that keeping the transactional and reporting databases seems sensible to me, regardless of whether you call it a warehouse or not. Data mining can be a very CPU intensive activity.
Why is a data warehouse better than said alternatives? If the answer is, "it depends", then what does it depend on?
I have nothing to offer here.
When shouldn't I attempt to build a data warehouse? I'm skeptical of anything declared as a "best practice" irrespective of context. Surely there must be some scenarios where a DW is the wrong choice - what are they?
I'd say that if you don't need to keep long history, aren't doing intensive analysis of the data, and your reporting needs are limited to an ad hoc query from time to time, then perhaps a data warehouse isn't necessary.
Are there any practical examples I could look at of systems that were improved by introducing a data warehouse? Something that would explain to me, end-to-end, what sorts of decisions or analysis they needed the warehouse for, how they decided what to put in it, and how the warehouse ended up fitting into the larger environment? I don't want a contrived "let's make a cube out of the AdventureWorks database" - the implementation is irrelevant to me, I'm interested in the specifications and designs and overall thought process that were involved.
My employers have all used data warehouses for many years prior to my arrival, so I can't speak to what things were like before I arrived.
From my experience, the first sign for starting to think about data warehousing is when you have (or are developing) a transactional database and the users start adding lots of reporting and data history requirements. Which is pretty much always. It's always easier to have a separate data warehouse or reporting database than to try to design a transactional system that handles the reporting needs that end users always have. Storing history (for business entities) in a transactional system adds complexity and bloats a database that should be as responsive as possible.
On the flip side, I've been in large companies where many groups created data warehouses because data of interest was spread across many systems and was therefore difficult to query. The problem was that each group created their own data warehouse because all the existing warehouses in the company did not have the right subset of information, or had a data model that was regarded as non-optimal or incorrect. This made the situation worse by creating even more disparate data systems that were hard to compare.
DW could be considered if, one is using a ‘Transactional System’ from a long period. Later, they realize that they need to perform some data mining, to determine different data patterns of the business. And finally, with the help of the determined data patterns, one wants to help the top management to take further decisions in the benefit of the company.
Following steps needs to be taken up for building up a data ware house:
An ETL platform and database needs to be decided for the database.
A reporting tool like SSRS, Tableau, etc. needs to be chosen for the visualization.
One may opt for the Data Analytical language like R, for further use.
Finally, all this will help in developing the data ware house and reporting tool.
发布评论
评论(7)
我会看看是否可以尽力简洁地回答您的问题。
一个。如果您发现报告和监控正在损害生产系统和/或离线数据存储的性能。
b.如果您发现每次获取业务问题的答案都需要构建大量复杂的 SQL。
c.如果您发现每次更改事务架构时,您都必须返回并重新处理所有报告查询。
d.如果您想汇集多个来源的数据。
这些我都会一并回答。我不认为数据仓库是一项全有或全无的冒险。它只是一个简洁的短语,意思是“以允许您更轻松、更快速地回答业务问题的方式存储数据”。
事务数据库旨在与应用程序有效交互。数据仓库、数据集市、运营数据存储和报告表的构建是为了与人们有效地交互(如果有意义的话)。
好问题。如果您的交易系统使您能够充分了解您的业务,那么您可能不需要仓储。
如果您只有一个数据源并且性能不是问题,您可能可以通过创建简单的报告表来获得洞察力。
这是一个大问题,需要的篇幅远远超出我在这里分配的篇幅。
关于这一点,我可以向您指出一些可能提供您所寻求的见解的地方。
I'll see if I can do my best to answer your questions succinctly.
a. If you find that reporting and monitoring are impairing the performance of your production system and/or an offline data store.
b. If you find that getting answers to your business questions requires building a lot of complex SQL each time.
c. If you find that every time you make a change to your transactional schema, you have to go back and rework all of your reporting queries.
d. If you want to bring together data from multiple sources.
I'll answer these together. I wouldn't think of a data warehouse as an all or nothing venture. It's simply a concise phrase that means "storing your data in a way that allows you to more easily and quickly answer business questions."
Transactional databases are designed to efficiently interface with applications. Data warehouses, data marts, operational data stores and reporting tables are built to efficiently interface with people, if that makes sense.
Good question. If your transactional system provides you with sufficient insight into your business, you probably do not have a need for warehousing.
If you only have one source of data and performance is not a problem, you can probably gain insight from creation of simple reporting tables.
That's a big question that would take far more space than I'm allotted here.
On this one, I can point you to a few places that might provide the insight you seek.
数据仓库的主要目的是加速(简化)报告和分析。它能够以业务用户能想到的任何方式对数据进行切片和切分。
对于第一步 DW,您可以简单地实现 Kimball 星型模式并对其运行 SQL 查询。如果事实证明这仍然太慢,请开始考虑预先计算的聚合(多维数据集)。
与标准化数据库相比,针对数据仓库的信息切片和切块要简单得多。复制报表服务器将提高性能,但不会简化切片和切块。另请记住,DW 属于业务用户,因此他们可以随时提出各种切片/切块的想法 - IT 人员应该简单地提供使此类事情成为可能的环境。
如果您只是偶尔在操作系统上运行一些报告并且对性能感到满意,则不需要 DW。
我所有的经验都是在这样的系统中,业务用户无休止地抱怨报告速度慢和无法编写“复杂的查询”,而生产人员则抱怨数据库因报告而陷入困境。在所有情况下,一个简单的 Kimball 星形和一个带有缓存和快照的报表服务器就足够了。
The main purpose of a DW is to speed-up (simplify) reporting and analytic. It enables slicing and dicing of data in any way a business user can think of.
For a first step DW, you can simply implement a Kimball star schema and run SQL queries against it. If this proves to be still too slow, start thinking about pre-calculated aggregations (cubes).
The slicing and dicing of information against a DW is way simpler, than against a normalized DB. Replicated report server will improve performance, but will not simplify slicing and dicing. Also keep in mind that the DW belongs to business users, so it is up to them to come up with various slice/dice ideas at any time -- IT people should simply provide environment in which something like this is possible.
If you just run few reports from time-to-time on your operational system and are satisfied with performance, there is no need for DW.
All my experience is with systems where business users endlessly complain about slow reports and inability to write "complicated queries", while production people complain that the database gets bogged down due to reporting. In all cases a simple Kimball star and a report server with cache and snapshots were good enough.
当以下两个条件匹配时,您应该考虑构建数据仓库:
这实际上是你如何看待数据仓库的问题。在许多情况下,只要您能够坚持使用关系数据库管理系统,您就可以逐渐从带有一些报告的 OLTP 系统迁移到完整的数据仓库。首先可能是构建第一个事实表,并继续使用标准化表作为维度。然后向游戏添加更多事实、更多事实表或专用维度表。首先在同一个数据库(或所涉及系统的数据库之一)中,稍后可能会移动到单独的数据库。
除了使用专门的系统之外,完整的数据仓库(单独的数据库、星型模式)还提供了调整 select 语句的最佳选项。它还与 OLTP 系统完全分离。考虑架构设计,还要考虑 CPU、I/O 和内存等资源以及组织(例如新版本的调度)。当然,这是您可能不需要的大量工作。
它在上面的答案中:仅仅因为您有一些复杂的查询,并不意味着您应该构建 DWH,对于其他标准(如果它们是孤立的)也是如此。
这里不能提供太多,但建议:敏捷。 DWH 的要求很大程度上取决于用户看到的可能性。要求可能会发生变化。使用数据库进行自动化测试很痛苦,但在没有适当测试的情况下在生产系统中闲逛更糟糕。
You should consider building a data warehouse, when two of the following criteria match:
It's really the question what you consider a data warehouse. In many cases you can move gradually from OLTPs Systems with some reports to a full blown data warehouse, as long as you can stick to a relational database management system. First could be to build a first fact table, and keep using the normalized tables for dimension. Then adding more facts, more fact tables or dedicated dimension tables to the game. First in the same database (or one of the databases of the involved systems), possibly moving to a separate database later.
A full data warehouse (separate database, star schema) offers the best options for tuning select statements, apart from going to a specialized system. It is also cleanly decoupled from the OLTP system(s). Think schema design, but also resources like CPU, I/O and memory and organizational, like scheduling of new releases. Of course it is a lot of work which you possibly don't need.
It's in the answers above: just because you have a handfull of complex queries, doesn't mean you should build a DWH, same holds for the other criteria, if they come in isolation.
Can't offer much here, but the advice: go agile. The requirements for a DWH depend extremly on the possibilities the users see. There for requirements are likely to change. Automating tests with databases is a pain, but fooling around in a production system with no proper tests is worse.
当您发现在事务数据存储中执行报告和分析活动对两者都有害时,我会推荐使用数据仓库。
我在这里没有什么可提供的。我想说,保留交易和报告数据库对我来说似乎是明智的,无论您是否称其为仓库。数据挖掘可能是一项 CPU 密集型活动。
我在这里没有什么可提供的。
我想说,如果您不需要保留很长的历史记录,不需要对数据进行深入分析,并且您的报告需求仅限于时不时的临时查询,那么数据仓库也许不是必要的。
在我到来之前,我的雇主已经使用数据仓库很多年了,所以我无法讲述我到来之前的情况。
I'd recommend a data warehouse when you observed that performing reporting and analysis activities on the in the transactional data store was harmful to both.
I have nothing to offer here. I'd say that keeping the transactional and reporting databases seems sensible to me, regardless of whether you call it a warehouse or not. Data mining can be a very CPU intensive activity.
I have nothing to offer here.
I'd say that if you don't need to keep long history, aren't doing intensive analysis of the data, and your reporting needs are limited to an ad hoc query from time to time, then perhaps a data warehouse isn't necessary.
My employers have all used data warehouses for many years prior to my arrival, so I can't speak to what things were like before I arrived.
根据我的经验,开始考虑数据仓库的第一个迹象是当您拥有(或正在开发)事务数据库并且用户开始添加大量报告和数据历史记录要求时。这几乎总是如此。拥有一个单独的数据仓库或报告数据库总是比尝试设计一个处理最终用户始终存在的报告需求的事务系统更容易。在事务系统中存储历史记录(对于业务实体)会增加复杂性并使应该尽可能响应的数据库变得臃肿。
另一方面,我曾在大公司工作过,那里的许多团队都创建了数据仓库,因为感兴趣的数据分布在许多系统中,因此很难查询。问题在于,每个小组都创建了自己的数据仓库,因为公司中所有现有的仓库都没有正确的信息子集,或者有一个被认为是非最佳或不正确的数据模型。由于创建了更多难以比较的不同数据系统,情况变得更糟。
From my experience, the first sign for starting to think about data warehousing is when you have (or are developing) a transactional database and the users start adding lots of reporting and data history requirements. Which is pretty much always. It's always easier to have a separate data warehouse or reporting database than to try to design a transactional system that handles the reporting needs that end users always have. Storing history (for business entities) in a transactional system adds complexity and bloats a database that should be as responsive as possible.
On the flip side, I've been in large companies where many groups created data warehouses because data of interest was spread across many systems and was therefore difficult to query. The problem was that each group created their own data warehouse because all the existing warehouses in the company did not have the right subset of information, or had a data model that was regarded as non-optimal or incorrect. This made the situation worse by creating even more disparate data systems that were hard to compare.
如果长期使用“事务系统”,则可以考虑 DW。后来,他们意识到需要执行一些数据挖掘,以确定业务的不同数据模式。最后,在确定的数据模式的帮助下,人们希望帮助高层管理人员做出有利于公司的进一步决策。
建立数据仓库需要采取以下步骤:
DW could be considered if, one is using a ‘Transactional System’ from a long period. Later, they realize that they need to perform some data mining, to determine different data patterns of the business. And finally, with the help of the determined data patterns, one wants to help the top management to take further decisions in the benefit of the company.
Following steps needs to be taken up for building up a data ware house:
“我认为这就是为什么有些项目失败了?”
主要原因有五个:
"I think that why do some projects fail?"
There are five primary reasons: