数据仓库注意事项:何时以及为何?

发布于 2024-08-16 13:00:27 字数 1431 浏览 4 评论 0原文

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

最舍不得你 2024-08-23 13:00:27

我会看看是否可以尽力简洁地回答您的问题。

1.什么时候构建数据仓库是值得考虑的选择?
换句话说,什么迹象表明,
指标,或者我应该采用的其他标准
寻找这可能表明
标准交易
环境已经不够了吗?

一个。如果您发现报告和监控正在损害生产系统和/或离线数据存储的性能。

b.如果您发现每次获取业务问题的答案都需要构建大量复杂的 SQL。

c.如果您发现每次更改事务架构时,您都必须返回并重新处理所有报告查询。

d.如果您想汇集多个来源的数据。

2. 完整的数据仓库有哪些替代方案?
交易中的非规范化
数据库和沼泽标准
复制的“报告服务器”有两个
我想到的;有没有
其他我应该先探索一下
致力于 DW?

3.为什么数据仓库比上述替代方案更好?如果答案是,
“这取决于”,那么这取决于什么
开?

这些我都会一并回答。我不认为数据仓库是一项全有或全无的冒险。它只是一个简洁的短语,意思是“以允许您更轻松、更快速地回答业务问题的方式存储数据”。

事务数据库旨在与应用程序有效交互。数据仓库、数据集市、运营数据存储和报告表的构建是为了与人们有效地交互(如果有意义的话)。

4.什么时候不应该尝试构建数据仓库?我对此表示怀疑
任何被宣称为“最佳实践”的事情
与上下文无关。肯定有
一定是DW存在的一些场景
错误的选择 - 它们是什么?

好问题。如果您的交易系统使您能够充分了解您的业务,那么您可能不需要仓储。

如果您只有一个数据源并且性能不是问题,您可能可以通过创建简单的报告表来获得洞察力。

5.是否有任何我可以查看的系统的实际示例
通过引入数据进行改进
仓库?会的东西
向我解释一下,端到端,什么类型
他们需要的决策或分析
仓库,他们是如何决定的
放入什么以及如何放入
仓库最终融入了
更大的环境?我不想要一个
人为地“让我们用它做一个立方体
AdventureWorks 数据库”-
实施与我无关,
我对规格感兴趣
以及设计和总体思路
所涉及的流程。

这是一个大问题,需要的篇幅远远超出我在这里分配的篇幅。

关于这一点,我可以向您指出一些可能提供您所寻求的见解的地方。

  • Bruce Ullrey 所著的《实施数据仓库:有效的方法》一书记录了一个人构建数据仓库的旅程。它没有经过高度抛光,这使其更加真实。它读起来就像一本日记,里面有很多模型和其他视觉效果,很好地说明了他的努力。
  • Larissa Moss 的“商业智能路线图”。标准票价。引导您完成构建高级别 BI 实践的过程。
  • Steve Williams 的《商业智能的利润影响》提供了许多案例研究,显示了构建数据仓库的价值。

I'll see if I can do my best to answer your questions succinctly.

1.At what point is building a data warehouse an option worth considering?
In other words, what telltale signs,
metrics, or other criteria should I be
looking out for that might indicate
that a standard transactional
environment is no longer sufficient?

a. If you find that reporting and monitoring are impairing the performance of your production system and/or an offline data store.

b. If you find that getting answers to your business questions requires building a lot of complex SQL each time.

c. If you find that every time you make a change to your transactional schema, you have to go back and rework all of your reporting queries.

d. If you want to bring together data from multiple sources.

2.What are the alternatives to a full-on data warehouse?
Denormalization in the transactional
database and the bog-standard
replicated "report server" are two
that come to mind; are there any
others I should explore before
committing to the DW?

3.Why is a data warehouse better than said alternatives? If the answer is,
"it depends", then what does it depend
on?

I'll answer these together. I wouldn't think of a data warehouse as an all or nothing venture. It's simply a concise phrase that means "storing your data in a way that allows you to more easily and quickly answer business questions."

Transactional databases are designed to efficiently interface with applications. Data warehouses, data marts, operational data stores and reporting tables are built to efficiently interface with people, if that makes sense.

4.When shouldn't I attempt to build a data warehouse? I'm skeptical of
anything declared as a "best practice"
irrespective of context. Surely there
must be some scenarios where a DW is
the wrong choice - what are they?

Good question. If your transactional system provides you with sufficient insight into your business, you probably do not have a need for warehousing.

If you only have one source of data and performance is not a problem, you can probably gain insight from creation of simple reporting tables.

5.Are there any practical examples I could look at of systems that were
improved by introducing a data
warehouse? Something that would
explain to me, end-to-end, what sorts
of decisions or analysis they needed
the warehouse for, how they decided
what to put in it, and how the
warehouse ended up fitting into the
larger environment? I don't want a
contrived "let's make a cube out of
the AdventureWorks database" - the
implementation is irrelevant to me,
I'm interested in the specifications
and designs and overall thought
process that were involved.

That's a big question that would take far more space than I'm allotted here.

On this one, I can point you to a few places that might provide the insight you seek.

  • "Implementing A Data Warehouse: A Methodology that worked" by Bruce Ullrey is a book documenting one man's journey to building a data warehouse. It's not highly polished, which gives it more realism. It reads like a journal with lots of models and other visuals that illustrate his efforts pretty well.
  • "Business Intelligence Roadmap" by Larissa Moss. Standard fare. Walks you through the process of building a BI practice at a high level.
  • "The Profit Impact of Business Intelligence" by Steve Williams gives a number of case studies that show the value of building data warehouses.
勿挽旧人 2024-08-23 13:00:27
  1. 数据仓库的主要目的是加速(简化)报告和分析。它能够以业务用户能想到的任何方式对数据进行切片和切分。

  2. 对于第一步 DW,您可以简单地实现 Kimball 星型模式并对其运行 SQL 查询。如果事实证明这仍然太慢,请开始考虑预先计算的聚合(多维数据集)。

  3. 与标准化数据库相比,针对数据仓库的信息切片和切块要简单得多。复制报表服务器将提高性能,但不会简化切片和切块。另请记住,DW 属于业务用户,因此他们可以随时提出各种切片/切块的想法 - IT 人员应该简单地提供使此类事情成为可能的环境。

  4. 如果您只是偶尔在操作系统上运行一些报告并且对性能感到满意,则不需要 DW。

  5. 我所有的经验都是在这样的系统中,业务用户无休止地抱怨报告速度慢和无法编写“复杂的查询”,而生产人员则抱怨数据库因报告而陷入困境。在所有情况下,一个简单的 Kimball 星形和一个带有缓存和快照的报表服务器就足够了。

  1. The main purpose of a DW is to speed-up (simplify) reporting and analytic. It enables slicing and dicing of data in any way a business user can think of.

  2. For a first step DW, you can simply implement a Kimball star schema and run SQL queries against it. If this proves to be still too slow, start thinking about pre-calculated aggregations (cubes).

  3. The slicing and dicing of information against a DW is way simpler, than against a normalized DB. Replicated report server will improve performance, but will not simplify slicing and dicing. Also keep in mind that the DW belongs to business users, so it is up to them to come up with various slice/dice ideas at any time -- IT people should simply provide environment in which something like this is possible.

  4. If you just run few reports from time-to-time on your operational system and are satisfied with performance, there is no need for DW.

  5. All my experience is with systems where business users endlessly complain about slow reports and inability to write "complicated queries", while production people complain that the database gets bogged down due to reporting. In all cases a simple Kimball star and a report server with cache and snapshots were good enough.

指尖凝香 2024-08-23 13:00:27
  1. 当以下两个条件匹配时,您应该考虑构建数据仓库:

    • 海量数据
    • 许多大型复杂选择(可能与少数插入、更新和删除相比)执行时间太长(并且编写起来很复杂)
    • 需要合并来自不同系统的数据
  2. 这实际上是你如何看待数据仓库的问题。在许多情况下,只要您能够坚持使用关系数据库管理系统,您就可以逐渐从带有一些报告的 OLTP 系统迁移到完整的数据仓库。首先可能是构建第一个事实表,并继续使用标准化表作为维度。然后向游戏添加更多事实、更多事实表或专用维度表。首先在同一个数据库(或所涉及系统的数据库之一)中,稍后可能会移动到单独的数据库。

  3. 除了使用专门的系统之外,完整的数据仓库(单独的数据库、星型模式)还提供了调整 select 语句的最佳选项。它还与 OLTP 系统完全分离。考虑架构设计,还要考虑 CPU、I/O 和内存等资源以及组织(例如新版本的调度)。当然,这是您可能不需要的大量工作。

  4. 它在上面的答案中:仅仅因为您有一些复杂的查询,并不意味着您应该构建 DWH,对于其他标准(如果它们是孤立的)也是如此。

  5. 这里不能提供太多,但建议:敏捷。 DWH 的要求很大程度上取决于用户看到的可能性。要求可能会发生变化。使用数据库进行自动化测试很痛苦,但在没有适当测试的情况下在生产系统中闲逛更糟糕。

  1. You should consider building a data warehouse, when two of the following criteria match:

    • Huge amount of data
    • Many big complex selects (possibly compared to few inserts, updates, and deletes) that just take too long to execute (and are complicated to write)
    • Data from different systems needs to get combined
  2. It's really the question what you consider a data warehouse. In many cases you can move gradually from OLTPs Systems with some reports to a full blown data warehouse, as long as you can stick to a relational database management system. First could be to build a first fact table, and keep using the normalized tables for dimension. Then adding more facts, more fact tables or dedicated dimension tables to the game. First in the same database (or one of the databases of the involved systems), possibly moving to a separate database later.

  3. A full data warehouse (separate database, star schema) offers the best options for tuning select statements, apart from going to a specialized system. It is also cleanly decoupled from the OLTP system(s). Think schema design, but also resources like CPU, I/O and memory and organizational, like scheduling of new releases. Of course it is a lot of work which you possibly don't need.

  4. It's in the answers above: just because you have a handfull of complex queries, doesn't mean you should build a DWH, same holds for the other criteria, if they come in isolation.

  5. Can't offer much here, but the advice: go agile. The requirements for a DWH depend extremly on the possibilities the users see. There for requirements are likely to change. Automating tests with databases is a pain, but fooling around in a production system with no proper tests is worse.

因为看清所以看轻 2024-08-23 13:00:27

什么时候构建数据仓库是值得考虑的选择?换句话说,我应该注意哪些迹象、指标或其他标准,这些迹象、指标或其他标准可能表明标准事务环境不再足够?

当您发现在事务数据存储中执行报告和分析活动对两者都有害时,我会推荐使用数据仓库。

除了完整的数据仓库之外,还有哪些替代方案?事务数据库中的非规范化和 bog 标准复制“报告服务器”是我想到的两个;在加入 DW 之前我还应该探索其他什么吗?

我在这里没有什么可提供的。我想说,保留交易和报告数据库对我来说似乎是明智的,无论您是否称其为仓库。数据挖掘可能是一项 CPU 密集型活动。

为什么数据仓库比上述替代方案更好?如果答案是“这取决于”,那么它取决于什么?

我在这里没有什么可提供的。

什么时候我不应该尝试构建数据仓库?无论上下文如何,我对任何被称为“最佳实践”的东西都持怀疑态度。当然,在某些情况下 DW 是错误的选择 - 它们是什么?

我想说,如果您不需要保留很长的历史记录,不需要对数据进行深入分析,并且您的报告需求仅限于时不时的临时查询,那么数据仓库也许不是必要的。

是否有任何我可以查看的通过引入数据仓库改进系统的实际示例?可以端到端地向我解释他们需要仓库进行什么样的决策或分析,他们如何决定在其中放入什么,以及仓库最终如何适应更大的环境?我不想要一个人为的“让我们用 AdventureWorks 数据库制作一个立方体” - 实现与我无关,我对所涉及的规范和设计以及整体思维过程感兴趣。

在我到来之前,我的雇主已经使用数据仓库很多年了,所以我无法讲述我到来之前的情况。

At what point is building a data warehouse an option worth considering? In other words, what telltale signs, metrics, or other criteria should I be looking out for that might indicate that a standard transactional environment is no longer sufficient?

I'd recommend a data warehouse when you observed that performing reporting and analysis activities on the in the transactional data store was harmful to both.

What are the alternatives to a full-on data warehouse? Denormalization in the transactional database and the bog-standard replicated "report server" are two that come to mind; are there any others I should explore before committing to the DW?

I have nothing to offer here. I'd say that keeping the transactional and reporting databases seems sensible to me, regardless of whether you call it a warehouse or not. Data mining can be a very CPU intensive activity.

Why is a data warehouse better than said alternatives? If the answer is, "it depends", then what does it depend on?

I have nothing to offer here.

When shouldn't I attempt to build a data warehouse? I'm skeptical of anything declared as a "best practice" irrespective of context. Surely there must be some scenarios where a DW is the wrong choice - what are they?

I'd say that if you don't need to keep long history, aren't doing intensive analysis of the data, and your reporting needs are limited to an ad hoc query from time to time, then perhaps a data warehouse isn't necessary.

Are there any practical examples I could look at of systems that were improved by introducing a data warehouse? Something that would explain to me, end-to-end, what sorts of decisions or analysis they needed the warehouse for, how they decided what to put in it, and how the warehouse ended up fitting into the larger environment? I don't want a contrived "let's make a cube out of the AdventureWorks database" - the implementation is irrelevant to me, I'm interested in the specifications and designs and overall thought process that were involved.

My employers have all used data warehouses for many years prior to my arrival, so I can't speak to what things were like before I arrived.

筱果果 2024-08-23 13:00:27

根据我的经验,开始考虑数据仓库的第一个迹象是当您拥有(或正在开发)事务数据库并且用户开始添加大量报告和数据历史记录要求时。这几乎总是如此。拥有一个单独的数据仓库或报告数据库总是比尝试设计一个处理最终用户始终存在的报告需求的事务系统更容易。在事务系统中存储历史记录(对于业务实体)会增加复杂性并使应该尽可能响应的数据库变得臃肿。

另一方面,我曾在大公司工作过,那里的许多团队都创建了数据仓库,因为感兴趣的数据分布在许多系统中,因此很难查询。问题在于,每个小组都创建了自己的数据仓库,因为公司中所有现有的仓库都没有正确的信息子集,或者有一个被认为是非最佳或不正确的数据模型。由于创建了更多难以比较的不同数据系统,情况变得更糟。

From my experience, the first sign for starting to think about data warehousing is when you have (or are developing) a transactional database and the users start adding lots of reporting and data history requirements. Which is pretty much always. It's always easier to have a separate data warehouse or reporting database than to try to design a transactional system that handles the reporting needs that end users always have. Storing history (for business entities) in a transactional system adds complexity and bloats a database that should be as responsive as possible.

On the flip side, I've been in large companies where many groups created data warehouses because data of interest was spread across many systems and was therefore difficult to query. The problem was that each group created their own data warehouse because all the existing warehouses in the company did not have the right subset of information, or had a data model that was regarded as non-optimal or incorrect. This made the situation worse by creating even more disparate data systems that were hard to compare.

恬淡成诗 2024-08-23 13:00:27

如果长期使用“事务系统”,则可以考虑 DW。后来,他们意识到需要执行一些数据挖掘,以确定业务的不同数据模式。最后,在确定的数据模式的帮助下,人们希望帮助高层管理人员做出有利于公司的进一步决策。

建立数据仓库需要采取以下步骤:

  1. 数据库需要确定ETL平台和数据库。
  2. 需要选择SSRS、Tableau等报告工具来进行可视化。
  3. 人们可以选择 R 等数据分析语言来进一步使用。
  4. 最后,所有这些都将有助于开发数据仓库和报告工具。

DW could be considered if, one is using a ‘Transactional System’ from a long period. Later, they realize that they need to perform some data mining, to determine different data patterns of the business. And finally, with the help of the determined data patterns, one wants to help the top management to take further decisions in the benefit of the company.

Following steps needs to be taken up for building up a data ware house:

  1. An ETL platform and database needs to be decided for the database.
  2. A reporting tool like SSRS, Tableau, etc. needs to be chosen for the visualization.
  3. One may opt for the Data Analytical language like R, for further use.
  4. Finally, all this will help in developing the data ware house and reporting tool. 
风吹过旳痕迹 2024-08-23 13:00:27

“我认为这就是为什么有些项目失败了?”

主要原因有五个:

  • IT部门和业务用户之间缺乏合作;
  • 不正确的数据仓库架构;
  • 有经验的人不够;
  • 规划不当,例如未能使用经过验证的方法和计划来确保不遗漏任何细节;
  • 并取决于尖端技术。

"I think that why do some projects fail?"

There are five primary reasons:

  • lack of partnership between the IT department and business users;
  • incorrect data warehouse architecture;
  • not enough experienced people;
  • improper planning, such as failure to use a proven methodology and a plan to ensure that no details are omitted;
  • and depending on bleeding-edge technology.
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文