星型架构设计

发布于 2024-07-05 23:32:19 字数 1450 浏览 14 评论 0 原文

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

逆光下的微笑 2024-07-12 23:32:19

我们需要解决三个问题。

1) 如何从可操作的源系统中获取数据,而不会对它们施加过度的压力,方法是连接它们内部和之间的表、在提取时清理数据、创建派生等。

2) 如何合并来自不同来源的数据 - 一些遗留数据、一些基于文件的、来自不同部门的集成、准确、高效存储的整体,对业务进行建模,并且不反映源系统的结构。 请记住,系统变化/替换相对较快,但业务的基本模型变化缓慢。

3) 如何构建数据以尽可能快速、准确地满足业务中特定人员/部门的特定分析和报告要求。

这三个截然不同的问题的解决方案需要不同的架构层来解决它们

Staging Layer
我们复制源的结构,但每晚只加载源中更改的数据。 一旦数据从暂存层进入下一层,数据就会被丢弃。 查询是带有简单 data_time 过滤器的单表查询。 对源的影响很小。

企业层
这是一个面向业务的第三范式数据库。 数据从暂存层提取(然后丢弃)到企业层,在那里进行清理、集成和规范化。

表示(星型模式)层
在这里,我们进行维度建模以满足特定要求。 数据被故意去规范化以减少连接数量。 企业层中可能占用多个表的层次结构会被折叠为单个维度表,并且多个事务表可能会合并为单个事实表。

你总是面临这三个问题。 如果你选择取消企业层,你仍然需要解决第二个问题,但是你必须在星型模式层中完成,在我看来,这是错误的地方。

There are three problems we need to solve.

1) How to get the data out of the operational source system without putting undue pressure on them by joining tables within and between them, cleaning data as we extract, creating derivations etc.

2) How to merge data from disparate sources - some legacy, some file based, from different departments into an integral, accurate, efficiently stored whole that models the business, and does not reflect the structures of the source systems. Remember, systems change / are replaced relatively quickly, but the basic model of the business changes slowly.

3) How to structure the data to meet specific analytical and reporting requirements for particular people/departments in the business as quickly and accurately as possible.

The solution to these three very different problems require different architectural layers to solve them

Staging Layer
We replicate the structures of the sources, but only changed data from the sources are loaded each night. once the data is taken from the staging layer into the next layer, the data is dropped. Queries are single table queries with a simple data_time filter. Very little effect on the source.

Enterprise Layer
This is a business oriented 3rd normal form database. Data is extracted (and afterward dropped) from the staging layer into the enterprise layer, where it is cleaned, integrated and normalised.

Presentation (Star Schema) Layer
Here, we model dimensionally to meet specific requirements. Data is deliberately de-normalise to reduce the number of joins. Hierarchies that may occupy several tables in the Enterprise Layer are collapsed into a single dimension tables, and multiple transactional tables may be merged into single fact tables.

You always face these three problems. If you choose to do away with the enterprise layer, you still have to solve the second problem, but you have to do it in the star schema layer, and in my view, this is the wrong place to do it.

七秒鱼° 2024-07-12 23:32:19

数据仓库文献中一直存在关于数据仓库架构中争论应采用>星型模式设计。

简而言之,Kimball 非常提倡在数据仓库中仅使用星型模式设计,而 Inmon 首先希望使用标准化 3NF 构建企业数据仓库a> 设计并随后在数据集市中使用星型模式设计。

除此之外,您还可以说雪花模式设计是另一种方法。

第四种设计可能是数据仓库建模方法。

There is an ongoing debate in the datawarehousing litterature about where in the datawarehouse-architecture the Star-Schema design should be applied.

In short Kimball advocates very highly for using only the Star-Schema design in the datawarehouse, while Inmon first wants to build an Enterprise Datawarehouse using normalized 3NF design and later use the Star-Schema design in the datamarts.

In addition here to you could also say that Snowflake schema design is another approach.

A fourth design could be the Data Vault Modeling approach.

下雨或天晴 2024-07-12 23:32:19

在数据仓库系统中使用星型模式会给您带来很多好处,并且在大多数情况下使用它是合适的它们为顶层。 您还可能有一个操作数据存储 (ODS) - 一种标准化结构,用于保存“当前状态”并促进数据构造等操作。 然而,在某些合理的情况下,这是不可取的。 我曾有机会构建带有和不带有 ODS 层的系统,并且在每种情况下都有选择架构的具体原因。

无需深入探讨数据仓库架构的微妙之处或引发 Kimball 与 Inmon 的激烈战争,星型模式的主要好处是:

  • 大多数数据库管理系统
    查询优化器中有设施
    进行“明星转变”
    使用位图索引结构或
    索引交集以实现快速
    谓词解析。 这意味着在解决选择之前,可以在不访问事实表(通常比索引大得多)的情况下从星型模式中进行选择。

  • 对星型模式进行分区相对简单,因为只需要事实表分区(除非你有一些符合圣经的大尺寸)。 分区消除 意味着查询优化器可以忽略那些不可能参与查询结果的分区,从而节省 I/O。

  • 缓慢变化的维度在星型模式上比在雪花型模式上更容易实现。

  • snowflake 或 ER 架构相比,该架构更容易理解,并且涉及的联接更少。 您的报告团队会因此而喜欢您

  • 星型模式更易于使用,并且(更重要的是)可以使用临时查询工具(例如业务对象报告生成器。 作为开发人员,您对这些工具生成的 SQL 几乎没有控制权,因此您需要为查询优化器提供尽可能多的帮助。 星型模式为查询优化器提供了相对较少的出错机会。

通常,您的报告层将使用星型模式,除非您有特定原因不这样做。 如果您有多个源系统,您可能需要实施操作数据存储使用标准化或雪花模式来积累数据。 这更容易,因为 ODS 通常不记录历史记录。 历史状态是在星型模式中跟踪的,这比使用规范化结构更容易做到。 标准化或雪花式操作数据存储反映“当前”状态,并且不保留数据固有的任何历史视图。

ODS 加载过程涉及数据清理和一致性,这通过规范化结构更容易实现。 一旦 ODS 中拥有干净的数据,维度和事实负载就可以通过通用或相对简单的机制相对简单地跟踪历史记录(随时间的变化); 使用星型模式更容易做到这一点,许多 ETL 工具(例如)提供了用于缓慢更改维度的内置设施,并且实现通用机制相对简单。

以这种方式对系统进行分层提供了职责分离 - 业务和数据清理逻辑在 ODS 中处理,星型模式加载处理历史状态。

Using star schemas for a data warehouse system gets you several benefits and in most cases it is appropriate to use them for the top layer. You may also have an operational data store (ODS) - a normalised structure that holds 'current state' and facilitates operations such as data conformation. However there are reasonable situations where this is not desirable. I've had occasion to build systems with and without ODS layers, and had specific reasons for the choice of architecture in each case.

Without going into the subtlties of data warehouse architecture or starting a Kimball vs. Inmon flame war the main benefits of a star schema are:

  • Most database management systems
    have facilities in the query optimiser
    to do 'Star Transformations' that
    use Bitmap Index structures or
    Index Intersection for fast
    predicate resolution. This means that selection from a star schema can be done without hitting the fact table (which is usually much bigger than the indexes) until the selection is resolved.

  • Partitioning a star schema is relatively straightforward as only the fact table needs to be partitioned (unless you have some biblically large dimensions). Partition elimination means that the query optimiser can ignore patitions that could not possibly participate in the query results, which saves on I/O.

  • Slowly changing dimensions are much easier to implement on a star schema than a snowflake.

  • The schema is easier to understand and tends to involve less joins than a snowflake or E-R schema. Your reporting team will love you for this

  • Star schemas are much easier to use and (more importantly) make perform well with ad-hoc query tools such as Business Objects or Report Builder. As a developer you have very little control over the SQL generated by these tools so you need to give the query optimiser as much help as possible. Star schemas give the query optimiser relatively little opportunity to get it wrong.

Typically your reporting layer would use star schemas unless you have a specific reason not to. If you have multiple source systems you may want to implement an Operational Data Store with a normalised or snowflake schema to accumulate the data. This is easier because an ODS typically does not do history. Historical state is tracked in star schemas where this is much easier to do than with normalised structures. A normalised or snowflaked Operational Data Store reflects 'current' state and does not hold a historical view over and above any that is inherent in the data.

ODS load processes are concerned with data scrubbing and conforming, which is easier to do with a normalised structure. Once you have clean data in an ODS, dimension and fact loads can track history (changes over time) with generic or relatively simple mechanisms relatively simply; this is much easier to do with a star schema, Many ETL tools (for example) provide built-in facilities for slowly changing dimensions and implementing a generic mechanism is relatively straightforward.

Layering the system in this way providies a separation of responsibilities - business and data cleansing logic is dealt with in the ODS and the star schema loads deal with historical state.

权谋诡计 2024-07-12 23:32:19

星型模式的特点是,它们是大多数人希望使用数据仓库完成的任务的自然模型。 例如,可以轻松生成不同粒度级别(例如月、日或年)的报告。 将典型业务数据插入星型模式也很有效,这也是数据仓库的常见且重要的功能。

您当然可以使用您想要的任何类型的数据库,但除非您非常了解您的业务领域,否则您的报告可能不会像使用星型模式时那样高效地运行。

The thing about star schemas is they are a natural model for the kinds of things most people want to do with a data warehouse. For instance it is easy to produce reports with different levels of granularity (month or day or year for example). It is also efficient to insert typical business data into a star schema, again a common and important feature of a data warehouse.

You certainly can use any kind of database you want but unless you know your business domain very well it is likely that your reports will not run as efficiently as they could if you had used a star schema.

漆黑的白昼 2024-07-12 23:32:19

星型模式非常适合数据仓库的最后一层。 如何到达那里是另一个问题。 据我所知,有两个大阵营,比尔·英蒙(Bill Inmon)和拉尔夫·金博尔(Ralph Kimball)阵营。 如果/当你决定选择明星时,你可能想看看这两个人的理论。

此外,一些报告工具非常喜欢星型模式设置。 如果您被锁定在特定的报告工具中,这可能会影响您仓库中报告集市的外观。

Star schemas are a natural fit for the last layer of a data warehouse. How you get there is another question. As far as I know, there are two big camps, those of Bill Inmon and Ralph Kimball. You might want to look at the theories of these two guys if/when you decide to go with a star.

Also, some reporting tools really like the star schema setup. If you are locked into a specific reporting tool, that might drive what the reporting mart looks like in your warehouse.

撩心不撩汉 2024-07-12 23:32:19

星型模式用于实现对大量数据的高速访问。 通过减少满足可能针对主题区域进行的任何查询所需的联接数量来实现高性能。 这是通过允许维度表中的数据冗余来完成的。

您必须记住,星型模式是仓库顶层的模式。 所有模型还涉及仓库堆栈底部的暂存模式,有些还包括持久转换的合并暂存区域,其中所有源系统都合并到 3NF 建模模式中。 各个学科领域都位于此之上。

顶层星型模式的替代方案包括一种变体,即雪花模式。 Dan Linstedt 提出的数据仓库建模是一种可能证实一些研究的新方法。

Star schemas are used to enable high speed access to large volumes of data. The high performance is enabled by reducing the amount of joins needed to satsify any query that may be made against the subject area. This is done by allowing data redundancy in dimension tables.

You have to remember that the star schema is a pattern for the top layer for the warehouse. All models also involve staging schemas at the bottom of the warehouse stack, and some also include a persistant transformed merged staging area where all source systems are merged into a 3NF modelled schema. The various subject areas sit above this.

Alternatives to star schemas at the top level include a variation, which is a snowflake schema. A new method that may bear out some investigation as well is Data Vault Modelling proposed by Dan Linstedt.

执着的年纪 2024-07-12 23:32:19

没有也可以。 然而,你会让自己的生活变得艰难——你的组织将希望使用位于数据仓库之上的标准工具,而这些工具将需要星型模式——将花费大量的精力来将方形钉子安装在圆形钉子上洞。

许多数据库级优化都假设您有星型模式; 您将花费大量时间进行优化和重组,以使数据库在您的不太明星的布局中做“正确的事情”。

确保利大于弊..

(听起来我以前去过那里吗?)

-D

It's possible to do without. However, you will make life hard for yourself -- your organization will want to use standard tools that live on top of DWs, and those tools will expect a star schema -- a lot of effort will be spent fitting a square peg in a round hole.

A lot of database-level optimizations assume that you have a star schema; you will spend a lot of time optimizing and restructuring to get the DB to do "the right thing" with your not-quite-star layout.

Make sure that the pros outweigh the cons..

(Does it sound like I've been there before?)

-D

甜`诱少女 2024-07-12 23:32:19

星型模式是关系数据库的一种逻辑数据模型,适合常规的数据仓库需求; 如果给定了关系环境,星形或雪花模式将是一个很好的设计模式,在许多 DW 设计方法中都是硬连线的。

然而,除了关系数据库引擎之外,还有其他引擎,它们可用于高效的数据仓库。 多维存储引擎对于 OLAP 任务可能非常快(例如 TM1); 在这种情况下我们不能应用星型模式设计。 需要特殊逻辑模型的其他示例包括 XML 数据库或面向列的数据库(例如实验性的 C-store ))。

Star schema is a logical data model for relational databases that fits the regular data warehousing needs; if the relational environment is given, a star or a snowflake schema will be a good design pattern, hard-wired in lots of DW design methodologies.

There are however other than relational database engines too, and they can be used for efficient data warehousing. Multidimensional storage engines might be very fast for OLAP tasks (TM1 eg.); we can not apply star schema design in this case. Other examples requiring special logical models include XML databases or column-oriented databases (eg. the experimental C-store)).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文