数据仓库的实际实现

发布于 2024-12-07 04:09:23 字数 487 浏览 2 评论 0原文

数据仓库现在似乎是一个大趋势,并且对我来说非常有趣。我正在尝试熟悉它的概念,但遇到了“只见树木、见森林”的问题,因为我可以在网上找到的所有数据仓库模型和描述都是理论,但是不要' t 给出了正在使用的实际技术的示例。我是一个情境学习者,所以抽象的理论解释并不能真正帮助我。

现在似乎有很多“数据仓库模型”,但它们似乎都有一些相似的特征。实际上有一个“ODS”(操作数据存储,它将来自多个源的数据聚合到同一个位置。然后,一个称为“ETL”的过程将该 ODS 中的数据转换为“数据仓库”,并且再次进入“数据”和/或“策略集市”。

有人可以提供用于每个组件的技术示例(ODS、ETL、数据仓库、数据/策略集市)吗

?可能只是任何普通的数据库,但数据仓库似乎发生了一些特殊的事情,因为这些“集市”使用它来提取数据,

这是迄今为止我最困惑的事情。 ?一个框架?一个算法?

我想一旦我看到了每一步发生的事情,我就会最终明白

Data warehousing seems to be a big trend these days, and is very interesting to me. I'm trying to acquaint myself with its concepts, and am having a problem "seeing the forest through the trees" because all of the data warehouse models and descriptions I can find online are theoretical, but don't gives examples with actual technologies being used. I'm a contextual learner, so abstracted, theoretical explanations don't really help me out all that much.

Now there seem to be many "data warehousing models", but all of them seem to have some similar characteristics. There is ually an "ODS" (operational data store that aggregates data from multiple sources into the same place. A process known as "ETL" then converts data in this ODS into a "data vault", and again into "data" and/or "strategy marts."

Can someone provide an example of the technologies that would be used for each of these components (ODS, ETL, data vault, data/strategy marts)?

It sounds like the ODS could just be any ordinary database, but the data vault seems to have some special things going on because it is used by these "marts" to pull data from.

ETL is the biggest thing I'm choking on by far. Is this a language? A framework? An algorithm?

I think once I see a concrete example of what's going on at each step of the way, I'll finally get it. Thanks in advance!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

半透明的墙 2024-12-14 04:09:23

ETL 是一个过程。该缩写代表提取-转换-加载,描述了在此过程中对数据进行的处理。该过程可以在任何需要在具有不同数据格式的两个系统之间创建桥梁的地方实施。首先,您需要从源系统(数据库、平面文件、Web 服务等)中提取(提取)数据,然后处理(转换)数据以符合目标存储的格式(同样,它可能会有所不同:数据库、文件、API 调用)。在转换步骤中,可以对数据集执行进一步的操作,例如使用其他来源的数据进行丰富、清理和提高其质量。最后一步是将转换后的数据加载到目标存储中。

通常,ETL 过程用于加载数据仓库、在从遗留系统迁移到新系统期间将数据从一个系统或数据库迁移到另一个系统或数据库、在两个或多个系统之间同步数据。它还用作更广泛的 MDM 和 BI 解决方案中的中间层。

就具体软件而言,市场上有许多 ETL 工具,从 Informatica、IBM DataStage、Oracle Data Integrator 等大公司的强大解决方案,到 CloverETL、Talend 或 Pentaho 等更实惠的开源提供商。这些工具中的大多数都提供 GUI,其中通过图表定义数据流和处理。

ETL is a process. The abbreviation stands for Extract-Transform-Load which describes what is being done with data during the process. The process can be implemented anywhere where you need to create a bridge between two systems with differenet data formats. First, you need to pull (exract) data from a source system (database, flat files, web service etc.), Then data are being processed (transform) to comply with format of a target storage (again it can vary: databases, files, API calls). During the transform step, further actions can be performed on the data set as enrichment with data from other sources, cleansing and improving its quality. The last step is loading transformed data into a target storage.

Typically, an ETL process is employed for loading a datawarehouse, migrating data from one system or database to another during moving from a legacy system to new one, synchronizing data between two or more systems. It is also used as an intermediate layer in broader MDM and BI solutions.

In terms of specific software, there are many ETL tools on the market ranging from robust solutions from big players as Informatica, IBM DataStage, Oracle Data Integrator, to more affordable and open source providers as CloverETL, Talend, or Pentaho. The most of these tools offer a GUI where flow and processing of data is defined through diagrams.

飘落散花 2024-12-14 04:09:23

对于 Microsoft SQL Server 2005 及更高版本,ETL 工具称为 SSIS(SQL Server 集成服务)。如果您至少安装 SQL Server 标准版,您将获得 Business Intelligence Developer Studio 您可以用它来设计数据流。基本上,ETL 工具的作用是从一个或多个源(表、平面文件等)获取数据,然后对其进行转换(添加列、联接、过滤、映射到不同的数据类型等),最后再次将其存储到一个或多个表或文件。

要基本了解某些内容的工作原理,您可以观看此视频或< a href="http://midnightdba.itbookworm.com/VidPages/SSISGroundZero1/SSISGroundZero1.aspx" rel="nofollow">这个一个(均来自middotdba)。它们有点长,但你明白了。它们确实帮助我理解了 ETL 工具的基本功能。

不幸的是我还没有深入研究其他平台或工具。

For Microsoft SQL Server 2005 and later the ETL tool is called SSIS (SQL Server Integration Services). If you install at least the Standard version of the SQL Server you get the Business Intelligence Developer Studio with which you can design your data flows. Basically what an ETL tool does is take data from one or more sources (tables, flat files, ...) then transform it (add columns, join, filter, map to different data types, etc.) and finally store it again to one or more tables or files.

To get a basic understanding of how something works you can watch e.g. this video or this one (both from midnightdba). They're a bit lengthy, but you get an idea. They certainly helped me in understanding the basic functionality of an ETL tool.

Unfortunately I have not yet digged into other platforms or tools.

桃气十足 2024-12-14 04:09:23

我强烈建议您阅读 Ralph Kimball 和 Margy Ross 撰写的一些书籍(《数据仓库工具包》、《数据仓库生命周期工具包》),了解数据仓库的介绍。

我公司的数据仓库是使用 Oracle Warehouse Builder 工具进行 ETL 构建的。 OWB 是一个 GUI 工具,可在数据库上生成 PL/SQL 代码来操作数据。经过操作和清理后,数据将发布到 Oracle 数据集市。数据集市是一个数据库实例,用户可以通过 Oracle Discoverer(Java 软件)访问该实例以进行临时查询。

I'd highly recommend checking out some of the books by Ralph Kimball and Margy Ross (The Data Warehouse Toolkit, The Data Warehouse Lifecycle Toolkit) for an introduction to data warehousing.

My company's data warehouse is built using the Oracle Warehouse Builder tool for ETL. The OWB is a GUI tool that generates PL/SQL code on the database to manipulate the data. After manipulation and cleansing, the data is published to an Oracle datamart. The datamart is a database instance that users access for ad-hoc querying via Oracle Discoverer (Java software).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文