Object databases, business intelligence and data warehousing
Sorry if this seems like a novice question, but I am new to the data warehousing and business intelligence world.
From what I have read, a multidimensional database is needed because of limitations of the relational model. Anything you need to do with a multidimensional database can also be done on an ordinary relational database, but only with very complex queries and slow join and aggregation operations.
The question is: do we need the same concepts (multidimensional database, data warehouse, etc.) when we talk about business intelligence for an object database? Object databases don't have joins because relations between objects are maintained by direct references.
Comments (2)
Multidimensionality is an essential feature of data warehousing.
It is not a "workaround" for limitations of the relational model. It is the best way to model data where you need to do arbitrary "slice and dice" analysis of facts with respect to multiple non-trivial dimensions.
Star-schema queries are not very complex. They're actually very simple, since they're almost always of the form
SELECT SUM(MEASURE) FROM FACT JOIN DIM1 ON ... JOIN DIM2 ON ... WHERE ...
Join operations are generally slow. However, the joins can be done in object-oriented code instead of in the SQL warehouse.
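To make the query shape above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_date), their columns, and the sample rows are all invented for illustration; they are not from any real warehouse.

```python
import sqlite3


def build_toy_warehouse():
    """Create an in-memory star schema: one fact table, two dimensions.

    All table names, columns, and rows are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL);
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
        INSERT INTO dim_date VALUES (10, 2008), (11, 2009);
        INSERT INTO fact_sales VALUES
            (1, 10, 5.0), (1, 11, 7.0), (2, 11, 3.0), (1, 11, 1.0);
        """
    )
    return conn


def sales_total(conn, category, year):
    """Sum the measure after joining the fact table to both dimensions."""
    row = conn.execute(
        """
        SELECT SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        WHERE p.category = ? AND d.year = ?
        """,
        (category, year),
    ).fetchone()
    # SUM over zero rows is NULL, which sqlite3 returns as None.
    return row[0]
```

Calling sales_total(build_toy_warehouse(), "books", 2009) slices the facts by one product category and one year; changing the WHERE clause is all it takes to slice along different dimension attributes.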
In many cases, most dimensions are actually rather small and fit entirely in memory. The analysis queries can devolve to simple fetches of all the facts followed by in-memory lookups of dimensional attributes.
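As a rough sketch of that idea, assuming the dimension fits in a plain dictionary keyed by its surrogate key: the function name total_by_attribute, the field names, and the sample data below are hypothetical.

```python
from collections import defaultdict


def total_by_attribute(facts, dimension, key_field, attribute, measure):
    """Aggregate a measure by a dimension attribute without a database join.

    facts:     iterable of dicts, one per fact row
    dimension: dict mapping dimension key -> dict of attributes (held in memory)
    """
    totals = defaultdict(float)
    for fact in facts:
        # The "join" is a dictionary lookup on the fact's dimension key.
        dim_row = dimension[fact[key_field]]
        totals[dim_row[attribute]] += fact[measure]
    return dict(totals)


# Hypothetical data: a tiny product dimension and three fact rows.
products = {1: {"category": "books"}, 2: {"category": "games"}}
facts = [
    {"product_id": 1, "amount": 5.0},
    {"product_id": 2, "amount": 3.0},
    {"product_id": 1, "amount": 7.0},
]
totals = total_by_attribute(facts, products, "product_id", "category", "amount")
```

The facts are streamed once and each lookup is constant time, which is why this works well as long as the dimension really is small enough to hold in memory.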
In the remaining cases, we have a snowflake schema where a dimension (usually a customer, patient or member dimension) is almost as large as the relevant fact table. In this case, a relational join in the database is helpful.
"The object databases don't have joins because relations between objects are maintained by direct references."
This isn't completely true. Object databases have navigation from object to object. If you retrieve a set of objects along with their related objects, a join operation will, in effect, have been performed.
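A small sketch of why navigation amounts to a join, using plain Python objects as a stand-in for an object database. The Customer and Sale classes and the amount_by_region function are invented for this example and do not model any particular product.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    region: str


@dataclass
class Sale:
    customer: Customer  # direct object reference instead of a foreign key
    amount: float


def amount_by_region(sales):
    """Total sale amounts per customer region by following references."""
    totals = {}
    for sale in sales:
        # Following sale.customer pairs each Sale with its Customer,
        # which is the same pairing a relational join would compute.
        region = sale.customer.region
        totals[region] = totals.get(region, 0.0) + sale.amount
    return totals
```

The result is the same set of (sale, customer) pairs a relational engine would produce; only the mechanism (pointer dereference versus key matching) differs.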
"The question is do we need the same concepts (multidimensional database - data warehouse, etc) when we talk about business intelligence for object database?"
Yes. Multidimensionality is essential. Absolutely. An object database will represent this just as well as (or perhaps better than) a relational database. Either model, however, must represent the essential truth of Measures and their Dimensions.
Perhaps you might do well to look at so-called document databases. CouchDB is popular, open source (free to obtain and dissect) and simple to understand. CouchDB stores all data as JSON (easy-to-parse JavaScript Object Notation) documents and communicates with the outside world using only REST (just HTTP if you're new to this). One of the more interesting CouchDB features is that you can select data using the MapReduce paradigm for processing and aggregating data.
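To give a feel for the MapReduce paradigm, here is a toy imitation in Python. This is not CouchDB's API: CouchDB views are normally written as JavaScript map and reduce functions and queried over HTTP. The function names (map_sale, reduce_sum, run_view) and the document fields are assumptions made up for this sketch.

```python
from collections import defaultdict


def map_sale(doc):
    """Map step: emit (key, value) pairs for the documents of interest."""
    if doc.get("type") == "sale":
        yield doc["category"], doc["amount"]


def reduce_sum(values):
    """Reduce step: collapse all values emitted for one key."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    """Apply map to every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


# Hypothetical JSON-like documents.
docs = [
    {"type": "sale", "category": "books", "amount": 5.0},
    {"type": "sale", "category": "games", "amount": 3.0},
    {"type": "note", "text": "ignored by the map function"},
    {"type": "sale", "category": "books", "amount": 7.0},
]
view = run_view(docs, map_sale, reduce_sum)
```

The emitted key plays the role of a dimension attribute and the emitted value plays the role of a measure, so grouping and reducing gives the same kind of aggregate a star-schema query would.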
Looking at CouchDB might give you an idea of what some of the possibilities are when it comes to non-relational databases. Know that CouchDB is primarily concerned with storing data documents rather than entire objects. Some databases are true object databases insofar as they store the state of a given object in a program. Compare db4o.