SSAS 中的自定义 ROLAP 数据源
我正在尝试从一堆二进制文件构建 OLAP 数据源,但我们当前的模型无法正常工作。 我们使用 SSAS 作为结果的分析/报告模型,但无法从 SQL 中获得我们想要的性能。
我们的主要限制是:
数据库非常大。 我们有包含数百万行的巨大维度表,以及几个较小的事实表(<1,000,000 行)。
我们有一个动态立方体。 B/C 事实表是动态构建的,并且通常(可能每天多次),设置多维数据集不会有任何巨大的开销。 当前立方体上的部署时间可能超过 24 小时,我们需要将性能提高几个数量级,而硬件无法满足我们的要求。
基本上,我们想要快速设置和部署,这本身并不适合使用 SQL Server 2005 的 SSAS,但我们想要使用 SSRS 进行报告,并且需要一个 OLAP 模型在 Excel 中进行分析,因此我们仍然希望如果可能,使用 SSAS 构建多维数据集。
SSAS 中用于快速部署的常见解决方案是 ROLAP,但我们在较大的 ROLAP 查询上遇到执行错误,而且我们也不喜欢将二进制数据转换为 SQL 并将其加载到多维数据集中所涉及的所有开销。
有人做过 SSAS 可以使用的自定义 OLAP 数据源吗? 我们希望创建自己的 ROLAP 引擎来直接查询二进制源文件。
I am trying to build an OLAP datasource from a bunch of binary files, and our current model just isn't working. We are using SSAS as our analysis / reporting model for our results, but aren't able to get the performance we want out of SQL.
Our main constraints are:
The database is very large. We have huge dimension tables with millions of rows, and several smaller fact tables (<1,000,000 rows).
We have a dynamic cube. B/C the fact tables are built dynamically, and often (possibly multiple times per day), there can't be any huge overhead in setting up the cube. Current deploy times on the cube can exceed 24 hours, and we need orders of magnitude increase in performance which hardware just isn't gonna give us.
Basically, we want a fast setup and deploy, which doesn't inherently lend itself to SSAS using SQL Server 2005, but we want to use SSRS for reporting and we want an OLAP model for analysis in Excel, so we'd still like to use SSAS to build the cube if possible.
The common solution in SSAS for a fast deploy is ROLAP, but we are getting execution errors on larger ROLAP queries, and we also don't like all the overhead involved in converting the binary data to SQL and loading it into the cube.
Has anyone done work on a custom OLAP datasource that SSAS can use? We are looking to create our own ROLAP engine that will query the binary source files directly.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
如果您需要一个低延迟多维数据集(即显示最新数据的多维数据集),则此类事物的规范架构如下:
针对多维数据集的查询将针对最新数据命中相对较小的 ROLAP 分区,针对历史数据命中 MOLAP 分区。 MOLAP 分区可以具有聚合。 该过程不断勾选前沿 ROLAP 分区并转换其前身。 AS 将保留旧分区并使用它,直到替换分区构建并上线为止。
如果这种类型的架构符合您的要求,您可以考虑这样做。
If you need a low latency cube (i.e. one showing up-to-date data) the canonical architecture for such things is thus:
Queries against the cube will hit the relatively small ROLAP partition for the most recent data and the MOLAP partitions for the historical data. The MOLAP partitions can have aggregations. The process keeps ticking the leading edge ROLAP partition over and converting its predecessor. AS will keep the older partition around and use it until the replacement partition is built and comes on line.
If this type of architecture will fit your requirements you could consider doing it this way.
感谢您的回复,奈杰尔。
我想我需要更好地解释一下。 我的源数据采用专有格式,而不是数据库,因此访问事实表本身需要相当多的时间。 然后,我们需要尽快(最好在几分钟内)部署多维数据集并具有快速的查询响应,即使在使用 SQL 的小型数据集上,我们目前也没有看到这一点。
因为多维数据集的结构是动态的,所以我们经常必须重建多维数据集的每个方面,事后我们不会引入新数据,因此将其部分划分为 MOLAP 并将其他部分划分为 ROLAP 并没有真正的帮助。 我们正在寻找“Process Full”的性能。
我们开始意识到我们无法使用 SQL 进行查询,并且想知道是否有人创建了分析服务(或任何 OLAP 工具)可以读取的自定义 ROLAP 数据源。
我们可以快速处理创建结果集; 我们只需要弄清楚如何从 SSAS 获取查询并将其反馈回这些结果。 我们实际上只是希望使用 SSAS 作为我们的系统和 Excel、SSRS 等之间的中介,而不是使用它来处理或聚合数据。
Thanks for the response, Nigel.
I guess I need to explain this a little better. My source data is in a proprietary format, not a database, so getting to the fact table itself is taking quite a bit of time. Then we need to deploy the cube as quickly as possible (preferably within minutes) and have fast query responses, which we are not currently seeing even on a small dataset using SQL.
Because the structure of the cube is dynamic, we often have to rebuild every aspect of the cube, we don't introduce new data after the fact, so partitioning parts of it as MOLAP and other parts of ROLAP doesn't really help. We are looking for performance on the "Process Full".
We are beginning to realize that we just can't use SQL for querying, and want to know if anyone has created a custom ROLAP datasource that analysis services (or any OLAP tool) can read.
We can handle creating the result sets quickly; we just need to figure out how to get the query from SSAS and feed it back those results. We’re really just looking to use SSAS as an intermediary between our system and Excel, SSRS, etc. rather than using it to process or aggregate the data.
您可以使用类似 R 的自制库吗(它支持 C 扩展)连接到您的数据集。 R 将为您提供相当大的灵活性来构建复杂的报告或数据预处理库。 它还具有 Excel 的接口。
这是与传统 DB/OLAP 模型有些不同的工具链,但您可以相当轻松地用 C 语言编写快速数据集加载器,并跳过加载到数据库的中间步骤。
Could you use something like R with a homebrew library (it supports C Extensions) to interface to your data sets. R would give you a fair amount of flexibility for building complex reports or data pre-processing libraries. It also has an interface to Excel.
This a somewhat different toolchain to the traditional DB/OLAP model but you could fairly easily write a fast dataset loader in C and skip the intermediate step of loading into the database.
我还没有任何运气。 我们要么构建自己的数据提供程序并为 excel 构建插件来模拟 olap 行为,要么使用 CLR 表值函数来模拟我们的数据源并以此为基础构建多维数据集。 我在 CLR 方面进行的一次尝试性能很糟糕,并且由于构建多维数据集时 SSAS 运行的查询量很大而失败了。 我正在等待在 SQL 08 环境中获得一台更新更快的机器,看看这是否可行。 祝你好运,斯科特。
I haven't had any luck yet. We are going the route of either building our own Data Provider and building add-ins for excel to emulate the olap behavior, OR using CLR table-valued functions to emulate our data-sources and build the cube off of that. The one attempt I took at the CLR stuff had horrible performance and blew up though due to the amount of queries SSAS runs when building a cube. I am waiting to get a newer faster machine in the SQL 08 environment to see if this is feasible. Good luck Scott.