Fast pivoting of big data

Posted on 2024-11-17 19:56:13

We are developing a product that can be used both to build predictive models and to slice and dice data for BI.

We have two kinds of data access requirements.

For predictive modeling, we need to read the data daily and process it row by row. For this, a normal SQL Server database is sufficient and we have not run into any issues.

For slicing and dicing, the data is large: for example, 1 GB of data with, let us say, 300 M rows. We want to pivot that data easily with minimal response time.

Our current SQL database has response-time problems with this workload.

We would like our product to run on any normal client machine with 2 GB of RAM and a Core 2 Duo processor.

I would like to know how I should store this data, and then how I can create a pivoting experience for each dimension.

Ideally, we will have data such as daily sales by salesperson, by region, by product for a large corporation. We would then like to slice and dice it on any dimension and also compute aggregations, unique values, maximum, minimum, and average values, and some other statistical functions.

Comments (2)

司马昭之心 2024-11-24 19:56:13

I would build an in-memory cube on top of that data. To give you an example, icCube achieves sub-second response times for 3-4 measures over 50 M rows on a single Core i5, without any cache or pre-aggregation (i.e., the response time is constant across all dimensions).

Contact us directly for more details about how to integrate it into your product.
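
To make the "in-memory cube" idea concrete, here is a minimal Python sketch of one way it could be structured: each dimension is dictionary-encoded into a compact integer column, the measure is a packed array of doubles, and a pivot is a single scan that groups on the integer codes. This is not how icCube is implemented; the class `ColumnStore`, its methods, and the sample rows are all hypothetical names invented for illustration. Pure Python would be far too slow for 300 M rows, so treat this as a description of the data layout rather than a production approach.

```python
from array import array


class ColumnStore:
    """Toy column store (hypothetical): encoded dimensions plus one measure."""

    def __init__(self, dimensions, measure):
        self.measure = measure
        self.codes = {d: array("I") for d in dimensions}  # per-row integer code
        self.labels = {d: [] for d in dimensions}         # code -> label
        self.lookup = {d: {} for d in dimensions}         # label -> code
        self.values = array("d")                          # measure column

    def append(self, row):
        """Add one row given as a dict of dimension labels and the measure."""
        for dim, column in self.codes.items():
            label = row[dim]
            code = self.lookup[dim].get(label)
            if code is None:  # first time this label is seen
                code = len(self.labels[dim])
                self.lookup[dim][label] = code
                self.labels[dim].append(label)
            column.append(code)
        self.values.append(row[self.measure])

    def unique(self, dim):
        """Distinct labels of a dimension, in first-seen order."""
        return list(self.labels[dim])

    def pivot(self, by, where=None):
        """Aggregate the measure grouped by `by`, optionally sliced by
        `where`, a dict mapping a dimension to the single label to keep."""
        filters = []
        for dim, label in (where or {}).items():
            code = self.lookup[dim].get(label)
            if code is None:  # unknown label: no row can match
                return {}
            filters.append((self.codes[dim], code))
        group_columns = [self.codes[d] for d in by]
        acc = {}  # group key (tuple of codes) -> [count, sum, min, max]
        for i, v in enumerate(self.values):
            if any(column[i] != code for column, code in filters):
                continue
            key = tuple(column[i] for column in group_columns)
            stats = acc.get(key)
            if stats is None:
                acc[key] = [1, v, v, v]
            else:
                stats[0] += 1
                stats[1] += v
                stats[2] = min(stats[2], v)
                stats[3] = max(stats[3], v)
        result = {}
        for key, (n, total, low, high) in acc.items():
            names = tuple(self.labels[d][c] for d, c in zip(by, key))
            result[names] = {"count": n, "sum": total, "min": low,
                             "max": high, "avg": total / n}
        return result


# Made-up sample data: daily sales by region, product and salesperson.
store = ColumnStore(["region", "product", "salesperson"], "amount")
for region, product, person, amount in [
    ("East", "Widget", "Ann", 100.0),
    ("East", "Gadget", "Bob", 50.0),
    ("West", "Widget", "Ann", 30.0),
    ("West", "Widget", "Cid", 70.0),
]:
    store.append({"region": region, "product": product,
                  "salesperson": person, "amount": amount})

by_region = store.pivot(by=["region"])
west_by_product = store.pivot(by=["product"], where={"region": "West"})
```

Because every dimension is stored the same way, pivoting on a different dimension only changes which integer columns are scanned, which is one plausible reading of why response time can stay roughly constant across dimensions without pre-aggregation.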

ゞ花落谁相伴 2024-11-24 19:56:13

You could also use PowerPivot for this. It is a free add-in for Excel 2010 that can handle large data sets and lets you slice and dice them.

If you want to write code around it, you can connect to the PowerPivot database (effectively an SSAS cube) using the SSAS database connector.

Hope that is of some use.
