Fastest way to query a SQL database from Excel?

Posted on 2024-12-13 11:39:15

I will be building a little add-in for Excel that would let users perform ad-hoc reporting from Excel.

The obvious way would be via SQL statements against an ODBC data source, but I'm afraid it would not scale really well against a huge table (i.e., doing a SUM or a COUNT against a table containing 50 GB of data will lead to performance issues).
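
To make the ODBC approach concrete, here is a minimal sketch in Python using pyodbc; the DSN, table, and column names (SalesDSN, dbo.Sales, Amount) are hypothetical stand-ins, not a real schema:

```python
# Minimal sketch: push the aggregation to the server over ODBC so only
# one aggregated row crosses the network. DSN/table/column names are
# hypothetical.
import pyodbc

def aggregate_on_server(dsn: str, table: str, column: str):
    """Run SUM/COUNT on the database server and fetch the single result row."""
    conn = pyodbc.connect(f"DSN={dsn}")
    try:
        cursor = conn.cursor()
        # The raw rows never leave the database; the engine scans the
        # table (or an index) and returns one aggregated row.
        cursor.execute(f"SELECT SUM({column}), COUNT(*) FROM {table}")
        return cursor.fetchone()
    finally:
        conn.close()

total, count = aggregate_on_server("SalesDSN", "dbo.Sales", "Amount")
print(f"SUM = {total}, COUNT = {count}")
```

Even so, a full scan of a 50 GB table is still a full scan; my worry is paying that cost on every ad-hoc query.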

I've been thinking about using some sort of caching technique, basically querying the data in RAM: http://www.yellowfinbi.com/wp_inmemory.pdf
This could be costly in RAM, and I'm not yet sure how difficult it is to implement.
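
As a rough sketch of what that client-side caching might look like, assuming pandas and pyodbc (again with hypothetical names; the whole table has to fit in client RAM, which is exactly the concern):

```python
# Rough sketch of "query in RAM": pull the table into a pandas DataFrame
# once, then answer later ad-hoc aggregations from memory instead of
# re-querying the server. Names are hypothetical.
import pandas as pd
import pyodbc

_cache = {}  # table name -> cached DataFrame

def load_table(dsn, table):
    """Fetch the whole table once and keep it in RAM for later queries."""
    if table not in _cache:
        conn = pyodbc.connect(f"DSN={dsn}")
        try:
            _cache[table] = pd.read_sql(f"SELECT * FROM {table}", conn)
        finally:
            conn.close()
    return _cache[table]

df = load_table("SalesDSN", "dbo.Sales")
# Subsequent aggregations hit RAM, not the server:
print(df["Amount"].sum(), len(df))
```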

My constraints are as follows:

  • I should be able to query any type of data source (SQL Server, MySql) from Excel

  • The ad-hoc reporting will be performed from Excel (via my add-in)

  • The query result should be sent back to Excel cells as fast as possible

  • I don't want to build a data warehouse or OLAP cubes; I want to query the data source directly

Any suggestions on algorithms or implementation techniques (in-RAM reporting?) to perform ad-hoc querying from Excel in the most optimized way? (Besides optimized SQL code!)

Comments (2)

迷荒 2024-12-20 11:39:16

PowerPivot is one option; it is a free add-in for Excel that is optimized for aggregating (summing/counting) data in this sort of way.

It does require sufficient RAM to cache the data from the server.

等风也等你 2024-12-20 11:39:16

"(ie: doing a SUM or a COUNT against a table containing 50 gigas of datas will lead to performance issue)"

This is exactly why you probably don't want to cache the data at the client. Unless each user has his own small supercomputer with 64 GB of RAM, you need to reduce the number of rows coming across.

The fact is most users find a large report to be pretty useless. Our brains can only keep less than a dozen values in short-term memory, so a user isn't going to be able to glean anything useful from a huge amount of data.

They need to perform analysis, and usually analysis involves aggregated or filtered data.

Two options you can use, either alone or in combination:

1) Implement views that pre-aggregate the data. Many DB engines have something like materialized views, or the NOEXPAND option in SQL Server, that basically lets you pre-aggregate reports. If you are trying to avoid this and let your add-in handle it, then you are basically building an OLAP engine, and you should look at the algorithms used by OLAP systems (this is exactly what PowerPivot already is).

The idea is you let the DB engine do what it's good at, reduce the number of rows to an aggregated amount that's closer to what the client needs. This ensures that you are not sending a huge amount of data over the network and requiring the client to process that data and cache in ram.
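
As a hedged sketch of option 1, here is roughly what a pre-aggregated (indexed) view might look like on SQL Server, driven from Python. The view, table, and column names are hypothetical, and real indexed views carry extra requirements (SCHEMABINDING, COUNT_BIG(*), deterministic, non-nullable expressions) that your actual schema must satisfy:

```python
# Sketch: materialize a pre-aggregated view in SQL Server, then query it
# with NOEXPAND so the engine reads the stored aggregate directly.
# All object names are hypothetical.
import pyodbc

CREATE_VIEW = """
CREATE VIEW dbo.SalesByRegion WITH SCHEMABINDING AS
SELECT Region,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS RowCnt
FROM dbo.Sales
GROUP BY Region;
"""

# The unique clustered index is what actually materializes the view.
CREATE_INDEX = """
CREATE UNIQUE CLUSTERED INDEX IX_SalesByRegion
ON dbo.SalesByRegion (Region);
"""

conn = pyodbc.connect("DSN=SalesDSN", autocommit=True)
cursor = conn.cursor()
cursor.execute(CREATE_VIEW)
cursor.execute(CREATE_INDEX)

# NOEXPAND tells SQL Server to use the materialized index rather than
# expanding the view back over the huge base table.
cursor.execute("SELECT Region, TotalAmount FROM dbo.SalesByRegion WITH (NOEXPAND)")
for region, total in cursor.fetchall():
    print(region, total)
conn.close()
```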

2) Pass filter criteria via WHERE clauses/SP parameters to reduce the rows returned to only those absolutely needed. There is room to get really creative here, more than most people realize. For example, maybe your table contains some data the user wouldn't have access to, so it makes sense to join with the authorization table (or whatever mechanism you use for access control) and filter out any data he does not have access to.
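
A minimal sketch of option 2 with bound parameters and an access-control join; the table (dbo.UserRegionAccess) and columns are invented for illustration:

```python
# Sketch: push the user's filters and an access-control join into the
# query itself, so only permitted, requested rows come back. Table and
# column names are hypothetical.
import pyodbc

QUERY = """
SELECT s.Region, SUM(s.Amount) AS TotalAmount
FROM dbo.Sales AS s
JOIN dbo.UserRegionAccess AS a
  ON a.Region = s.Region
 AND a.UserName = ?
WHERE s.SaleDate >= ? AND s.SaleDate < ?
GROUP BY s.Region;
"""

conn = pyodbc.connect("DSN=SalesDSN")
cursor = conn.cursor()
# Bound (?) parameters keep the query injection-safe and plan-cacheable.
cursor.execute(QUERY, ("alice", "2011-01-01", "2011-02-01"))
for region, total in cursor.fetchall():
    print(region, total)
conn.close()
```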

Require the user to choose filters to filter the data down to a reasonable amount.

It is a common occurrence that a user will want more data than they can swallow.
