Does repeating sum(x) many times in different columns make a SELECT slower?

Posted on 2024-09-14 19:02:09


I have a really big table with dozens of columns and lots of rows. Let's call this table FT. Every day I run a script that reads data from the FT table, performs some calculations, and updates a smaller table (table FA) that I use to generate reports.

The query that updates FA is something like:

INSERT INTO FA (A, B, C)
    SELECT sum(x), sum(x) * sum(y), sum(x) + sum(z)
    FROM FT
    GROUP BY ...

Since I use sum(x) many times, would it be faster to create a temporary table with sum(x), sum(y) and sum(z) and use it to update my FA table?


天冷不及心凉 2024-09-21 19:02:09

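Every database I know of optimizes this, so these values are calculated only once.

If you're not sure, look at the execution plan and the reads for your current query, and compare them with the version changed to use a temporary table.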
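One way to check the "computed only once" claim is to look at what the engine actually plans to execute. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in (an assumption: it shows only SQLite's behaviour, not Oracle's or any other product's). It counts the AggStep instructions in SQLite's EXPLAIN bytecode; the table ft and its columns g, x, y, z are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption:
# SQLite stands in for the real warehouse database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ft (g INTEGER, x REAL, y REAL, z REAL)")

# The query from the question: sum(x) is written three times.
query = """
    SELECT sum(x), sum(x) * sum(y), sum(x) + sum(z)
    FROM ft
    GROUP BY g
"""

# EXPLAIN returns SQLite's bytecode program, one row per instruction:
# (addr, opcode, p1, p2, p3, p4, p5, comment).
program = conn.execute("EXPLAIN " + query).fetchall()

# Each aggregate that is accumulated per input row appears as an AggStep opcode.
agg_steps = [row for row in program if row[1].startswith("AggStep")]
print(len(agg_steps))
conn.close()
```

If the engine deduplicates identical aggregates, the count should match the number of distinct sums (three) rather than the five sum() calls written in the query.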


若无相欠,怎会相见 2024-09-21 19:02:09

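As a general rule of thumb, retrieving data from disk is the slowest operation a database performs (especially on large tables), so I would expect relatively straightforward arithmetic operations such as these to be negligible by comparison.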


芯好空 2024-09-21 19:02:09


Benchmark your query against:

-- compute each sum once in the inline view, then combine the results
insert into fa (a, b, c)
select sum_x, sum_x * sum_y, sum_x + sum_z
  from (select sum(x) as sum_x, sum(y) as sum_y, sum(z) as sum_z
          from my_table
         group by my_grouping_columns) sums

My strong suspicion is that Oracle has to build the intermediate set (the sums per group) first and then transform that into the final result set, regardless of how the query is written.
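A minimal benchmark harness along the lines suggested above, assuming SQLite as a stand-in for Oracle and synthetic data (the column grp is a hypothetical replacement for my_grouping_columns). It runs both forms several times, keeps the fastest time for each, and checks that they return the same rows. Timings on SQLite say nothing definitive about Oracle, so the same comparison should be repeated against the real FT table.

```python
import random
import sqlite3
import time

# Synthetic data in an in-memory SQLite database (assumption: SQLite stands
# in for Oracle, and grp stands in for my_grouping_columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (grp INTEGER, x INTEGER, y INTEGER, z INTEGER)")
rng = random.Random(42)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(rng.randrange(100), rng.randrange(1000), rng.randrange(1000), rng.randrange(1000))
     for _ in range(100_000)],
)

# Form 1: sum(x) repeated in each output column.
repeated = """
    SELECT grp, sum(x), sum(x) * sum(y), sum(x) + sum(z)
    FROM my_table
    GROUP BY grp
    ORDER BY grp
"""
# Form 2: each sum computed once in a derived table, then combined.
derived = """
    SELECT grp, sum_x, sum_x * sum_y, sum_x + sum_z
    FROM (SELECT grp, sum(x) AS sum_x, sum(y) AS sum_y, sum(z) AS sum_z
          FROM my_table
          GROUP BY grp) AS sums
    ORDER BY grp
"""

def best_time(sql, repeats=5):
    """Return (fastest wall-clock time in seconds, result rows) over several runs."""
    best, rows = float("inf"), None
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, rows

t_repeated, r_repeated = best_time(repeated)
t_derived, r_derived = best_time(derived)
print(f"repeated sum(x): {t_repeated:.4f}s, derived table: {t_derived:.4f}s")
```

Keeping the minimum of several runs reduces noise from caching and scheduling; integer columns keep the result comparison exact.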

It will definitely not be easier or faster to force Oracle to materialize the intermediate result set into a global temporary table; you're adding direct path I/O without having a good reason to do so. That said, if the intermediate result set is expensive to build and used in multiple inserts, it may be worthwhile to materialize it into a temporary table.
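When the intermediate result set really is reused by several inserts, materializing it might look like the sketch below. SQLite's CREATE TEMP TABLE ... AS SELECT plays the role of an Oracle global temporary table here (an assumption for illustration), and the second target table fa_totals is hypothetical. The grouped sums are computed once and then read by both INSERT statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (grp INTEGER, x INTEGER, y INTEGER, z INTEGER);
    CREATE TABLE fa (grp INTEGER, a INTEGER, b INTEGER, c INTEGER);
    CREATE TABLE fa_totals (a INTEGER, c INTEGER);  -- hypothetical second consumer
    INSERT INTO my_table VALUES (1, 2, 3, 4), (1, 5, 6, 7), (2, 10, 20, 30);
""")

# Materialize the grouped sums once (SQLite TEMP table standing in for a
# global temporary table).
conn.execute("""
    CREATE TEMP TABLE sums AS
    SELECT grp, sum(x) AS sum_x, sum(y) AS sum_y, sum(z) AS sum_z
    FROM my_table
    GROUP BY grp
""")

# First consumer: the FA load from the question.
conn.execute("""
    INSERT INTO fa (grp, a, b, c)
    SELECT grp, sum_x, sum_x * sum_y, sum_x + sum_z FROM sums
""")
# Second consumer: reuses the intermediate result without rescanning my_table.
conn.execute("""
    INSERT INTO fa_totals (a, c)
    SELECT sum(sum_x), sum(sum_x + sum_z) FROM sums
""")
conn.commit()
print(conn.execute("SELECT * FROM fa ORDER BY grp").fetchall())
```

With only one consumer, the temporary table just adds a write and a re-read, which is the extra I/O the answer warns about.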


半世晨晓 2024-09-21 19:02:09


Considering that you have tagged this post with data-warehouse and datamart, I can only assume that your FT table is some kind of fact table and that the query looks something like:

select 
    CalendarMonth
  , sum(x) as Tot_1 
  , sum(x) * sum(y) as Tot_2
  , sum(x) + sum(z) as Tot_3
from FT         as f
join dimDate    as d on d.DateKey    = f.DateKey
join dimUser    as u on u.UserKey    = f.UserKey
join dimProduct as p on p.ProductKey = f.ProductKey
where CalendarYear between 2008 and 2010
  and Country = 'United States'
  and ProductCategory = 'Cool Gadget'
  and UserGender = 'Female'
group by CalendarMonth ;

This is exactly what an aggregation over measures in a fact table should look like.

Now, for reporting purposes, it seems that you have an aggregation table (FA) to speed up reports. I can only guess that the warehouse is loaded overnight and that your query prepares the aggregation sometime in the morning, before business hours, so it runs once per day (or at least is supposed to). If this query takes too long to run, consider adding a few key fields to your aggregation table (FA), usually DateKey, and then update the FA table periodically.

For example, if you have 10,000 sales per day, then the above query sums about 300,000 rows for each month. If the aggregation table is aggregated per day, updating it takes a sum of 10,000 rows once per day, and a monthly report needs a sum of only 30 rows.
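A sketch of the daily-grain FA table described above, assuming SQLite and synthetic data at the answer's figures (10,000 sales per day over 30 days). The FA layout with one row per DateKey and the column names sum_x, sum_y, sum_z are hypothetical. The nightly step aggregates only the newly loaded day; the monthly report then reads 30 FA rows instead of 300,000 fact rows.

```python
import sqlite3
from datetime import date, timedelta

SALES_PER_DAY = 10_000  # figure from the example above
DAYS = 30               # roughly one month

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ft (DateKey INTEGER, x INTEGER, y INTEGER, z INTEGER);
    CREATE INDEX ix_ft_datekey ON ft (DateKey);
    -- Hypothetical daily-grain FA: one row per DateKey holding the base sums.
    CREATE TABLE fa (DateKey INTEGER PRIMARY KEY,
                     sum_x INTEGER, sum_y INTEGER, sum_z INTEGER);
""")

first_day = date(2010, 1, 1)
for offset in range(DAYS):
    day_key = int((first_day + timedelta(days=offset)).strftime("%Y%m%d"))
    # Simulate the nightly warehouse load: every sale has x=1, y=2, z=3.
    conn.executemany("INSERT INTO ft VALUES (?, 1, 2, 3)", [(day_key,)] * SALES_PER_DAY)
    # Nightly FA update: aggregate only the day that was just loaded.
    conn.execute(
        """INSERT INTO fa (DateKey, sum_x, sum_y, sum_z)
           SELECT DateKey, sum(x), sum(y), sum(z)
           FROM ft WHERE DateKey = ? GROUP BY DateKey""",
        (day_key,),
    )

month = (20100101, 20100131)
fact_rows = conn.execute(
    "SELECT count(*) FROM ft WHERE DateKey BETWEEN ? AND ?", month).fetchone()[0]
# Monthly report: read the ~30 daily FA rows instead of the fact rows.
fa_rows, total_x, total_y, total_z = conn.execute(
    """SELECT count(*), sum(sum_x), sum(sum_y), sum(sum_z)
       FROM fa WHERE DateKey BETWEEN ? AND ?""", month).fetchone()
# Derived measures are built from the stored base sums at report time.
report = (total_x, total_x * total_y, total_x + total_z)
print(fact_rows, fa_rows, report)
```

The sketch stores the base sums rather than sum(x) * sum(y) because a product of sums is not additive: adding up 30 daily products does not give the monthly product, so derived measures are computed from the monthly totals.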

To summarize, to speed up fact aggregation queries, focus on the number of rows that are aggregated, not on the aggregate functions. Also, make sure that the dimension tables have indexes on the columns mentioned in the WHERE clause of the query.
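To check that such indexes are actually used, the query plan can be inspected, as in the sketch below with SQLite's EXPLAIN QUERY PLAN. The index names are hypothetical, the column names come from the example query, and SQLite again stands in for whatever database hosts the warehouse, so the plan text will look different elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dimProduct (ProductKey INTEGER PRIMARY KEY, ProductCategory TEXT);
    CREATE TABLE dimUser (UserKey INTEGER PRIMARY KEY, Country TEXT, UserGender TEXT);
    -- Hypothetical indexes on the dimension columns filtered in the WHERE clause.
    CREATE INDEX ix_dimProduct_category ON dimProduct (ProductCategory);
    CREATE INDEX ix_dimUser_country_gender ON dimUser (Country, UserGender);
""")

def plan(sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    # Rows are (id, parent, notused, detail); only the description is kept.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

product_plan = plan(
    "SELECT ProductKey FROM dimProduct WHERE ProductCategory = 'Cool Gadget'")
user_plan = plan(
    "SELECT UserKey FROM dimUser WHERE Country = 'United States' AND UserGender = 'Female'")
print(product_plan)
print(user_plan)
```

A plan line that names the index (a SEARCH rather than a full SCAN of the table) indicates the filter can use it.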

Admittedly, I may have assumed too much here, so this may or may not be helpful.

