Combining pseudo-partitioned tables into a single view

Posted on 2024-12-08 11:20:03

Let's say we have 5 tables:

Fact_2011
Fact_2010
Fact_2009
Fact_2008
Fact_2007

each of which stores only the transactions for the year indicated by the suffix of the table's name.

We then create a separate index over each of these tables with the column "Year" as the first column of the index.
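For concreteness, one of the yearly tables and its index might look roughly like this. Only the Year column comes from the description above; the dbo schema, the other column names (TransactionID, Amount), and the index name are hypothetical and chosen purely for illustration.

```sql
-- Hypothetical shape of one yearly table; only [Year] is taken from the question.
CREATE TABLE dbo.Fact_2010 (
    [Year]        int   NOT NULL,
    TransactionID int   NOT NULL,   -- assumed column
    Amount        money NOT NULL    -- assumed column
);

-- Index with [Year] as its first (leading) column, as described above.
CREATE INDEX IX_Fact_2010_Year
    ON dbo.Fact_2010 ([Year], TransactionID);
```

The same pair of statements would be repeated for Fact_2007 through Fact_2011.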

Lastly, we create a view, vwFact, which is the union of all of the tables:

SELECT * FROM Fact_2011
UNION
SELECT * FROM Fact_2010
UNION
SELECT * FROM Fact_2009
UNION
SELECT * FROM Fact_2008
UNION
SELECT * FROM Fact_2007

and then run queries like this:

SELECT * FROM vwFact WHERE YEAR = 2010

or, less commonly,

SELECT * FROM vwFact WHERE YEAR > 2010
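Written out as a complete statement, the view described above would be roughly the following. The dbo schema is an assumption. Note that UNION, as used here, also removes duplicate rows from the combined result, which UNION ALL would not do.

```sql
-- The view from the question as a full CREATE VIEW statement (schema name assumed).
-- UNION de-duplicates the combined rows; UNION ALL would simply concatenate
-- the five tables without that extra step.
CREATE VIEW dbo.vwFact
AS
SELECT * FROM dbo.Fact_2011
UNION
SELECT * FROM dbo.Fact_2010
UNION
SELECT * FROM dbo.Fact_2009
UNION
SELECT * FROM dbo.Fact_2008
UNION
SELECT * FROM dbo.Fact_2007;
```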

How efficient would these queries be compared with actually partitioning the data by Year, or is it essentially the same? Is an index on Year over each of these pseudo-partitioned tables enough to keep the SQL engine from spending more than a trivial amount of time deciding that a physical table containing only records outside the requested year range is not worth scanning? Or is this pseudo-partitioning approach exactly what SQL Server partitioning (by year) does?
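For comparison, this is a rough sketch of what partitioning a single table by Year might look like in SQL Server. The function, scheme, table, and non-Year column names are all hypothetical, and placing every partition on the PRIMARY filegroup is just a simplification for the example.

```sql
-- Sketch of native partitioning by year (all object names are hypothetical).
-- RANGE RIGHT: each boundary value is the lowest value of the partition to
-- its right, giving partitions for <2008, 2008, 2009, 2010 and >=2011.
CREATE PARTITION FUNCTION pfFactYear (int)
    AS RANGE RIGHT FOR VALUES (2008, 2009, 2010, 2011);

-- Map every partition to the PRIMARY filegroup to keep the example small.
CREATE PARTITION SCHEME psFactYear
    AS PARTITION pfFactYear ALL TO ([PRIMARY]);

-- One table whose rows are placed into partitions according to [Year].
CREATE TABLE dbo.Fact (
    [Year]        int   NOT NULL,
    TransactionID int   NOT NULL,   -- assumed column
    Amount        money NOT NULL    -- assumed column
) ON psFactYear ([Year]);
```

With this layout, the queries shown earlier would target dbo.Fact directly instead of going through vwFact.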

It seems to me that if the query executed is

SELECT Col1Of200 FROM vwFact WHERE YEAR = 2010

real partitioning would have a distinct advantage: the pseudo-partitioned approach first has to execute the view, pulling back all of the columns from the Fact_2010 table, and only then narrow the result to the one column the end user selected, whereas SQL Server partitioning would select only the requested column's data up front.

Comments?
