将伪分区表合并到单个视图中
假设我们有 5 个表,
Fact_2011
Fact_2010
Fact_2009
Fact_2008
Fact_2007
每个表仅存储表名称扩展名指示的年份的交易。
然后,我们在每个表上创建一个单独的索引,并将“Year”列作为索引的第一列。
最后,我们创建一个视图 vwFact,它是所有表的并集:
SELECT * FROM Fact_2011
UNION
SELECT * FROM Fact_2010
UNION
SELECT * FROM Fact_2009
UNION
SELECT * FROM Fact_2008
UNION
SELECT * FROM Fact_2007
然后执行如下查询:
SELECT * FROM vwFact WHERE YEAR = 2010
或者在不太可能的情况下,
SELECT * FROM vwFact WHERE YEAR > 2010
这些查询与实际分区相比的效率如何按年份划分的数据还是基本相同?是否在每个伪分区表上都有一个按年份索引,以防止 SQL 引擎浪费大量时间来确定包含所查找记录之外的记录的物理表日期范围不值得扫描?或者这种伪分区方法正是 MS 分区(按年份)所做的吗?
在我看来,如果执行的查询是
SELECT Col1Of200 FROM vwFact WHERE YEAR = 2010
真实分区会有明显的优势,因为伪分区首先必须执行视图以从 Fact_2010
表中拉回所有列,然后进行过滤细化到最终用户选择的一列,而使用 MSSQL 分区,则更像是直接预先选择仅查找的列的数据。
评论?
Lets say we have 5 tables
Fact_2011
Fact_2010
Fact_2009
Fact_2008
Fact_2007
each of which stores only transactions for the year indicated by the extension of the table's name.
We then create a separate index over each of these tables with the column "Year" as the first column of the index.
Lastly, we create a view, vwFact
, which is the union of all of the tables:
SELECT * FROM Fact_2011
UNION
SELECT * FROM Fact_2010
UNION
SELECT * FROM Fact_2009
UNION
SELECT * FROM Fact_2008
UNION
SELECT * FROM Fact_2007
and then perform a queries like this:
SELECT * FROM vwFact WHERE YEAR = 2010
or in less likely situations,
SELECT * FROM vwFact WHERE YEAR > 2010
How efficient would these queries be compared to actually partitioning the data by Year or is it essentially the same? Is having an index by Year
over each of these pseudo partitioned tables what is needed to prevent the SQL engine from wasting more than a trivial amount of time to determine that a physical table that contains records outside of the sought date range is not worth scanning? Or is this pseudo partitioning approach exactly what MS partitioning (by year) is doing?
It seems to me that if the query executed is
SELECT Col1Of200 FROM vwFact WHERE YEAR = 2010
that real partitioning would have a distinct advantage, because the pseudo partitioning first has to execute the view to pull back all of the columns from the Fact_2010
table and then filter down to the one column that the end user is selecting, while with MSSQL partitioning, it would be more of a direct up front selection of only the sought column's data.
Comments?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我在 SQL Server 2000 上实现了分区视图,取得了巨大成功。
确保每个表都有一个检查约束,将年份列限制为年份。因此,在 Fact_2010 表上,它会是 Check Year = 2010,
然后还使视图 UNION ALL 不仅仅是 UNION
您可以使用执行计划来验证这一点
现在,当您查询一年的视图时,它应该只访问 1 个表,如果您不这样做, 如果没有检查约束,它将触及属于视图一部分的所有表
如果您有适当的约束,优化器足够智能,可以直接选择您需要的表
I have implemented partitioned views on SQL Server 2000 with great success
Make sure that you have a check constraint on each table that will restrict the year column to the year. So on the Fact_2010 table it would be Check Year = 2010
then also make the view UNION ALLs not just UNION
now when you query the view for one year it should just access 1 table, you can verify this with the execution plan
if you don't have the check constraints in place it will touch all the tables that are part of the view
If you have the constraints in place the optimizer is smart enough to just go the tables you need