UNION across databases
My warehouse data mart is split across 3 databases on the same server. This is a proof-of-concept project with three extracts, each of which was loaded into its own database.
While pulling into a cube, I am essentially doing this:
SELECT * FROM DB1.dbo.Fact_Pres
UNION
SELECT * FROM DB2.dbo.Fact_Pres
UNION
SELECT * FROM DB3.dbo.Fact_Pres
Should I actually consolidate the data into ONE table? Will that make my processing any faster?
I have no issues with disk space - I want to implement the best solution.
In either case, can you help me understand why the method you suggest would be optimal?
Comments (4)
Consider federating your table using SQL Server's first-class partitioning (as opposed to doing it yourself). If you're always selecting every data point, then maybe, yes, pulling from multiple disks is faster.
But why have multiple databases? You could always stack the three tables into one table, but have that one table implemented atop three drives RAIDed together. This is a more clear-cut solution if what you're after is speed.
Federating only makes sense if you're ever selecting specific, adjacent parts of the set. But according to your original post, you're selecting everything, so that eliminates that benefit.
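As a rough illustration of the built-in partitioning mentioned above, a single fact table could be partitioned by the extract each row came from. This is only a sketch: the ExtractId column, the other columns, and the names pf_Extract, ps_Extract, and Fact_Pres_Partitioned are hypothetical, since the original post does not show the table's structure.

```sql
-- Hypothetical: each row is tagged with the extract it came from (1, 2, or 3).
-- Boundaries 1 and 2 with RANGE LEFT give three partitions:
-- ExtractId <= 1, ExtractId = 2, and ExtractId > 2.
CREATE PARTITION FUNCTION pf_Extract (int)
    AS RANGE LEFT FOR VALUES (1, 2);
GO

-- For simplicity every partition maps to the PRIMARY filegroup.
-- To spread I/O across drives, list one filegroup per partition instead of ALL TO.
CREATE PARTITION SCHEME ps_Extract
    AS PARTITION pf_Extract ALL TO ([PRIMARY]);
GO

-- Hypothetical column list; the real Fact_Pres columns are not shown in the post.
CREATE TABLE dbo.Fact_Pres_Partitioned (
    ExtractId int            NOT NULL,
    PresKey   int            NOT NULL,
    Amount    decimal(18, 2) NULL
) ON ps_Extract (ExtractId);
GO
```

GO is the batch separator used by SSMS and sqlcmd, not a T-SQL statement. Queries that filter on ExtractId can then touch only the matching partition, while a full scan still reads everything, which matches the caveat in the answer above.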
Yes, you definitely should. There is no point in splitting the same table across different databases. If you have problems with hard disk space, think about partitioning your table.
Regarding your comment:
The performance cost isn't THAT huge, but a UNION has to de-duplicate the combined rows (typically with a sort, hash, or merge operator), which adds a bit of overhead.
In addition to that, are you sure you are using UNION correctly? UNION will eliminate duplicate rows. Maybe what you really want to do is UNION ALL?
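The difference between the two operators can be seen in a small, self-contained sketch. The table variables @a and @b and their columns are made up for illustration and stand in for two of the extracts:

```sql
-- Two hypothetical extracts that share one identical row: (2, 20).
DECLARE @a TABLE (id int, amount int);
DECLARE @b TABLE (id int, amount int);

INSERT INTO @a (id, amount) VALUES (1, 10), (2, 20);
INSERT INTO @b (id, amount) VALUES (2, 20), (3, 30);

-- UNION removes the duplicate: returns 3 rows.
SELECT id, amount FROM @a
UNION
SELECT id, amount FROM @b;

-- UNION ALL simply concatenates: returns 4 rows.
SELECT id, amount FROM @a
UNION ALL
SELECT id, amount FROM @b;
```

For a fact table, silently dropping identical rows is usually not what you want, so UNION ALL tends to be both the faster and the safer choice, assuming the extracts are not supposed to be de-duplicated against each other.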
A cross-database query is (somewhat) slower than an in-database one. If you need three separate tables, it is advisable to use different schemas inside the same database. If this happens to be one fact table, then it is best to load it into a single fact table and use partitioning if the size happens to be too big.
When it comes to ETL, the suggestion is to ETL into staging tables in a separate schema (named ETL, for example) within the same database and load the fact table from there. Once done, you truncate the staging tables.
The recommendations are from Microsoft's Project REAL.
Also keep in mind that foreign keys cannot be used across databases.
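The staging pattern described above might look roughly like this. It is a sketch under assumptions: the schema name etl, the tables etl.Fact_Pres_Stage and dbo.Fact_Pres_Demo, and their columns are all hypothetical, and the sample INSERT stands in for whatever bulk-load step the real ETL uses.

```sql
-- CREATE SCHEMA has to be the first statement in its batch, hence the GO.
CREATE SCHEMA etl;
GO

-- Hypothetical staging and fact tables with identical column lists.
CREATE TABLE etl.Fact_Pres_Stage (
    PresKey int            NOT NULL,
    Amount  decimal(18, 2) NULL
);
CREATE TABLE dbo.Fact_Pres_Demo (
    PresKey int            NOT NULL,
    Amount  decimal(18, 2) NULL
);
GO

-- 1. Land the raw extract in staging (placeholder rows instead of a bulk load).
INSERT INTO etl.Fact_Pres_Stage (PresKey, Amount)
VALUES (1, 10.00), (2, 20.00);

-- 2. Load the fact table from staging; transformations would go in this SELECT.
INSERT INTO dbo.Fact_Pres_Demo (PresKey, Amount)
SELECT PresKey, Amount
FROM etl.Fact_Pres_Stage;

-- 3. Empty the staging table so it is ready for the next extract.
TRUNCATE TABLE etl.Fact_Pres_Stage;
```

Because everything lives in one database, the fact table can also carry foreign keys to its dimension tables, which is not possible when the tables sit in separate databases.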
A UNION essentially performs a SELECT DISTINCT so that it can remove duplicate records. This will (potentially) impair performance all by itself. You could rectify that by changing it to UNION ALL. Beyond that, you should check the execution plan to see what kind of performance hit you're getting. I do know that SQL Server will use indexes even on tables in other databases, but what you're doing here still doesn't make a lot of sense. I'd suggest using table partitioning and keeping it all in one database. If the normal usage scenario is to UNION all of the tables together, there is little if any good reason to separate them into different databases.
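One way to follow the advice about checking the performance hit is to run both forms of the query from the question with I/O and timing statistics switched on and compare the output in the Messages tab. This is a sketch that assumes the three tables have identical column lists (as the original SELECT * already requires) and that the extracts contain no duplicates you actually need removed:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Original form: the combined rows are de-duplicated.
SELECT * FROM DB1.dbo.Fact_Pres
UNION
SELECT * FROM DB2.dbo.Fact_Pres
UNION
SELECT * FROM DB3.dbo.Fact_Pres;

-- Alternative: plain concatenation, no de-duplication step.
SELECT * FROM DB1.dbo.Fact_Pres
UNION ALL
SELECT * FROM DB2.dbo.Fact_Pres
UNION ALL
SELECT * FROM DB3.dbo.Fact_Pres;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Enabling "Include Actual Execution Plan" in SSMS before running the batch shows which operator each form uses to combine the inputs, so the extra de-duplication work in the first query is visible directly.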