Refreshing a reporting database
We currently have an OLTP SQL Server 2005 database for our project. We are planning to build a separate, de-normalised reporting database so that we can take the load off the OLTP DB. I'm not quite sure which is the best approach to sync these databases. We are not looking for a real-time system, though. Is SSIS a good option? I'm completely new to SSIS, so I'm not sure about its feasibility. Kindly provide your inputs.
2 Answers
Everyone has their own opinion of SSIS, but I have used it for years for data marts, and my current environment is a full BI installation. I personally love its ability to move data, and it still holds the world record for moving 1.13 terabytes in under 30 minutes.
As for setup, we use log shipping from our transactional DB to populate a second box, then use SSIS to de-normalise and warehouse the data. The SSIS community is also very large, and there is plenty of free training and helpful material online.
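For reference, what keeps that second box readable between log restores is restoring each shipped backup with the STANDBY option; a minimal sketch (the database name and paths here are made up for illustration):

```sql
-- On the secondary server: restore each shipped transaction-log backup
-- with STANDBY so the database stays in a readable, read-only state
-- between restores. Names and paths are hypothetical.
RESTORE LOG SalesDB
FROM DISK = N'\\fileshare\logship\SalesDB_20090101.trn'
WITH STANDBY = N'D:\logship\SalesDB_undo.dat';
```

The built-in log shipping jobs automate exactly this copy/restore cycle; the point is only that STANDBY (rather than NORECOVERY) is what makes the secondary queryable as a reporting source.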
We build our data warehouse using SSIS, and we run reports from it. It's a steep learning curve, the errors it throws aren't particularly useful, and it helps to be good at SQL rather than treating it as a 'row by row transfer' - what I mean is you should be creating set-based queries in SQL command tasks rather than using lots of SSIS components and dataflow tasks.
Understand that every warehouse is different and you need to decide how to do it best. This link may give you some good ideas.
How we implement ours (we have a Postgres backend and use the PGNP provider; making use of linked servers could make your life easier):
First of all, you need a time-stamp column in each table so you can tell when it was last changed.
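One simple way to add such a column on SQL Server (table and column names here are hypothetical) is a default for inserts plus a trigger for updates:

```sql
-- Add a last-modified column, populated automatically on insert...
ALTER TABLE dbo.Orders
    ADD LastModified datetime NOT NULL
        CONSTRAINT DF_Orders_LastModified DEFAULT (GETDATE());
GO
-- ...and keep it current on updates via a trigger.
CREATE TRIGGER trg_Orders_Touch ON dbo.Orders
AFTER UPDATE
AS
    UPDATE o
    SET    o.LastModified = GETDATE()
    FROM   dbo.Orders o
    JOIN   inserted i ON i.OrderID = o.OrderID;
GO
```

If you only need relative ordering rather than actual times, a `rowversion` (`timestamp`) column is an alternative that needs no trigger.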
Then write a query that selects the data that has changed since you last ran the package (using an audit table would help) and get that data into a staging table. We run this as a dataflow task because (using Postgres) we don't have any other choice, although you may be able to use a normal reference to another database (dbname.schemaname.tablename or something like that) or use a linked-server query. Either way the idea is the same: you end up with the data that has changed since your last run.
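The audit-table pattern is just a high-water mark; a sketch of the extract step, with all table and column names made up for illustration:

```sql
-- The audit table remembers when the last successful run happened.
DECLARE @LastRun datetime;
SELECT @LastRun = LastRunTime
FROM   dbo.EtlAudit
WHERE  TableName = 'Orders';

-- Pull only the rows changed since then into a staging table.
INSERT INTO staging.Orders (OrderID, CustomerID, Amount, LastModified)
SELECT OrderID, CustomerID, Amount, LastModified
FROM   dbo.Orders
WHERE  LastModified > @LastRun;

-- Advance the high-water mark once the load has succeeded.
UPDATE dbo.EtlAudit
SET    LastRunTime = GETDATE()
WHERE  TableName = 'Orders';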
We then update (based on id) the data that already exists, then insert the new data (by left-joining the table to find out what doesn't already exist in the current warehouse).
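SQL Server 2005 has no MERGE statement (that arrived in 2008), so this update-then-insert pair is the usual pattern; a sketch against hypothetical staging and warehouse tables:

```sql
-- Update rows that already exist in the warehouse...
UPDATE w
SET    w.CustomerID = s.CustomerID,
       w.Amount     = s.Amount
FROM   warehouse.Orders w
JOIN   staging.Orders s ON s.OrderID = w.OrderID;

-- ...then insert the rows the left join shows are missing.
INSERT INTO warehouse.Orders (OrderID, CustomerID, Amount)
SELECT s.OrderID, s.CustomerID, s.Amount
FROM   staging.Orders s
LEFT JOIN warehouse.Orders w ON w.OrderID = s.OrderID
WHERE  w.OrderID IS NULL;
```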
So now we have one denormalised table that shows, in this case, jobs per day. From this we calculate other tables based on aggregated values from this one.
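Those derived tables are plain GROUP BY rollups of the denormalised table; a sketch (again, the table and column names are invented for the example):

```sql
-- Rebuild a daily summary table from the denormalised warehouse table.
TRUNCATE TABLE warehouse.JobsPerDay;

INSERT INTO warehouse.JobsPerDay (JobDate, JobCount, TotalAmount)
SELECT CONVERT(char(8), OrderDate, 112) AS JobDate,  -- yyyymmdd bucket
       COUNT(*)                         AS JobCount,
       SUM(Amount)                      AS TotalAmount
FROM   warehouse.Orders
GROUP BY CONVERT(char(8), OrderDate, 112);
```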
Hope that helps, here are some good links that I found useful:
Choosing .Net or SSIS
SSIS Talk
Package Configurations
Improving the Performance of the Data Flow
Transformations
Custom Logging / Good Blog