SSIS ETL设计
SSIS ETL 设计
我正在尝试构建一个 ETL 作业。以下是我的主要设计考虑因素
- 一个用于跟踪数据拉取和状态的主表
表详细信息
表名称 - JobStatus
- 列 - JobId、StartTime、EndTime、状态
表名称 - JobDetails
- Job_Detail_id、JobId、TableName、StartTime、EndTime、状态
表名称 - ErrorLog
- ErrorLogid , ErrorDesc, Time, Comments
工作原理
每当作业失败时,作业详细信息表将有助于找出最后一个出错的表,ETL 作业可以从该步骤开始
SSIS 包将为源数据库和登台数据库之间的每个表定义数据流任务
工作原理示例
将条目放入作业状态表中以开始包执行
单个表的每个数据流任务都会在 JobDetails 表中放置一个条目。加载数据后,结束时间和状态将更新
错误状态已更新为错误,并且条目已记录在错误日志表中
我需要您对此的评论/反馈方法。任何与此示例相关的 ETL 设计书签/博客文章也会有所帮助
SSIS ETL Design
I'm trying to build a ETL job. Below are my primary design considerations
- One master table to track data pull and status
Table Details
Table Name - JobStatus
- Columns - JobId, StartTime, EndTime, Status
Table Name - JobDetails
- Job_Detail_id, JobId, TableName, StartTime, EndTime, Status
Table Name - ErrorLog
- ErrorLogid, ErrorDesc, Time, Comments
How it Works
Whenever job fails, Job details table would help to figure out which is the last errored table, ETL job can start from that step
SSIS package would have DataFlow Tasks defined for each table between source and staging DB
Example of how it works
Put Entry in Job Status Table to start package execution
Each Dataflow task for individual table would put an entry in JobDetails table. After Loading data End Time and Status would be updated
On Error Status updated to Error and entry logged in Errorlog table
I would need your comments / feedback for this approach. Any bookmarks/blogposts for ETL design relevant to this example also would be helpful
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我认为您提到的从故障点重新启动包执行的要求已经存在于 SSIS 中。您可以使用检查点文件来实现该功能。内置的 SSIS 日志记录功能可以帮助您跟踪错误消息并帮助解决问题。
单击此处< /strong> 查看我在我的网站上写的关于您的问题的文章。我希望这能让您了解 SSIS 中已有的功能以及如何充分利用它。
我认为您可以花大量时间来微调您的核心业务功能,而不是重新发明大部分已经存在的流程。标准功能可能需要根据我们的要求进行一些细微的更改,但我认为您不需要从头开始。
希望有帮助。
I think that the requirements that you have mentioned about restarting the package execution from point of failure already exists in SSIS. You can use Checkpoint file to achieve that functionality. The in-built SSIS logging feature can help you to track the error messages and will help resolving the issue.
Click here to see an article that I have written on my website with respect to your question. I hope that might give you an idea about the functionality that already exist in SSIS and how you can make best use of it.
I think you can spend quality time on fine tuning your core business functionality than reinventing the process that already exists for the most part. The standard functionality might require some minor alterations based on our requirements but I don't think that you need to start from the ground up.
Hope that helps.