如何使用SSIS归档相关数据(在不同的表中)
我们数据库中的许多表中分布着许多(较旧的)相关数据,需要存档到单独的数据库,然后从原始数据库中删除(可能每天)。归档(和删除)的内容由业务规则决定。我们不希望此存档中有较新的数据(它会更改并且被频繁访问)。如果需要并且可能需要报告,一些存档数据可能需要传回。
我需要在 SSIS 中提出一个高效且易于维护的解决方案。我可以想到至少三种方法来做到这一点:
- 编写一个包含大量左连接的大选择语句来获取我想要的所有数据,然后在内存中遍历这些数据或将其存储在
- 表上的 临时表中按表基础,选择作业,然后从其他表中选择相关数据
- ,与第一种方法类似,除了将数据转储到非规范化表中(似乎是一种懒惰的方法?)
我考虑过暂存表,但我不这样做在这种情况下看到一个好处 - 我可以使用时间戳(时间戳 = 进程启动)将数据直接复制到存档表,然后完成后,删除进程启动时带有时间戳的 id。一旦确定了作业 ID,我就知道需要存档的其余数据。
我正在寻找最佳方法,有人有其他方法吗? 其他人如何实现这一目标?
There is a lot of (older) related data spread over many tables in our database that needs archiving to a separate database then deleting from the original database (probably daily). What is archived (and deleted) is determined by business rules. We don't want newer data in this archive (it changes and is accessed frequently). Some of the archived data may need transferring back if required and possibly reported on.
I need to come up with an efficient and easy to maintain solution in SSIS. I can think of at least three ways of doing this:
- write a big select statement with lots of left joins to get all the data that I want, then go through this data either in memory or store it in a staging table
- work on a table by table basis, selecting the jobs, then the related data from other tables
- similar to the first method, except just dump the data into a de-normalised table (seems like a lazy method?)
I considered staging tables, though I don't see a benefit in this case - I can copy the data directly to the archive table with a timestamp (timestamp = process started) then when finished, go and delete the id's that were timestamped at the time the process started. Once I've identified the job id, I know the rest of the data that needs to be archived too.
I'm looking for the optimal approach, does anyone have another approach?
How would other people achieve this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
听起来您需要两个进程,一个用于归档旧数据,一个用于重新加载归档数据。我会用以下方法来解决这两个问题。
对于归档数据:
对于重新加载数据,过程几乎相同,但从存档集工作到“实时”集。特别考虑的因素包括:
无论使用哪种工具(SSIS 或其他工具),此方法都适用。
Sounds like you need two processes, one to archive old data, and one to reload archived data. I’d tackle both with the following methodology.
For Archiving data:
For Reloading data, it’d be pretty much the same process, but working from the archive set to the “live” set. Special considerations would include:
This methodology would work regardless of tool—SSIS or otherwise.
您能说得更具体一些吗?
您是否需要当前数据库中仍然存在的“旧”数据?如果是 - 那么您可以简单地:
如果您不需要现有数据库中的“旧”数据,则插入所需的存档数据到新数据库将是必经之路。根据您在存档数据库中的需要 - 如果您不需要非键,您可以进行规范化,否则逐表方法会很好。
如果这有帮助,请标记为答案
Could you please be more specific?
Do you need the 'old' data still present in your current database? If yes - then you can simply:
If you don't need the 'old' data in the existing database, inserting the required archive data to new database will be the way to go. Depending what you need in the archive database - if you don't need the non-key you can go for normalization otherwise table by table approach will be good.
if this helps please mark as answer
使用 CQRS。
问题出在“相关数据”这个术语上。
如果您隔离读取(在自己的数据库/表中所有可能的读取),您将不需要相关数据,并且可以在必要时单独将规则应用于每个“聚合”。
假设对于列表视图,当您渲染分页器时,您可以从两个不同的源(实际数据和存档数据)获取计数。对于详细视图,您仍然可以为每个资源提供 UUID,但在这种情况下,应用程序将从不同的数据存储中读取。
这里的关键点是摆脱所有类型的连接。
Use CQRS.
The problem is in the term "related data".
If you segregate your reads (all possible reads in own database/tables) you will not have a need for related data and can apply rules to each "Aggregate" separately if necessary.
Let's say for list views you an get count from two different sources (actual and archived data) when you are rendering pager. For detail view you still can have UUID's for each resource but in this case application will read from different data stores.
The key point here is you get rid of all kind of joins.