SQL Server:批量加载的唯一键
我正在开发一个数据仓库项目,其中多个系统将数据加载到暂存区域以进行后续处理。每个表都有一个“loadId”列,它是“loads”表的外键,其中包含加载时间、用户帐户等信息。
目前,源系统调用存储过程来获取新的loadId,将 loadId 添加到将插入的每一行,然后调用第三个存储过程来指示加载已完成。
我的问题是,有什么方法可以避免将 loadId 传回源系统吗?例如,我想象我可以从 Sql Server 获取某种连接 ID,我可以用它来查找负载表中的相关 loadId。但我不确定Sql Server是否有一个连接特有的变量?
有谁知道吗?
谢谢,
I am working on a data warehousing project where several systems are loading data into a staging area for subsequent processing. Each table has a "loadId" column which is a foreign key against the "loads" table, which contains information such as the time of the load, the user account, etc.
Currently, the source system calls a stored procedure to get a new loadId, adds the loadId to each row that will be inserted, and then calls a third sproc to indicate that the load is finished.
My question is, is there any way to avoid having to pass back the loadId to the source system? For example, I was imagining that I could get some sort of connection Id from Sql Server, that I could use to look up the relevant loadId in the loads table. But I am not sure if Sql Server has a variable that is unique to a connection?
Does anyone know?
Thanks,
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我假设源系统正在将插入写入/提交到源表中,并且多个加载不会同时运行...
如果是这样,请在开始加载之前让源加载调用存储过程 newLoadStarting()程序。此存储过程将更新加载表(创建新行,记录开始时间)
在 loadID 列上放置一个触发器,该触发器将从该表获取 max(loadID) 并插入为当前加载 id。
为了完整起见,您可以添加一个 endLoading() 过程,它设置结束日期并取消激活该特定加载。
如果您在同一个表中同时运行多个加载...停止这样做...这不是很有效率。
I assume the source systems are writing/committing the inserts into your source tables, and multiple loads are NOT running at the same time...
If so, have the source load call a stored proc, newLoadStarting(), prior to starting the load proc. This stored proc will update a the load table (creates a new row, records start time)
Put a trigger on your loadID column that will get max(loadID) from this table and insert as the current load id.
For completeness you could add an endLoading() proc which sets an end date and de-activates that particular load.
If you are running multiple loads at the same time in the same tables...stop doing that...it's not very productive.
本地临时表(带有一个井号#temp)对于会话来说是唯一的,将 ID 转储到其中然后从中选择
顺便说一句,这仅在您使用相同连接时才有效
a local temp table (with one pound sign #temp) is unique to the session, dump the ID in there then select from it
BTW this will only work if you use the same connection
最后,我采用了以下解决方案“模式”,与 Markus 的建议非常相似:
由于 loadId 现在具有所有这些行的值,因此它对源系统不再可见,并且可以在需要时启动另一个加载。
我还安排每个源系统都有自己的架构,这是它唯一可以看到的东西,也是登录时的默认值。视图和存储过程位于此模式中,但基础表位于包含所有源数据的“暂存”模式中。我通过命名约定确保不存在冲突。
工作起来就像一个魅力,包括只有更新了两个表才能完成加载的一个情况。
In the end, I went for the following solution "pattern", pretty similar to what Markus was suggesting:
Because loadId now has a value for all those rows, it is no longer visible to the source system and it can start another load if required.
I also arrange for each source system to have its own schema, which is the only thing it can see and is its default on logon. The view and the sproc are in this schema, but the underlying table is in a "staging" schema containing data across all the sources. I ensure there are no collisions through a naming convention.
Works like a charm, including the one case where a load can only be complete if two tables have been updated.