SQL Server:批量加载的唯一键

发布于 2024-09-08 06:06:35 字数 316 浏览 3 评论 0原文

我正在开发一个数据仓库项目,其中多个系统将数据加载到暂存区域以进行后续处理。每个表都有一个“loadId”列,它是“loads”表的外键,其中包含加载时间、用户帐户等信息。

目前,源系统调用存储过程来获取新的loadId,将 loadId 添加到将插入的每一行,然后调用第三个存储过程来指示加载已完成。

我的问题是,有什么方法可以避免将 loadId 传回源系统吗?例如,我想象我可以从 Sql Server 获取某种连接 ID,我可以用它来查找负载表中的相关 loadId。但我不确定Sql Server是否有一个连接特有的变量?

有谁知道吗?

谢谢,

I am working on a data warehousing project where several systems are loading data into a staging area for subsequent processing. Each table has a "loadId" column which is a foreign key against the "loads" table, which contains information such as the time of the load, the user account, etc.

Currently, the source system calls a stored procedure to get a new loadId, adds the loadId to each row that will be inserted, and then calls a third sproc to indicate that the load is finished.

My question is, is there any way to avoid having to pass back the loadId to the source system? For example, I was imagining that I could get some sort of connection Id from Sql Server, that I could use to look up the relevant loadId in the loads table. But I am not sure if Sql Server has a variable that is unique to a connection?

Does anyone know?

Thanks,

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

向地狱狂奔 2024-09-15 06:06:35

我假设源系统正在将插入写入/提交到源表中,并且多个加载不会同时运行...

如果是这样,请在开始加载之前让源加载调用存储过程 newLoadStarting()程序。此存储过程将更新加载表(创建新行,记录开始时间)

在 loadID 列上放置一个触发器,该触发器将从该表获取 max(loadID) 并插入为当前加载 id。

为了完整起见,您可以添加一个 endLoading() 过程,它设置结束日期并取消激活该特定加载。

如果您在同一个表中同时运行多个加载...停止这样做...这不是很有效率。

I assume the source systems are writing/committing the inserts into your source tables, and multiple loads are NOT running at the same time...

If so, have the source load call a stored proc, newLoadStarting(), prior to starting the load proc. This stored proc will update a the load table (creates a new row, records start time)

Put a trigger on your loadID column that will get max(loadID) from this table and insert as the current load id.

For completeness you could add an endLoading() proc which sets an end date and de-activates that particular load.

If you are running multiple loads at the same time in the same tables...stop doing that...it's not very productive.

白云不回头 2024-09-15 06:06:35

本地临时表(带有一个井号#temp)对于会话来说是唯一的,将 ID 转储到其中然后从中选择

顺便说一句,这仅在您使用相同连接时才有效

a local temp table (with one pound sign #temp) is unique to the session, dump the ID in there then select from it

BTW this will only work if you use the same connection

怀中猫帐中妖 2024-09-15 06:06:35

最后,我采用了以下解决方案“模式”,与 Markus 的建议非常相似:

  • 我创建了一个带有 loadId 列的表,默认为 null(加上一些其他审核信息,如createdDate 和createdByUser);
  • 我在表上创建了一个视图,隐藏了loadId和audit列,只显示loadId为null的行;
  • 源系统将数据加载/查看到视图中,而不是表中;
  • 完成后,源系统调用“sp__loadFinished”过程,该过程将正确的值放入 loadId 列中并执行一些其他日志记录(接收的行数、调用的日期等)。我从模板生成它,因为它是重复的。

由于 loadId 现在具有所有这些行的值,因此它对源系统不再可见,并且可以在需要时启动另一个加载。

我还安排每个源系统都有自己的架构,这是它唯一可以看到的东西,也是登录时的默认值。视图和存储过程位于此模式中,但基础表位于包含所有源数据的“暂存”模式中。我通过命名约定确保不存在冲突。

工作起来就像一个魅力,包括只有更新了两个表才能完成加载的一个情况。

In the end, I went for the following solution "pattern", pretty similar to what Markus was suggesting:

  • I created a table with a loadId column, default null (plus some other audit info like createdDate and createdByUser);
  • I created a view on the table that hides the loadId and audit columns, and only shows rows where loadId is null;
  • The source systems load/view data into the view, not the table;
  • When they are done, the source system calls a "sp__loadFinished" procedure, which puts the right value in the loadId column and does some other logging (number of rows received, date called, etc). I generate this from a template as it is repetitive.

Because loadId now has a value for all those rows, it is no longer visible to the source system and it can start another load if required.

I also arrange for each source system to have its own schema, which is the only thing it can see and is its default on logon. The view and the sproc are in this schema, but the underlying table is in a "staging" schema containing data across all the sources. I ensure there are no collisions through a naming convention.

Works like a charm, including the one case where a load can only be complete if two tables have been updated.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文