Creating a primary key on a temp table - when?

Posted on 2024-07-25 15:33:54

I have a stored procedure that works with a large amount of data. That data is inserted into a temp table. The overall flow of events is something like

CREATE TABLE #TempTable (
    Col1    NUMERIC(18,0) NOT NULL,    --This will not be an identity column.
    Col2    INT NOT NULL,
    Col3    BIGINT,
    Col4    VARCHAR(25) NOT NULL
    --Etc...

    --
    --Create primary key here?
)


INSERT INTO #TempTable
SELECT ...
FROM MyTable
WHERE ...

INSERT INTO #TempTable
SELECT ...
FROM MyTable2
WHERE ...

--
-- ...or create primary key here?

My question is: when is the best time to create a primary key on my #TempTable table? My theory was that I should create the primary key constraint/index after inserting all the data, because otherwise the index has to be reorganized as the rows are inserted. But I realized that my underlying assumption might be wrong...

In case it is relevant, the data types shown are the real ones. In the #TempTable table, Col1 and Col4 will make up my primary key.

Update: In my case, I'm duplicating the primary key of the source tables. I know that the fields that make up my primary key will always be unique. I have no concern about a failed ALTER TABLE if I add the primary key at the end.

That aside, my question still stands: which is faster, assuming both would succeed?

提笔书几行 2024-08-01 15:33:54

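This depends a lot.

If you make the primary key index clustered after the load, the entire table will be rewritten, because a clustered index isn't really a separate index: it is the logical order of the data itself. The execution plan for the inserts depends on the indexes in place when the plan is determined, and if the clustered index is already in place, the rows will be sorted before the insert. You will typically see this in the execution plan.

If you make the primary key nonclustered (by specifying NONCLUSTERED, since a primary key is clustered by default), it will be a regular index, and the table will simply be populated in whatever order the optimizer determines, with the index updated as rows arrive.

I think the quickest overall performance (for this process of loading the temp table) usually comes from writing the data as a heap and then applying the (nonclustered) index.

However, as others have noted, the creation of the index could fail. Also, the temp table does not exist in isolation. Presumably there is a best index for reading the data from it in the next step, and that index will need to either be in place or be created. This is where you have to trade load speed against reliability (apply the PK and any other constraints first) and against speed later (have at least the clustered index in place if you are going to have one).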
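
The two load orders compared above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a benchmark: the table names #HeapFirst and #ClusteredFirst are made up, the columns are cut down to the two key columns from the question, and the sample rows are generated from sys.all_objects purely to have something to load.

```sql
-- Option A: load a heap, then add a nonclustered primary key.
CREATE TABLE #HeapFirst (
    Col1 NUMERIC(18,0) NOT NULL,
    Col4 VARCHAR(25)   NOT NULL
);

-- Hypothetical sample data: 10,000 generated row numbers.
INSERT INTO #HeapFirst (Col1, Col4)
SELECT TOP (10000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       'A'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Builds a separate index; the table itself stays a heap.
ALTER TABLE #HeapFirst
    ADD PRIMARY KEY NONCLUSTERED (Col1, Col4);

-- Option B: declare a clustered primary key up front, then load.
CREATE TABLE #ClusteredFirst (
    Col1 NUMERIC(18,0) NOT NULL,
    Col4 VARCHAR(25)   NOT NULL,
    PRIMARY KEY CLUSTERED (Col1, Col4)
);

INSERT INTO #ClusteredFirst (Col1, Col4)
SELECT Col1, Col4
FROM #HeapFirst;

DROP TABLE #HeapFirst;
DROP TABLE #ClusteredFirst;
```

Comparing the actual execution plans of the two INSERT statements should show whether the optimizer added a sort for the clustered load.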


亣腦蒛氧 2024-08-01 15:33:54

SELECT ... INTO is a bulk operation, and bulk operations are minimally logged when the database uses the simple or bulk-logged recovery model. A #temp table lives in tempdb, which always uses the simple recovery model, so SELECT ... INTO ... UNION ALL may be the fastest solution.

For example:

-- first, create the table
SELECT ...
INTO #TempTable
FROM MyTable
WHERE ...
UNION ALL
SELECT ...
FROM MyTable2
WHERE ...

-- now, add a non-clustered primary key:
-- this will *not* recreate the table in the background
-- it will only create a separate index
-- the table will remain stored as a heap
ALTER TABLE #TempTable ADD PRIMARY KEY NONCLUSTERED (NonNullableKeyField)

-- alternatively (run this instead of the statement above, not in addition to it):
-- this *will* recreate the table in the background
-- and reorder the rows according to the primary key
-- CLUSTERED key word is optional, primary keys are clustered by default
ALTER TABLE #TempTable ADD PRIMARY KEY CLUSTERED (NonNullableKeyField)

Otherwise, Cade Roux had good advice re: before or after.


浮萍、无处依 2024-08-01 15:33:54

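You may as well create the primary key before the inserts. If the primary key is on an identity column, the inserts will be done sequentially anyway and there will be no difference.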


情栀口红 2024-08-01 15:33:54

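Even more important than performance considerations: if you are not absolutely, 100% sure that only unique values will be inserted into the table, create the primary key first. Otherwise the primary key will fail to be created.

Having the key in place first prevents you from inserting duplicate or bad data.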
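
A small sketch of the difference, assuming a single INT key column and the made-up table names #Checked and #Unchecked: with the key in place, the duplicate is rejected when it is inserted; without it, the load succeeds and the failure only shows up when the key is added.

```sql
-- Key in place first: the duplicate row is rejected at INSERT time.
CREATE TABLE #Checked (
    Col1 INT NOT NULL PRIMARY KEY
);

BEGIN TRY
    INSERT INTO #Checked (Col1) VALUES (1), (1);
END TRY
BEGIN CATCH
    PRINT 'INSERT failed: ' + ERROR_MESSAGE();
END CATCH;

-- Key added afterwards: the load succeeds, the ALTER TABLE fails.
CREATE TABLE #Unchecked (
    Col1 INT NOT NULL
);

INSERT INTO #Unchecked (Col1) VALUES (1), (1);

BEGIN TRY
    ALTER TABLE #Unchecked ADD PRIMARY KEY (Col1);
END TRY
BEGIN CATCH
    PRINT 'ALTER TABLE failed: ' + ERROR_MESSAGE();
END CATCH;

DROP TABLE #Checked;
DROP TABLE #Unchecked;
```

In the second case the bad rows are already in the table by the time the error appears, so the procedure has to decide what to do with them.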


习惯成性 2024-08-01 15:33:54

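If you add the primary key when creating the table, the first insert will be free (no checks required). The second insert only has to check that it differs from the first. The third insert has to check two rows, and so on. The checks will be index lookups, because there is a unique constraint in place.

If you add the primary key after all the inserts, every row has to be matched against every other row. So my guess is that adding the primary key early on is cheaper.

But maybe SQL Server has a really smart way of checking uniqueness. So if you want to be sure, measure it!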
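
One rough way to measure it is to time both variants in the same batch. The sketch below makes several assumptions: the table names #KeyFirst and #KeyLast are invented, only the two key columns from the question are used, and the 100,000 generated rows from sys.all_objects stand in for the real data.

```sql
DECLARE @start      DATETIME2(7),
        @keyFirstMs INT,
        @keyLastMs  INT;

-- Variant 1: primary key declared before the load.
CREATE TABLE #KeyFirst (
    Col1 NUMERIC(18,0) NOT NULL,
    Col4 VARCHAR(25)   NOT NULL,
    PRIMARY KEY (Col1, Col4)
);

SET @start = SYSDATETIME();

INSERT INTO #KeyFirst (Col1, Col4)
SELECT TOP (100000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'A'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

SET @keyFirstMs = DATEDIFF(MILLISECOND, @start, SYSDATETIME());

-- Variant 2: load first, add the primary key afterwards.
CREATE TABLE #KeyLast (
    Col1 NUMERIC(18,0) NOT NULL,
    Col4 VARCHAR(25)   NOT NULL
);

SET @start = SYSDATETIME();

INSERT INTO #KeyLast (Col1, Col4)
SELECT TOP (100000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'A'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

ALTER TABLE #KeyLast ADD PRIMARY KEY (Col1, Col4);

SET @keyLastMs = DATEDIFF(MILLISECOND, @start, SYSDATETIME());

-- Elapsed milliseconds for each variant.
SELECT @keyFirstMs AS KeyFirstMs, @keyLastMs AS KeyLastMs;

DROP TABLE #KeyFirst;
DROP TABLE #KeyLast;
```

A single run is noisy, so it is worth repeating the test several times, and with the real source queries and row counts, before drawing a conclusion.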


望笑 2024-08-01 15:33:54

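I was wondering whether I could improve a very, very "expensive" stored procedure that runs a bunch of checks on every insert across tables, and I came across this answer. In the procedure, several temp tables are created and reference each other. I added the primary key to the CREATE TABLE statement (even though my selects already use WHERE NOT EXISTS to insert data and ensure uniqueness) and my execution time was cut down severely. I highly recommend using primary keys. Always at least try it out, even when you think you don't need it.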
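
The pattern described here might look like the sketch below. The table names #Seen and #Incoming and the sample values are hypothetical. The idea is that the primary key on #Seen gives the optimizer an index it may use for the NOT EXISTS probe instead of scanning the whole table for each check.

```sql
-- Target table with the primary key declared in CREATE TABLE.
CREATE TABLE #Seen (
    Col1 INT NOT NULL PRIMARY KEY
);

INSERT INTO #Seen (Col1) VALUES (1), (2), (3);

-- Hypothetical staging table holding candidate rows.
CREATE TABLE #Incoming (
    Col1 INT NOT NULL
);

INSERT INTO #Incoming (Col1) VALUES (2), (3), (4);

-- Insert only the rows that are not already present.
INSERT INTO #Seen (Col1)
SELECT i.Col1
FROM #Incoming AS i
WHERE NOT EXISTS (
    SELECT 1
    FROM #Seen AS s
    WHERE s.Col1 = i.Col1
);

-- Returns 1, 2, 3, 4.
SELECT Col1
FROM #Seen
ORDER BY Col1;

DROP TABLE #Seen;
DROP TABLE #Incoming;
```

Whether this helps as much as it did in my case depends on the data and the plan the optimizer picks, so compare timings with and without the key.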
