SQL Server: Is there any value in vertical partitioning when I'm always going to rejoin the tables?

Posted 2024-09-14 06:12:31

I'm faced with having to add 64 new columns to a table that already has 32 columns. For example's sake:

Customers
(
    CustomerID int
    Name        varchar(50)
    Address     varchar(50)
    City        varchar(50)
    Region      varchar(50)
    PostalCode  varchar(50)
    Country     varchar(2)
    Telephone   varchar(20)

    ...
    NewColumn1  int null
    NewColumn2  uniqueidentifier null
    NewColumn3  varchar(50)
    NewColumn4  varchar(50)
    ...
    NewColumn64 datetime null

    ...
    CreatedDate datetime
    LastModifiedDate datetime
    LastModifiedWorkstation varchar(50)
    LastModifiedUser varchar(50)
)

Most of the time the majority of these new columns will contain null.

It is also a given that if I vertically partition these 64 new columns off into a new table, then every time I SELECT from Customers:

SELECT ...
FROM Customers

the query will have to be converted to a join to get the partitioned values (i.e. I always need the new columns, so there is never a performance gain from a query that can skip them):

SELECT ...
FROM Customers
    INNER JOIN Customers_ExtraColumns
    ON Customers.CustomerID = Customers_ExtraColumns.CustomerID

So that's one con to partitioning off the columns.

The other con is that I have to manage inserting rows into two tables simultaneously, rather than just one.

The final con I can think of is that SQL Server now has to perform an INNER JOIN any time I want to access "Customers". There will now and forever be a waste of CPU and I/O to join tables that really are one table, except that I decided to split them up.
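
For illustration, the two costs described above (a join on every read, two inserts on every write) might look roughly like the following. This is only a sketch under stated assumptions: the column lists are abbreviated, the view name CustomersWithExtras and the sample values are hypothetical, a LEFT JOIN is used on the assumption that a customer with no extra values may have no row in Customers_ExtraColumns, and THROW assumes SQL Server 2012 or later.

```sql
-- Hypothetical, abbreviated versions of the two tables.
CREATE TABLE dbo.Customers
(
    CustomerID int NOT NULL PRIMARY KEY,
    Name       varchar(50) NOT NULL
);

CREATE TABLE dbo.Customers_ExtraColumns
(
    CustomerID int NOT NULL PRIMARY KEY
        REFERENCES dbo.Customers (CustomerID),
    NewColumn1 int NULL,
    NewColumn2 uniqueidentifier NULL
);
GO

-- A view hides the join from callers, but the join still runs on every read.
CREATE VIEW dbo.CustomersWithExtras
AS
SELECT c.CustomerID, c.Name, x.NewColumn1, x.NewColumn2
FROM dbo.Customers AS c
    LEFT JOIN dbo.Customers_ExtraColumns AS x
    ON x.CustomerID = c.CustomerID;
GO

-- Writing one logical customer now takes two inserts in one transaction.
BEGIN TRY
    BEGIN TRANSACTION;
    INSERT INTO dbo.Customers (CustomerID, Name)
    VALUES (1, 'Example customer');
    INSERT INTO dbo.Customers_ExtraColumns (CustomerID, NewColumn1)
    VALUES (1, 42);
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH;
```

The GO lines are batch separators understood by tools such as SSMS and sqlcmd, not T-SQL statements; they are needed here because CREATE VIEW must be the first statement in its batch.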

So my question is: why would I split them up?

Is there any value in vertically partitioning out 64 columns to a separate table when they will mostly be null? Nulls take up very little space...

What are the pros?

Edit: Why am I even considering partitioning? It's mostly null data that will triple the number of columns in the table. Surely it must be bad!

窝囊感情。 2024-09-21 06:12:32

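If those values are a) unique to the record (a given customer should have only one value that would appear in NewColumn1), and b) not used by any other record (at least, no other record that wouldn't also require the base customer information), I'd say keep them as one table. Just don't forget to name specific columns in any queries written against the table.

I come from an EDI background, where you sometimes have to deal with flat files that contain 30+ columns of data per row. As you mentioned, NULLs don't take up much space, and if you're never going to fetch the new columns on their own (and you'll never be able to fetch the base customer data on its own), I'd say you're right.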
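
As a small illustration of naming specific columns, a query against the single wide table might look like the sketch below. The column choice and the CustomerID value are hypothetical; the point is only the contrast with SELECT *.

```sql
-- Explicit column list: returns only what the caller needs, however many
-- NewColumnN columns the table later gains.
SELECT CustomerID, Name, City, Country
FROM Customers
WHERE CustomerID = 1;

-- By contrast, this would return all 96 columns, most of them NULL:
-- SELECT * FROM Customers WHERE CustomerID = 1;
```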

念三年u 2024-09-21 06:12:32

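The answer lies in the details omitted from the question. The number of columns is irrelevant; what matters is the nature of the data.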

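First, remember that a given row of any table cannot exceed 8,060 bytes. So if the new columns are sized such that this limit could theoretically be exceeded, you will have planted a time bomb in the database. At the least convenient moment, an insert or update will throw an error and/or data will be lost. To prevent this, you may need to use multiple tables, simply because of this limitation in most versions of SQL Server.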
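
One rough way to see how close a table's declared column sizes come to that limit is to sum them from the catalog, as in the sketch below. It assumes the table is dbo.Customers, and it is only an approximation: it ignores per-row overhead such as the null bitmap and variable-length offset array, and on SQL Server 2005 and later variable-length columns such as varchar can be moved to row-overflow storage, so the hard limit mainly concerns fixed-length data.

```sql
-- sys.columns.max_length is the declared maximum size in bytes;
-- it is -1 for (max) types and xml, which are counted separately here.
SELECT
    SUM(CASE WHEN c.max_length = -1 THEN 0 ELSE c.max_length END) AS DeclaredMaxBytes,
    SUM(CASE WHEN c.max_length = -1 THEN 1 ELSE 0 END) AS MaxTypeColumns
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.Customers');
```

If DeclaredMaxBytes is comfortably below 8,060 after the 64 columns are added, this particular concern is unlikely to apply.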
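
Another important consideration is data modeling. Do the new columns have a one-to-one relationship with CustomerID? Eye color, for example? Given the number of columns and the fact that you omitted their names, I suspect a denormalized design is being considered. If the new columns are something like WebPage1, WebPage2, WebPage3, etc., then they need to be split out into a separate, normalized table.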
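
For the repeating-group case, a normalized child table might look like the sketch below. The table name CustomerWebPages, its column names, and the Url length are hypothetical, and the foreign key assumes dbo.Customers already exists with CustomerID as its primary key.

```sql
-- One row per customer per web page, instead of WebPage1..WebPageN columns.
CREATE TABLE dbo.CustomerWebPages
(
    CustomerID int NOT NULL
        REFERENCES dbo.Customers (CustomerID),
    PageNumber int NOT NULL,          -- 1, 2, 3, ... replaces the column suffix
    Url        varchar(400) NOT NULL, -- length is an assumption
    CONSTRAINT PK_CustomerWebPages PRIMARY KEY (CustomerID, PageNumber)
);
```

With this shape, a customer with no web pages has no rows at all, so the mostly-null columns never need to be stored.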