SQL Server:当我总是要重新加入它们时,垂直分区有什么价值吗?
我面临着必须向已经有 32 列的表添加 64 个新列。举个例子:
Customers
(
CustomerID int
Name varchar(50)
Address varchar(50)
City varchar(50)
Region varchar(50)
PostalCode varchar(50)
Country varchar(2)
Telephone varchar(20)
...
NewColumn1 int null
NewColumn2 uniqueidentifier null
NewColumn3 varchar(50)
NewColumn4 varchar(50)
...
NewColumn64 datetime null
...
CreatedDate datetime
LastModifiedDate datetime
LastModifiedWorkstation varchar(50)
LastModifiedUser varchar(50)
)
大多数时候,这些新列中的大多数都将包含 null
。
假设如果我将这 64 个新列垂直划分到一个新表中,那么每次我 SELECT
from Customers:
SELECT ...
FROM Customers
都必须转换为联接获取分区值(即,在不需要新列的情况下永远不会获得性能增益):
SELECT ...
FROM Customers
INNER JOIN Customers_ExtraColumns
ON Customers.CustomerID = Customers_ExtraColumns.CustomerID
因此这是对列进行分区的一个con 。
另一个缺点是我必须管理同时将行插入两个表中,而不是只插入一个表中。
我能想到的最后一个缺点是,SQL Server 现在必须在我想要访问“客户”时执行INNER JOIN
。现在和永远都会浪费 CPU 和 I/O 来连接实际上是一个表的表 - 除了我决定将它们分开。
所以我的问题是:为什么我要把它们分开?
当 64 列大部分为空时,将它们垂直划分到一个单独的表中是否有任何价值? Null 占用的空间很小......
有什么优点?
编辑:为什么我要考虑分区?它大部分是空数据,会使表中的列数增加三倍。 当然一定很糟糕!
i'm faced with having to add 64 new columns to table that already had 32 columns. For examples sake:
Customers
(
CustomerID int
Name varchar(50)
Address varchar(50)
City varchar(50)
Region varchar(50)
PostalCode varchar(50)
Country varchar(2)
Telephone varchar(20)
...
NewColumn1 int null
NewColumn2 uniqueidentifier null
NewColumn3 varchar(50)
NewColumn4 varchar(50)
...
NewColumn64 datetime null
...
CreatedDate datetime
LastModifiedDate datetime
LastModifiedWorkstation varchar(50)
LastModifiedUser varchar(50)
)
Most of the time the majority of these new columns will contain null
.
It is also a given that if i vertically partition off these 64 new columns into a new table, then every time i SELECT
from Customers:
SELECT ...
FROM Customers
will have to be converted to a join to get the partitioned values (i.e. there is never a performance gain to be had where i don't require the new columns):
SELECT ...
FROM Customers
INNER JOIN Customers_ExtraColumns
ON Customers.CustomerID = Customers_ExtraColumns.CustomerID
So that's one con to partitioning off the columns.
The other con is that i have to manage inserting rows into two tables simultaneously, rather than just one.
The final con i can think of is that SQL Server now has to perform an INNER JOIN
any time i want to access "Customers". There will now and forever a waste of CPU and I/O to join tables that really are one table - except that i had decided to split them up.
So my question is: why would i split them up?
Is there any value in vertically partitioning out 64 columns to a separate table when they will mostly be null? Null take up very little space....
What are the pros?
Edit: Why am i even considering partitioning? It's mostly null data that will triple the number of columns in the table. Surely it must be bad!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
如果这些值 a) 对记录是唯一的(给定客户应该只有一个值会出现在 NewColumn1 中),并且 b) 不被任何其他记录使用(至少,没有其他记录也需要基本客户)信息)我想说将它们保留为一张桌子。只是不要忘记在针对表编写的任何查询中命名特定列。
我有 EDI 背景,有时您必须处理每行包含 30 多列数据的平面文件。正如您提到的,
NULL
不会占用太多空间,并且如果您永远将独立地抓取列(并且您将永远无法抓取独立的基本客户数据),我想说你是对的。If these values are a) unique to a record (a given customer should only have one value which would go in NewColumn1) and b) not used by any other record (at least, no other record that doesn't also require the base customer information) I'd say leave them as one table. Just don't forget to name your specific columns in any queries you write against the table.
I come from an EDI background, and sometimes you have to deal with flatfiles that contain 30+ columns of data per row. As you mention,
NULL
doesn't take up much room, and if you're NEVER going to be grapping the columns independently (and you'll never be able to grab the base customer data independently), I'd say you've got it right.答案是问题中省略的细节。列数无关紧要,重要的是数据的性质。
首先,请记住给定行
任何表都不能超过 8060
字节。所以如果新列是
大小使得该限制可以
理论上会超过,你会
已在其中安放了一颗定时炸弹
数据库。有时当它最少的时候
方便,数据插入或更新
将抛出错误和/或数据将
迷路。
为了防止这种情况,您可能需要
要使用多个表,只需
大多数版本的限制
SQL 服务器。
.
另一个重要的考虑因素是
数据建模。做新的专栏
与 具有一对一的关系
客户ID
?例如,说眼睛颜色
?由于列数和
事实上你省略了他们的
名字,我怀疑
非标准化设计正在
考虑过。如果新列是
类似于
WebPage1
,WebPage2
、WebPage3
等,然后这些需要分成
单独的,标准化表。
。
但是,如果这些列确实是唯一的项目,彼此无关并且与
CustomerID
(或该表的主键是什么)具有 1 对 1 的关系,并且大小限制不能被破坏了,那么把所有东西都放在一张桌子上就完全没问题了。The answer is in details that were omitted from the question. The number of columns is irrelevant, it is the nature of the data that matters.
First, remember that a given row in
any table can never exceed 8060
bytes. So if the new columns are
sized such that that limit can
theoretically be exceeded, you will
have built a time-bomb into the
database. Sometime when it is least
convenient, a data insert or update
will throw an error and/or data will
be lost.
To guard against this, you may need
to use more than one table, it's just
a limitation of most editions of
SQL-Server.
.
The other important consideration is
data-modeling. Do the new columns
have a one-to-one relationship with
CustomerID
? For example, sayeyeColor
?Because of the number of columns and
the fact that you omitted their
names, I suspect that a
non-normalized design is being
contemplated. If the new columns are
something like
WebPage1
,WebPage2
,WebPage3
, etc., thenthese need to be split into a
separate, normalized table.
.
But, if the columns really are unique items, unrelated to each other and with a 1-to-1 relationship to
CustomerID
(or whatever the primary-key of that table is), and the size limit cannot be busted, then having everything in one table is perfectly fine.为了数据模型的简单性,如果没有进一步的信息,我可能不会分区,但是您没有指出这些新列中数据的性质(也许某些列是应该标准化的数组)。
但是,有几点:
如果您进行垂直分区,并且对补充表有 FK 约束,这可能有助于消除某些情况下的联接,因为它知道将存在且只有一行。显然,它将在相同的唯一键上建立索引,这将有助于消除确定是否存在交叉联接的需要,因为只能有 0 或 1 行。
您可以拥有一个连接两个表的可更新视图,并在视图上有一个触发器,该触发器插入到连接的两个表中以形成视图。您还可以决定执行左连接,并且仅在需要补充行的任何列不为 NULL 时才创建补充行。
您还可以使用一组稀疏连接的补充数据表。显然,这也需要联接,但您也可以对多个补充表使用类似的技术,就像对 1 一样。
For simplicity of data model, without further information, I would probably not partition, but you haven't indicated the nature of the data in these new columns (perhaps some columns are arrays which should be normalized instead).
However, some points:
If you do vertically partition, and have a FK constraint on the supplemental table, that may help eliminate the join in some scenarios, since it knows that one and only one row will exist. Obviously it will be indexed on the same unique keys, which will help to eliminate the need to determine if there is a cross-join, since there can only be 0 or 1 rows.
You can have a single updatable view which joins the two tables and have a trigger on the view which inserts into the two tables joined to make the view. You could also decide to do a left join and only create a supplemental row at all if any of the columns needing it are non-NULL.
You can also use a sparsely joined set of tables of supplemental data. Obviously this would also need joins, but you could also use similar techniques with multiple supplemental tables as you would with 1.