SQL Server:当我总是要重新加入它们时,垂直分区有什么价值吗?

发布于 2024-09-14 06:12:31 字数 1440 浏览 3 评论 0原文

我面临着必须向已经有 32 列的表添加 64 个新列。举个例子:

Customers
(
    CustomerID int
    Name        varchar(50)
    Address     varchar(50)
    City        varchar(50)
    Region      varchar(50)
    PostalCode  varchar(50)
    Country     varchar(2)
    Telephone   varchar(20)

    ...
    NewColumn1  int null
    NewColumn2  uniqueidentifier null
    NewColumn3  varchar(50)
    NewColumn4  varchar(50)
    ...
    NewColumn64 datetime null

    ...
    CreatedDate datetime
    LastModifiedDate datetime
    LastModifiedWorkstation varchar(50)
    LastModifiedUser varchar(50)
)

大多数时候,这些新列中的大多数都将包含 null

假设如果我将这 64 个新列垂直划分到一个新表中,那么每次我 SELECT from Customers:

SELECT ...
FROM Customers

都必须转换为联接获取分区值(即,在不需要新列的情况下永远不会获得性能增益):

SELECT ...
FROM Customers
    INNER JOIN Customers_ExtraColumns
    ON Customers.CustomerID = Customers_ExtraColumns.CustomerID

因此这是对列进行分区的一个con

另一个缺点是我必须管理同时将行插入两个表中,而不是只插入一个表中。

我能想到的最后一个缺点是,SQL Server 现在必须在我想要访问“客户”时执行INNER JOIN。现在和永远都会浪费 CPU 和 I/O 来连接实际上是一个表的表 - 除了我决定将它们分开。

所以我的问题是:为什么我要把它们分开?

当 64 列大部分为空时,将它们垂直划分到一个单独的表中是否有任何价值? Null 占用的空间很小......

有什么优点?

编辑:为什么我要考虑分区?它大部分是空数据,会使表中的列数增加三倍。 当然一定很糟糕!

i'm faced with having to add 64 new columns to table that already had 32 columns. For examples sake:

Customers
(
    CustomerID int
    Name        varchar(50)
    Address     varchar(50)
    City        varchar(50)
    Region      varchar(50)
    PostalCode  varchar(50)
    Country     varchar(2)
    Telephone   varchar(20)

    ...
    NewColumn1  int null
    NewColumn2  uniqueidentifier null
    NewColumn3  varchar(50)
    NewColumn4  varchar(50)
    ...
    NewColumn64 datetime null

    ...
    CreatedDate datetime
    LastModifiedDate datetime
    LastModifiedWorkstation varchar(50)
    LastModifiedUser varchar(50)
)

Most of the time the majority of these new columns will contain null.

It is also a given that if i vertically partition off these 64 new columns into a new table, then every time i SELECT from Customers:

SELECT ...
FROM Customers

will have to be converted to a join to get the partitioned values (i.e. there is never a performance gain to be had where i don't require the new columns):

SELECT ...
FROM Customers
    INNER JOIN Customers_ExtraColumns
    ON Customers.CustomerID = Customers_ExtraColumns.CustomerID

So that's one con to partitioning off the columns.

The other con is that i have to manage inserting rows into two tables simultaneously, rather than just one.

The final con i can think of is that SQL Server now has to perform an INNER JOIN any time i want to access "Customers". There will now and forever a waste of CPU and I/O to join tables that really are one table - except that i had decided to split them up.

So my question is: why would i split them up?

Is there any value in vertically partitioning out 64 columns to a separate table when they will mostly be null? Null take up very little space....

What are the pros?

Edit: Why am i even considering partitioning? It's mostly null data that will triple the number of columns in the table. Surely it must be bad!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

窝囊感情。 2024-09-21 06:12:32

如果这些值 a) 对记录是唯一的(给定客户应该只有一个值会出现在 NewColumn1 中),并且 b) 不被任何其他记录使用(至少,没有其他记录也需要基本客户)信息)我想说将它们保留为一张桌子。只是不要忘记在针对表编写的任何查询中命名特定列。

我有 EDI 背景,有时您必须处理每行包含 30 多列数据的平面文件。正如您提到的,NULL 不会占用太多空间,并且如果您永远将独立地抓取列(并且您将永远无法抓取独立的基本客户数据),我想说你是对的。

If these values are a) unique to a record (a given customer should only have one value which would go in NewColumn1) and b) not used by any other record (at least, no other record that doesn't also require the base customer information) I'd say leave them as one table. Just don't forget to name your specific columns in any queries you write against the table.

I come from an EDI background, and sometimes you have to deal with flatfiles that contain 30+ columns of data per row. As you mention, NULL doesn't take up much room, and if you're NEVER going to be grapping the columns independently (and you'll never be able to grab the base customer data independently), I'd say you've got it right.

念三年u 2024-09-21 06:12:32

答案是问题中省略的细节。列数无关紧要,重要的是数据的性质。

  • 首先,请记住给定行
    任何表都不能超过 8060
    字节
    。所以如果新列是
    大小使得该限制可以
    理论上会超过,你会
    已在其中安放了一颗定时炸弹
    数据库。有时当它最少的时候
    方便,数据插入或更新
    将抛出错误和/或数据将
    迷路。

    为了防止这种情况,您可能需要
    要使用多个表,只需
    大多数版本的限制
    SQL 服务器。
    .

  • 另一个重要的考虑因素是
    数据建模。做新的专栏
    与 具有一对一的关系
    客户ID?例如,说
    眼睛颜色

    由于列数和
    事实上你省略了他们的
    名字,我怀疑
    非标准化设计正在
    考虑过。如果新列是
    类似于 WebPage1
    WebPage2WebPage3等,然后
    这些需要分成
    单独的,标准化表。

但是,如果这些列确实是唯一的项目,彼此无关并且与 CustomerID (或该表的主键是什么)具有 1 对 1 的关系,并且大小限制不能被破坏了,那么把所有东西都放在一张桌子上就完全没问题了。

The answer is in details that were omitted from the question. The number of columns is irrelevant, it is the nature of the data that matters.

  • First, remember that a given row in
    any table can never exceed 8060
    bytes
    . So if the new columns are
    sized such that that limit can
    theoretically be exceeded, you will
    have built a time-bomb into the
    database. Sometime when it is least
    convenient, a data insert or update
    will throw an error and/or data will
    be lost.

    To guard against this, you may need
    to use more than one table, it's just
    a limitation of most editions of
    SQL-Server.
    .

  • The other important consideration is
    data-modeling. Do the new columns
    have a one-to-one relationship with
    CustomerID? For example, say
    eyeColor?

    Because of the number of columns and
    the fact that you omitted their
    names, I suspect that a
    non-normalized design is being
    contemplated. If the new columns are
    something like WebPage1,
    WebPage2, WebPage3, etc., then
    these need to be split into a
    separate, normalized table.
    .

But, if the columns really are unique items, unrelated to each other and with a 1-to-1 relationship to CustomerID (or whatever the primary-key of that table is), and the size limit cannot be busted, then having everything in one table is perfectly fine.

眼睛会笑 2024-09-21 06:12:31

为了数据模型的简单性,如果没有进一步的信息,我可能不会分区,但是您没有指出这些新列中数据的性质(也许某些列是应该标准化的数组)。

但是,有几点:

如果您进行垂直分区,并且对补充表有 FK 约束,这可能有助于消除某些情况下的联接,因为它知道将存在且只有一行。显然,它将在相同的唯一键上建立索引,这将有助于消除确定是否存在交叉联接的需要,因为只能有 0 或 1 行。

您可以拥有一个连接两个表的可更新视图,并在视图上有一个触发器,该触发器插入到连接的两个表中以形成视图。您还可以决定执行左连接,并且仅在需要补充行的任何列不为 NULL 时才创建补充行。

您还可以使用一组稀疏连接的补充数据表。显然,这也需要联接,但您也可以对多个补充表使用类似的技术,就像对 1 一样。

For simplicity of data model, without further information, I would probably not partition, but you haven't indicated the nature of the data in these new columns (perhaps some columns are arrays which should be normalized instead).

However, some points:

If you do vertically partition, and have a FK constraint on the supplemental table, that may help eliminate the join in some scenarios, since it knows that one and only one row will exist. Obviously it will be indexed on the same unique keys, which will help to eliminate the need to determine if there is a cross-join, since there can only be 0 or 1 rows.

You can have a single updatable view which joins the two tables and have a trigger on the view which inserts into the two tables joined to make the view. You could also decide to do a left join and only create a supplemental row at all if any of the columns needing it are non-NULL.

You can also use a sparsely joined set of tables of supplemental data. Obviously this would also need joins, but you could also use similar techniques with multiple supplemental tables as you would with 1.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文