SQL设计问题
我有一个包含 25 列的表,其中 20 列的某些 (30-40%) 行可以具有空值。 现在,具有 20 个空列的行的成本是多少?这样可以吗?
或者
使用另一个表来存储这 20 列并向第一个表添加引用是一个好的设计吗? 这样,只有当有值时,我才会写入第二个表。
我正在使用 SQL Server 2005。将来会迁移到 2008。
只有 20 列是 varchar、其余的smallint、smalldate
我存储的内容: 这些列存储其所属行的不同属性。这些属性有时可能为空。
该表将容纳约十亿行,
请发表评论。
I have a table with 25 columns where 20 columns can have null values for some (30-40%) rows.
Now what is the cost of having rows with 20 null columns? Is this OK?
Or
is it a good design to have another table to store those 20 columns and add a ref to the first table?
This way I will only write to the second table only when there is are values.
I am using SQL server 2005. Will migrate to 2008 in future.
Only 20 columns are varchar, rest smallint, smalldate
What I am storing:
These columns store different attributes of the row it belongs to. These attributes can be null sometimes.
The table will hold ~billion of rows
Please comment.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
您应该描述您正在存储的数据类型。听起来其中一些列应该移动到另一个表。
例如,如果您有多个列代表同一类型数据的多个列,那么我会说将其移动到另一个表。另一方面,如果您需要这么多列来描述不同类型的数据,那么您可能需要保持原样。
所以这在某种程度上取决于你正在建模的内容。
You should describe the type of data you are storing. It sounds like some of those columns should be moved to another table.
For example, if you have several columns that represent multiple columns for the same type of data, then I would say move it to another table On the other hand, if you need this many columns to describe different types of data, then you may need to keep it as it is.
So it kind of depends on what you are modelling.
在某些情况下是否需要其中一些列?如果是这样,那么也许您应该使用某种形式的继承。例如,如果这是有关医院患者的信息,并且有一些数据仅对女性患者有意义,那么您可以使用这些列创建 FemalePatents 表。然后,可以在该单独的表中将那些必须始终为女性患者收集的列声明为“NOT NULL”。
Are there some circumstances where some of those columns are required? If so, then perhaps you should use some form of inheritance. For instance, if this were information about patients in a hospital, and there was some data that only made sense for female patients, then you could create a FemalePatients table with those columns. Those columns that must always be collected for female patients could then be declared
NOT NULL
in that separate table.这取决于数据类型(40 个可空整数基本上与 40 个不可空整数占用相同的空间,无论值如何)。在 SQL Server 中,使用普通技术,空间相当有效。 2008 年,您确实拥有了 SPARSE 功能。
如果您确实使用可选的 1:1 关系垂直拆分表,则可以使用视图包装两个表,并在视图上添加触发器以使其可更新并隐藏底层实现。
因此有很多选择,其中许多可以在看到数据加载和行为后实施。
It depends on the data types (40 nullable ints is going to basically take the same space as 40 non-nullable ints, regardless of the values). In SQL Server, the space is fairly efficient with ordinary techniques. In 2008, you do have the SPARSE feature.
If you do split the table vertically with an optional 1:1 relationship, there is a possibility of wrapping the two tables with a view and adding triggers on the view to make it updatable and hide the underlying implementation.
So there are plenty of options, many of which can be implemented after you see the data load and behavior.
根据您拥有的不同属性集创建表格。因此,如果您有一些数据,其中某些列不适用,那么将该数据放在没有这些列的表中是有意义的。尽可能避免在多个表中重复相同的属性。确保您的数据至少采用 Boyce-Codd / 第五范式,这样您就不会出错。
Create tables based on the distinct sets of attributes you have. So if you have some data where some of your columns do not apply then it would make sense to have that data in a table which doesn't have those columns. As far as possible, avoid repeating the same attribute in multiple tables. Make sure your data is in at least Boyce-Codd / 5th Normal Form and you won't go far wrong.