The design of the table depends on the entity it needs to store. If all the data belongs together, then 50 columns (or even 100) might be the correct thing to do.
So long as the table is normalized, there is no rule of thumb regarding size, apart from database capabilities and the need to optimize.
I agree with Oded. I have seen tables with 500 columns in them, and all the columns in them were in the correct place. Just consider the number of facts one might wish to store about an everyday object, and you'll soon see why.
If it proves inconvenient to select all those columns, or to specify which columns to select when you are only interested in a small proportion of them, you may find it worthwhile to define a view.
Having too many columns results in a lot nulls (evil) and an unwieldy object the table is mapped to. This hurts readability in the IDE and hinders maintenance (increasing development costs). If you need fast reads in some cases use denormalized tables e.g. used solely for reporting or queries (search for the "CQRS" pattern). Yes "Person" has a million attributes, but you can break down these monothilic tables (design preceeds normalization) to match smaller entities ("address," "phone," "hobby") instead of adding new columns for each new use case. Having smaller sized objects (and tables) brings so many advantages; they enable things like unit testing, OOP, and SOLID practices.
Also, as it regards to bunching numerous columns to avoid joins, I think the performance gain from avoiding joins is lost through index maintenance, assuming a typical workload of both reads and writes. Adding indexes on fields for sake of read performance could be indicative of a need to move those fields into their own table.
odbc has a character limit of 8000 .... so that is a physical limit beyond which things get highly frustrating.
I worked on a table that had 138 columns .. it was horribly written and could have been normalised. Although this database seem to of been the creation of someone wondering why there are conventions in database design and deciding to test them all at once.
Having very wide flattened tables is fairly common when you get into data warehousing and reporting servers. They are just a lot faster and mean that you don't have to store your database entirley in ram for performance.
It also highly depends on the usecase for your table. If you want to optimize it for reading then it might be a good idea to keep it all together in one table.
In the NO-SQL world (cassandra/hbase for example) there are no constraints on the number of columns and it's actually considered a good practice to have many columns. This also comes from the way it is stored (no gaps). Worth while investigating.
According to my experience it is better to have less joins as those tend to happen too often especially in big database. As long as your database tables are designed to store single entity (student, teacher and so on) this should be ok. So that this will be represented as an object in you code later. So, if you split the entity to several tables you will have to use several joins in order to fill your object later. Also if you use ORM to generate your data access layer (such as Linq in .Net) is will generate separate classes for each table (of course with an relationship between them but still) and this will be harder to use.
Another thing is that you can specify which columns to return in your query and this will reduce the data that is passed to your application, but if you need even a single column from another table you will have to do the join. And in most cases as you have so many columns, then the probability to have large amount of data stored in the db is high. So this join would harm more, than the NULLs.
Every project I have worked on is different so you should find the balance for each story.
Which is more of a performance hit: lots of columns with lots of NULLs, or fewer columns with lots of JOINs?
It is purely depends on data you store, indexes you make and so on. No one can ensure you that one works better than another without knowing what are you storing. Generally normalization rules will "force" you separate data to different tables and user FKeys if you have large table but i disagree that it ALWAYS performs better than one big table. You can end with 6-7 level joins in dozens of queries that sometimes will cause errors because there much more chances to create an error in larger queries that in simple ones.
If you post some requirements of what you are doing maybe we can help you with designing the DB properly.
It's better to use a single table by where you can avoid using joins while querying it depends on whether the columns are of same entity or different entity.
For example, assume you are doing a database design for work flow where some fields will be edited by junior workers, and some fields by senior workers. In this case it is better to have all the columns in a single table.
发布评论
评论(9)
表的设计取决于它需要存储的实体。如果所有数据都属于一起,则 50 列(甚至 100 列)可能是正确的做法。
只要表格是标准化,除了数据库功能之外,就没有关于大小的经验法则以及优化的需要。
The design of the table depends on the entity it needs to store. If all the data belongs together, then 50 columns (or even 100) might be the correct thing to do.
So long as the table is normalized, there is no rule of thumb regarding size, apart from database capabilities and the need to optimize.
我同意奥德的观点。我见过有 500 列的表,并且其中的所有列都位于正确的位置。只要考虑一下人们可能希望存储的关于一件日常物品的事实数量,您很快就会明白原因。
如果事实证明选择所有这些列不方便,或者当您只对其中一小部分感兴趣时指定要选择的列,您可能会发现定义视图是值得的。
I agree with Oded. I have seen tables with 500 columns in them, and all the columns in them were in the correct place. Just consider the number of facts one might wish to store about an everyday object, and you'll soon see why.
If it proves inconvenient to select all those columns, or to specify which columns to select when you are only interested in a small proportion of them, you may find it worthwhile to define a view.
当您觉得添加另一列不再有意义或正确时。
一般取决于应用。
When you feel it no longer makes sense or is right to add another column.
Generally depends on application.
列太多会导致大量空值(邪恶)以及表映射到的笨重对象。这会损害 IDE 的可读性并阻碍维护(增加开发成本)。如果您在某些情况下需要快速读取,请使用非规范化表,例如仅用于报告或查询(搜索“CQRS”模式)。是的,“人”有一百万个属性,但您可以分解这些整体表(设计先于标准化)以匹配较小的实体(“地址”、“电话”、“爱好”),而不是为每个新用例添加新列。拥有较小尺寸的对象(和桌子)会带来很多优势;它们支持单元测试、OOP 和 SOLID 实践等功能。
另外,就聚集大量列以避免连接而言,我认为,假设读取和写入的典型工作负载,避免连接带来的性能增益会因索引维护而丢失。为了读取性能而在字段上添加索引可能表明需要将这些字段移动到它们自己的表中。
Having too many columns results in a lot nulls (evil) and an unwieldy object the table is mapped to. This hurts readability in the IDE and hinders maintenance (increasing development costs). If you need fast reads in some cases use denormalized tables e.g. used solely for reporting or queries (search for the "CQRS" pattern). Yes "Person" has a million attributes, but you can break down these monothilic tables (design preceeds normalization) to match smaller entities ("address," "phone," "hobby") instead of adding new columns for each new use case. Having smaller sized objects (and tables) brings so many advantages; they enable things like unit testing, OOP, and SOLID practices.
Also, as it regards to bunching numerous columns to avoid joins, I think the performance gain from avoiding joins is lost through index maintenance, assuming a typical workload of both reads and writes. Adding indexes on fields for sake of read performance could be indicative of a need to move those fields into their own table.
odbc 的字符限制为 8000 .... 因此,这是一个物理限制,超过该限制,事情就会变得非常令人沮丧。
我在一个有 138 列的表上工作。它写得很糟糕,本来可以标准化。尽管这个数据库似乎是有人想知道为什么数据库设计中有约定并决定立即测试它们的创建者。
当您进入数据仓库和报告服务器时,拥有非常宽的扁平表是相当常见的。它们的速度要快得多,并且意味着您不必为了性能而将数据库全部存储在 RAM 中。
odbc has a character limit of 8000 .... so that is a physical limit beyond which things get highly frustrating.
I worked on a table that had 138 columns .. it was horribly written and could have been normalised. Although this database seem to of been the creation of someone wondering why there are conventions in database design and deciding to test them all at once.
Having very wide flattened tables is fairly common when you get into data warehousing and reporting servers. They are just a lot faster and mean that you don't have to store your database entirley in ram for performance.
它还很大程度上取决于您的表的用例。如果您想优化它以方便阅读,那么将它们全部保存在一张表中可能是个好主意。
在 NO-SQL 世界(例如 cassandra/hbase)中,列的数量没有限制,实际上拥有许多列被认为是一个很好的做法。这也来自于它的存储方式(没有间隙)。值得我们去研究一下。
It also highly depends on the usecase for your table. If you want to optimize it for reading then it might be a good idea to keep it all together in one table.
In the NO-SQL world (cassandra/hbase for example) there are no constraints on the number of columns and it's actually considered a good practice to have many columns. This also comes from the way it is stored (no gaps). Worth while investigating.
根据我的经验,最好减少连接,因为这些连接往往发生得太频繁,尤其是在大型数据库中。只要您的数据库表被设计为存储单个实体(学生、教师等),这应该没问题。这样这将在稍后的代码中表示为一个对象。因此,如果将实体拆分为多个表,则必须使用多个联接才能稍后填充对象。此外,如果您使用 ORM 生成数据访问层(例如 .Net 中的 Linq),将为每个表生成单独的类(当然它们之间存在关系,但仍然如此),这将更难使用。
另一件事是,您可以指定在查询中返回哪些列,这将减少传递到应用程序的数据,但如果您甚至需要另一个表中的单个列,则必须进行联接。在大多数情况下,由于您有如此多的列,那么数据库中存储大量数据的可能性就很高。所以这个连接比 NULL 危害更大。
我从事的每个项目都是不同的,所以你应该找到每个故事的平衡。
According to my experience it is better to have less joins as those tend to happen too often especially in big database. As long as your database tables are designed to store single entity (student, teacher and so on) this should be ok. So that this will be represented as an object in you code later. So, if you split the entity to several tables you will have to use several joins in order to fill your object later. Also if you use ORM to generate your data access layer (such as Linq in .Net) is will generate separate classes for each table (of course with an relationship between them but still) and this will be harder to use.
Another thing is that you can specify which columns to return in your query and this will reduce the data that is passed to your application, but if you need even a single column from another table you will have to do the join. And in most cases as you have so many columns, then the probability to have large amount of data stored in the db is high. So this join would harm more, than the NULLs.
Every project I have worked on is different so you should find the balance for each story.
这纯粹取决于您存储的数据、您创建的索引等等。在不知道您存储的内容的情况下,没有人可以向您保证其中一个比另一个工作得更好。一般来说,如果您有大表,规范化规则将“强制”您将数据分离到不同的表和用户 FKey,但我不同意它总是比一个大表的性能更好。您可以在数十个查询中以 6-7 级联接结束,这有时会导致错误,因为在较大的查询中比在简单的查询中产生错误的机会要多得多。
如果您发布了您正在做的事情的一些要求,也许我们可以帮助您正确设计数据库。
It is purely depends on data you store, indexes you make and so on. No one can ensure you that one works better than another without knowing what are you storing. Generally normalization rules will "force" you separate data to different tables and user FKeys if you have large table but i disagree that it ALWAYS performs better than one big table. You can end with 6-7 level joins in dozens of queries that sometimes will cause errors because there much more chances to create an error in larger queries that in simple ones.
If you post some requirements of what you are doing maybe we can help you with designing the DB properly.
最好使用单个表,这样可以避免在查询时使用联接,具体取决于列是属于同一实体还是不同实体。
例如,假设您正在为工作流程进行数据库设计,其中某些字段将由初级工人编辑,某些字段将由高级工人编辑。在这种情况下,最好将所有列放在一个表中。
It's better to use a single table by where you can avoid using joins while querying it depends on whether the columns are of same entity or different entity.
For example, assume you are doing a database design for work flow where some fields will be edited by junior workers, and some fields by senior workers. In this case it is better to have all the columns in a single table.