数据库表越多越好吗？

发布于 2024-08-13 17:52:45 字数 543 浏览 3 评论 0原文

最近，我一直在重新思考几个月前所做的数据库设计。主要原因是昨晚我读了vBulletin的数据库模式，发现他们使用了很多很多表。

我当前用于模式（例如我的日志表）的“想法”是通过将日志类型与整数进行区分，将所有内容保留在一个表中：

id, type, type_id, action, message
1 ,    1,     305,      2, 'Explanation for user Ban'
2,     2,    1045,      1, 'Reason for deletion of Article'

其中 type 1 = user，type 2 =article< /code>，type_id = 用户、文章或 w/e 的 ID 和 action 2 = 禁止，action 1 = 删除。

我是否应该将设计更改为两个表 logBans、logSomething 等？还是保留我当前使用的方法更好？

原文

Lately I've been rethinking a database design I made a couple of months ago. The main reason is that last night I read the databse schema of vBulletin and saw that they use many, MANY, tables.

The current "idea" I'm using for my schema, for instance my log table, is to keep everything in one table by differencing the type of Log with an integer:

id, type, type_id, action, message
1 ,    1,     305,      2, 'Explanation for user Ban'
2,     2,    1045,      1, 'Reason for deletion of Article'

Where type 1 = user, type 2 = article, type_id = the ID of the user, article or w/e and action 2 = ban, action 1 = deletion.

Should I change the design to two tables logBans, logSomething and so on? or is it better to keep the method I'm currently using?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

毁梦 2024-08-20 17:52:45

这里的问题是子类型。处理子类型有三种基本方法。

将每个记录类型放入完全独立的表中；
将一条记录放入父表中，然后将一条记录放入子类型表中；并将
所有记录放入一个表中，并为“可选”数据（即不适用于该类型的数据）提供可为空的列。

每种策略都有其优点。

例如，如果不同亚型之间几乎没有差异，则（3）特别适用。在您的情况下，如果不同的日志记录属于特定类型，它们是否具有额外的列？如果他们不这样做，或者很少有情况他们这样做，那么将它们全部放在一张表中是非常有意义的。

(2) 常用于派对表。这是 CRM 中的常见模型，涉及父方对象，该对象具有个人和组织的子类型（组织也可能具有公司、协会等子类型）。个人和组织具有不同的属性（例如，个人的称呼、名字、出生日期等），因此将其分开而不是使用可为空的列是有意义的。

(2) 可能更节省空间（尽管现代 DBMS 中 NULL 列的开销非常低）。更大的问题是（2）可能会让开发人员更加困惑。您会遇到这样一种情况，有人需要在某处存储额外的字段，并将其放入该类型为空的列中，仅仅是因为这样做比获得 DBA 批准添加列更容易（不，我不是在开玩笑）。

根据我的经验，(1) 可能是这 3 个方案中最不常用的。

最后，必须考虑可扩展性，这可能是 (1) 的最佳情况。在某些时候，JOIN 无法有效扩展，您需要使用某种分区方案来减小表大小。 (1) 是一种方法（但是是一种粗略的方法）。

不过我不会对此太担心。在这成为问题之前，您通常需要获取数亿或数十亿条记录（除非您的记录非常大，在这种情况下它会更快发生）。

The issue here is subtyping. There are three basic approaches to dealing with subtypes.

Put each record type into a completely separate table;
Put a record in a parent table and then a record in a subtype table; and
Put all the records in one table, having nullable columns for the "optional" data (ie things that don't apply to that type).

Each strategy has its merits.

For example, (3) is particularly applicable if there is little to no difference between different subtypes. In your case, do different log records have extra columns if they're of a particular type? If they don't or there are few cases when they do putting them all in one table makes perfect sense.

(2) is common used for a Party table. This is a common model in CRMs that involves a parent Party object which has subtypes for Person and Organization (Organization may also have subtypes like Company, Association, etc). Person and Organization have different properties (eg salutation, given names, date of birth, etc for Person) so it makes sense to split this up rather than using nullable columns.

(2) is potentially more space efficient (although the overhead of NULL columns in modern DBMSs is very low). The bigger issue is that (2) might be more confusing to developers. You will get a situation where someone needs to store an extra field somewhere and will whack it in a column that's empty for that type simply because it's easier doing that than getting approval for the DBAs to add a column (no, I'm not kidding).

(1) is probably the least frequently used scheme of the 3 in my experience.

Lastly, scalability has to be considered and is probably the best case for (1). At a certain points JOINs don't scale effectively and you'll need to use some kind of partitioning scheme to cut down your table sizes. (1) is one method of doing that (but a crude method).

I wouldn't worry too much about that though. You'll typically need to get to hundreds of millions or billions of records before that becomes an issue (unless your records are really really large, in which case it'll happen sooner).

回复收藏 0 原文

巡山小妖精 2024-08-20 17:52:45

这取决于。如果您将拥有 1500000000 个类型 1 的条目和 1000 个类型 2 的条目，并且您将对类型 2 进行大量查询，请将表分开。如果没有的话，只保留一张桌子会更方便。

请记住可扩展性：

一年内我将拥有每种类型多少个条目？
我将在此表上执行多少个请求？
您可以在某个时候清除此日志吗？您可以将其移动到另一个表（例如早于 X 个月的存档条目）吗？

回复收藏 0 原文

╄→承喏 2024-08-20 17:52:45

我现在看到的一个缺点是，您无法在 type_id 上强制执行外键完整性，因为它指向许多不同的表。

回复收藏 0 原文

木有鱼丸 2024-08-20 17:52:45

我想添加一个小提示。有点偏离主题，而且很基础，但是使用 enum 而不是 tinyint 作为状态标志会更清晰，即

   enum('user','type')

如果只有两个状态，则 tinyint 的内存效率更高一些，但不太清晰。枚举的另一个缺点是您将部分业务逻辑放在数据层中 - 当您需要添加或删除状态时，您必须更改数据库。否则它会更清晰，我更喜欢enum。

I want to add a small tip. A little off topic, and quite basic, but it's a lot clearer to use enum instead of tinyint for status flags, i.e.

   enum('user','type')

If there are only two statuses, tinyint is a little more memory efficient, but less clear. Another disadvantage in enum is that you put a part of the business logic in the data tier - when you need to add or remove statuses, you have to alter the DB. Otherwise it's much more clear and I prefer enum.

回复收藏 0 原文