数据库表越多越好吗?

发布于 2024-08-13 17:52:45 字数 543 浏览 3 评论 0原文

最近,我一直在重新思考几个月前所做的数据库设计。主要原因是昨晚我读了vBulletin的数据库模式,发现他们使用了很多很多表。

我当前用于模式(例如我的日志表)的“想法”是通过将日志类型与整数进行区分,将所有内容保留在一个表中:

id, type, type_id, action, message
1 ,    1,     305,      2, 'Explanation for user Ban'
2,     2,    1045,      1, 'Reason for deletion of Article'

其中 type 1 = user,type 2 =article< /code>,type_id = 用户、文章或 w/e 的 IDaction 2 = 禁止,action 1 = 删除

我是否应该将设计更改为两个表 logBanslogSomething 等?还是保留我当前使用的方法更好?

Lately I've been rethinking a database design I made a couple of months ago. The main reason is that last night I read the databse schema of vBulletin and saw that they use many, MANY, tables.

The current "idea" I'm using for my schema, for instance my log table, is to keep everything in one table by differencing the type of Log with an integer:

id, type, type_id, action, message
1 ,    1,     305,      2, 'Explanation for user Ban'
2,     2,    1045,      1, 'Reason for deletion of Article'

Where type 1 = user, type 2 = article, type_id = the ID of the user, article or w/e and action 2 = ban, action 1 = deletion.

Should I change the design to two tables logBans, logSomething and so on? or is it better to keep the method I'm currently using?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

毁梦 2024-08-20 17:52:45

这里的问题是子类型。处理子类型有三种基本方法。

  1. 将每个记录类型放入完全独立的表中;
  2. 将一条记录放入父表中,然后将一条记录放入子类型表中;并将
  3. 所有记录放入一个表中,并为“可选”数据(即不适用于该类型的数据)提供可为空的列。

每种策略都有其优点。

例如,如果不同亚型之间几乎没有差异,则(3)特别适用。在您的情况下,如果不同的日志记录属于特定类型,它们是否具有额外的列?如果他们不这样做,或者很少有情况他们这样做,那么将它们全部放在一张表中是非常有意义的。

(2) 常用于派对表。这是 CRM 中的常见模型,涉及父方对象,该对象具有个人和组织的子类型(组织也可能具有公司、协会等子类型)。个人和组织具有不同的属性(例如,个人的称呼、名字、出生日期等),因此将其分开而不是使用可为空的列是有意义的。

(2) 可能更节省空间(尽管现代 DBMS 中 NULL 列的开销非常低)。更大的问题是(2)可能会让开发人员更加困惑。您会遇到这样一种情况,有人需要在某处存储额外的字段,并将其放入该类型为空的列中,仅仅是因为这样做比获得 DBA 批准添加列更容易(不,我不是在开玩笑)。

根据我的经验,(1) 可能是这 3 个方案中最不常用的。

最后,必须考虑可扩展性,这可能是 (1) 的最佳情况。在某些时候,JOIN 无法有效扩展,您需要使用某种分区方案来减小表大小。 (1) 是一种方法(但是是一种粗略的方法)。

不过我不会对此太担心。在这成为问题之前,您通常需要获取数亿或数十亿条记录(除非您的记录非常大,在这种情况下它会更快发生)。

The issue here is subtyping. There are three basic approaches to dealing with subtypes.

  1. Put each record type into a completely separate table;
  2. Put a record in a parent table and then a record in a subtype table; and
  3. Put all the records in one table, having nullable columns for the "optional" data (ie things that don't apply to that type).

Each strategy has its merits.

For example, (3) is particularly applicable if there is little to no difference between different subtypes. In your case, do different log records have extra columns if they're of a particular type? If they don't or there are few cases when they do putting them all in one table makes perfect sense.

(2) is common used for a Party table. This is a common model in CRMs that involves a parent Party object which has subtypes for Person and Organization (Organization may also have subtypes like Company, Association, etc). Person and Organization have different properties (eg salutation, given names, date of birth, etc for Person) so it makes sense to split this up rather than using nullable columns.

(2) is potentially more space efficient (although the overhead of NULL columns in modern DBMSs is very low). The bigger issue is that (2) might be more confusing to developers. You will get a situation where someone needs to store an extra field somewhere and will whack it in a column that's empty for that type simply because it's easier doing that than getting approval for the DBAs to add a column (no, I'm not kidding).

(1) is probably the least frequently used scheme of the 3 in my experience.

Lastly, scalability has to be considered and is probably the best case for (1). At a certain points JOINs don't scale effectively and you'll need to use some kind of partitioning scheme to cut down your table sizes. (1) is one method of doing that (but a crude method).

I wouldn't worry too much about that though. You'll typically need to get to hundreds of millions or billions of records before that becomes an issue (unless your records are really really large, in which case it'll happen sooner).

巡山小妖精 2024-08-20 17:52:45

这取决于。如果您将拥有 1500000000 个类型 1 的条目和 1000 个类型 2 的条目,并且您将对类型 2 进行大量查询,请将表分开。如果没有的话,只保留一张桌子会更方便。

请记住可扩展性:

  • 一年内我将拥有每种类型多少个条目?

  • 我将在此表上执行多少个请求?

  • 您可以在某个时候清除此日志吗?您可以将其移动到另一个表(例如早于 X 个月的存档条目)吗?

It depends. If you're going to have 1500000000 entries of type 1 and 1000 entries of type 2 and you'll be doing a LOT of queries on type 2, separate the tables. If not, it's more convenient to keep only one table.

Keep in mind scalability:

  • How many entries of each type will I have in 1 year?

  • How many requests on this table will I be doing ?

  • Can you, at some point, clear this log? Can you move it to another table (like archive entries older than X months) ?

╄→承喏 2024-08-20 17:52:45

我现在看到的一个缺点是,您无法在 type_id 上强制执行外键完整性,因为它指向许多不同的表。

The one drawback I see right now is that you cannot enforce foreign key integrity on your type_id since it points to many different tables.

木有鱼丸 2024-08-20 17:52:45

我想添加一个小提示。有点偏离主题,而且很基础,但是使用 enum 而不是 tinyint 作为状态标志会更清晰,即

   enum('user','type')

如果只有两个状态,则 tinyint 的内存效率更高一些,但不太清晰。枚举的另一个缺点是您将部分业务逻辑放在数据层中 - 当您需要添加或删除状态时,您必须更改数据库。否则它会更清晰,我更喜欢enum

I want to add a small tip. A little off topic, and quite basic, but it's a lot clearer to use enum instead of tinyint for status flags, i.e.

   enum('user','type')

If there are only two statuses, tinyint is a little more memory efficient, but less clear. Another disadvantage in enum is that you put a part of the business logic in the data tier - when you need to add or remove statuses, you have to alter the DB. Otherwise it's much more clear and I prefer enum.

你穿错了嫁妆 2024-08-20 17:52:45

我会让事情尽可能具体 - 在这种情况下我会创建两个表。
每个表都有特定的用途,所以我不明白为什么你要把它们组合起来。

I would keep things as specific as possible - in this case I would create two tables.
Each table has a specific purpose so I cannot see why you would combine them.

千年*琉璃梦 2024-08-20 17:52:45

我不会做 vBulletin 所做的事情。像 vBulletin 这样的旧应用程序的问题在于,虽然它们一开始可能是精益机器,但随着时间的推移,它们会收集大量熵并最终变得臃肿。由于存在插件、第三方工具以及使用旧代码的开发人员,因此破坏它是一个艰难的选择。

这就是为什么这里没有进行太多重构的原因。不要让它们成为你的编程模型。环顾四周,找出最有效的方法并加以利用。很多桌子对我来说听起来是一件坏事,而不是好事。

I wouldn't do what vBulletin does. The problem with older apps like vBulletin is that while they might have started as lean-machines, over the time they collect a lot of entropy and end up being bloated. Since there are plugins, and third-party tools, and developers who've worked on the old code, breaking it is a tough choice.

That's why there is not much refactoring going on here. Don't make them your programming model. Look around, find out what works best and use that. A lot of table sounds like a bad thing to me, not good.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文