SQL database: one table with 3.5M entries, or several tables with fewer entries?
I'm building a football game that allows people to create leagues online.
Say I have 1000 leagues after some time. I store my data in SQL. Should I:
1) Create 1 single database with one table "rosters" and insert all the players of all the leagues into it (roughly 3.5 to 4 million entries for 1000 leagues)
2) Create 1 single database with 20 tables "rosters1, rosters2, etc." and split the league rosters between those 20 tables
3) Create 1 database for each league with a table "rosters" in each
4) Create 20 databases with one table "rosters" in each and split the league rosters between the 20 databases.
What is the best option for performance and quick SQL queries?
Comments (2)
It sounds like you need to learn a few database basics first, so please spend some time making sure you understand database normalization and indexes.
However, in principle, you probably need a "players" table (player id, name, etc.), a "roster" table (roster id, name, maybe owner_id), and a player_roster_map table (player_id, roster_id) that links the two. You'll probably want a unique constraint on the combination of player_id and roster_id, and you should have foreign key constraints on both player_id and roster_id in that mapping table.
(I'm presuming that players are not really unique per roster; you can still model similarly if there are attributes that are overridden by the person that owns them, but I'm guessing based on my understanding of your description of the model.)
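To make the three-table layout above concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module only so that it runs anywhere; the column types, the sample rows, and the helper names make_db and try_insert are my own assumptions, and MySQL syntax would differ slightly.

```python
import sqlite3

# Hypothetical schema that follows the table names used in the answer above.
SCHEMA = """
CREATE TABLE players (
    player_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE roster (
    roster_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    owner_id  INTEGER
);
CREATE TABLE player_roster_map (
    player_id INTEGER NOT NULL REFERENCES players(player_id),
    roster_id INTEGER NOT NULL REFERENCES roster(roster_id),
    UNIQUE (player_id, roster_id)
);
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked for it.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def try_insert(conn, player_id, roster_id):
    """Return True if the mapping row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO player_roster_map VALUES (?, ?)", (player_id, roster_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_db()
conn.execute("INSERT INTO players VALUES (1, 'Alice')")
conn.execute("INSERT INTO roster VALUES (10, 'League A roster', NULL)")

first_ok = try_insert(conn, 1, 10)       # valid player on a valid roster
duplicate_ok = try_insert(conn, 1, 10)   # same pair again: unique constraint
orphan_ok = try_insert(conn, 99, 10)     # unknown player: foreign key constraint
```

The point of the constraints is that the database itself refuses a duplicate membership or a reference to a player that does not exist, so application code cannot quietly corrupt the rosters.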
Your database engine will probably index anything that you mark as a primary key by default, and may index anything that you mark as a foreign key, but this is database dependent, and I am not an expert on MySQL; you'll need to do some research.
Proper indexes will help you when you execute queries, at a generally small cost in insert performance. If you end up with large numbers of identical values, or with many queries that touch only a small range of values (such as time ranges), you may eventually need to understand partitioning as well. Partitioning lets you handle a fairly large number of records without the modeling problems that denormalization would introduce. However, don't take this step prematurely; you'll probably find that a correct database design is enough for the size of dataset you're implying.
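As a rough illustration of what an index buys you on a single shared table, the sketch below asks the engine for its query plan before and after an index is added. It assumes a hypothetical rosters table with a league_id column, uses SQLite through Python's sqlite3 module rather than MySQL, and is scaled down to 40 players per league; the exact plan wording differs between engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rosters ("
    " id INTEGER PRIMARY KEY,"
    " league_id INTEGER NOT NULL,"
    " player_name TEXT NOT NULL)"
)

# 1000 leagues with 40 players each (scaled down from the question's numbers).
rows = [(league, f"player-{league}-{n}") for league in range(1000) for n in range(40)]
conn.executemany("INSERT INTO rosters (league_id, player_name) VALUES (?, ?)", rows)

def plan(conn, sql, params=()):
    """Join the human-readable detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

query = "SELECT player_name FROM rosters WHERE league_id = ?"

plan_before = plan(conn, query, (42,))   # no index yet: a full table scan
conn.execute("CREATE INDEX idx_rosters_league ON rosters (league_id)")
plan_after = plan(conn, query, (42,))    # the plan now names the index

league_42 = conn.execute(query, (42,)).fetchall()
print(plan_before)
print(plan_after)
```

With the index in place the lookup for one league no longer has to read every row, which is why one well-indexed table is usually preferable to splitting the data by hand.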
I think you want option 1 in principle. You have to consider use cases like moving someone from one roster to another: you don't want that to happen across databases. I also imagine there are many queries that you will want to run against all rosters. You may find it helpful to research partitioning, which PostgreSQL supports and which MySQL has offered since version 5.1.
You also need to look at normalization. Example: I assume you have a teams table, and PlayerTeam is just a foreign key into it (you should index this column in both tables, BTW). If this doesn't make sense to you, spend a little time on a database design book or web site.
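Here is a small sketch of option 1 with a teams table and an indexed foreign key, showing why a transfer is simple when everything lives in one database. The column names, the sample data, and the transfer helper are hypothetical, and it uses SQLite via Python's sqlite3 module for convenience rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
CREATE TABLE teams (
    team_id   INTEGER PRIMARY KEY,      -- primary keys are indexed automatically
    league_id INTEGER NOT NULL,
    name      TEXT NOT NULL
);
CREATE TABLE rosters (
    player_id   INTEGER PRIMARY KEY,
    player_name TEXT NOT NULL,
    player_team INTEGER NOT NULL REFERENCES teams(team_id)
);
-- Index the referencing column explicitly; SQLite does not do it for you.
CREATE INDEX idx_rosters_player_team ON rosters (player_team);

INSERT INTO teams VALUES (1, 100, 'Reds'), (2, 200, 'Blues');
INSERT INTO rosters VALUES (1, 'Alice', 1), (2, 'Bob', 1);
""")

def transfer(conn, player_id, new_team_id):
    """Move a player to another team in one transaction."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(
            "UPDATE rosters SET player_team = ? WHERE player_id = ?",
            (new_team_id, player_id),
        )

transfer(conn, 2, 2)  # Bob moves from the Reds to the Blues, across leagues

try:
    transfer(conn, 1, 999)  # team 999 does not exist
    bad_transfer_ok = True
except sqlite3.IntegrityError:
    bad_transfer_ok = False

team_of = dict(conn.execute("SELECT player_name, player_team FROM rosters"))
print(team_of, bad_transfer_ok)
```

With one database per league, the same move would need a delete in one database and an insert in another, with no single transaction or foreign key to keep the two in step.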