数据库效率 - 每个用户的表与用户表
对于有用户的网站。每个用户都有能力创建任意数量的帖子,我们将其称为“帖子”:
从效率角度来看,是否最好为所有帖子创建一个表,保存创建帖子的用户的用户 ID,为每个帖子 - 或者为每个用户创建一个不同的单独的表并只放该用户创建的帖子?
For a website having users. Each user having the ability to create any amount of, we'll call it "posts":
Efficiency-wise - is it better to create one table for all of the posts, saving the user-id of the user which created the post, for each post - OR creating a different separate table for each user and putting there just the posts created by that user?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
当您向数据库添加更多数据时,数据库布局不应改变,因此用户数据绝对应该位于一张表中。
另外:
拥有多个表意味着您必须动态创建查询。
一个表的缓存查询计划不会用于任何其他表。
在一张表中包含大量数据不会对性能产生太大影响,但在一张表中包含大量数据会
如果你想给表添加索引让查询更快,那么在单个表上就容易多了。
The database layout should not change when you add more data to it, so the user data should definitely be in one table.
Also:
Having multiple tables means that you have to create queries dynamically.
The cached query plan for one table won't be used for any other of the tables.
Having a lot of data in one table doesn't affect performance much, but having a lot of tables does.
If you want to add an index to the table to make queries faster, it's a lot easier to do on a single table.
好回答具体问题:就查询效率而言,拥有小表总是更好,因此每个用户一个表可能是最有效的。
但是,除非您有大量帖子和用户,否则这可能并不重要。即使有数百万行,您也可以通过适当放置的索引获得良好的性能。
我强烈建议不要使用每用户表策略,因为它会给您的解决方案增加很多复杂性。例如,当您需要查找一年内在某个主题上发帖的用户时,您将如何查询?
在需要时进行优化。不是因为您认为/担心事情会变慢。 (即使你需要优化,也会有比 table-per-user 更简单的选择)
Well to answer the specific question: In terms of efficiency of querying, it will always be better to have small tables, hence a table per user is likely to be the most efficient.
However, unless you have a lot of posts and users, this is not likely to matter. Even with millions of rows, you will get good performance with a well-placed index.
I would strongly advise against the table-per-user strategy, because it adds a lot of complexity to your solution. How would you query when you need to find, say, users that have posted on a subject within the year ?
Optimize when you need to. Not because you think/are afraid something will be slow. (And even if you need to optimize, there will be easier options than table-per-user)
具有不同数量的表的模式通常是不好的。为您的帖子使用一张表。
Schemas with a varying number of tables are, generally, bad. Use one single table for your posts.
如果性能是一个问题,您应该了解数据库索引。虽然索引不是 SQL 标准的一部分,但几乎所有数据库都支持它们以帮助提高性能。
我建议您为所有用户的帖子创建一个表,然后向该表添加索引以提高搜索性能。例如,您可以在
user
列上添加索引,以便可以快速查找给定用户的所有帖子。您可能还需要考虑添加其他索引,具体取决于应用程序的要求。If performance is a concern, you should learn about database indexes. While indexes is not part of the SQL standard, nearly all databases support them to help improve performance.
I recommend that you create a single table for all users' posts and then add an indexes to this table to improve the performance of searching. For example you can add an index on the
user
column so that you can quickly find all posts for a given user. You may also want to consider adding other indexes, depending on your application's requirements.您的第一个建议是拥有一个
用户
和一个post
表,这是要采取的标准方法。目前,帖子可能是您网站上唯一的特定于用户的功能,但想象一下,它可能需要在未来增长以支持具有消息、偏好等的用户。现在,您的每个用户单独的表方法会导致爆炸式增长您需要创建的表的数量。
Your first proposal of having a single
user
and a singlepost
table is the standard approach to take.At the moment posts may be the only user-specific feature on your site, but imagine that it might need to grow in the future to support users having messages, preferences, etc. Now your separate table-per-user approach leads to an explosion in the number of tables you'd need to create.
我对你的答案有一个类似但不同的问题,因为 @guffa 和 @driis 都假设“帖子”需要在用户之间共享。
在我的特殊情况下:出于隐私原因,不能与任何其他用户共享单个用户数据点,甚至不能用于分析。
我们计划使用 mysql 或 postgres,以下是我们团队争论的三个选项:
N 模式和 5 个表 - 我们的一些开发人员认为这是完全保留数据的最佳方向隔离。
优点 - 如果您将架构视为文件夹并将表视为文件,则复杂性会降低。我们将为每个用户提供一个模式
缺点 - 大多数 ORM 按架构 1 架构和 nx5 表 进行连接池
- 一些开发人员喜欢这样做,因为它允许连接池,但似乎使问题变得更加复杂。
优点 - ORM 中的连接池是可能的
缺点 - 找不到为此 1 个架构和 5 个表 设置模型的 ORM
- 一些开发人员喜欢这样,因为他们认为我们从缓存中受益。
优点:ORM 很高兴,因为这就是他们的设计目的
缺点:每个查询都需要用户名表,就
我个人而言,属于阵营 1: n 模式。
我的首席开发人员进入了营地 3:1 架构 5 个表。
缓存:
如果数据始终是 1:1,那么无论我们使用什么解决方案,我都看不到缓存有何帮助,因为每个用户都会搜索不同的信息。
有什么想法吗?
I have a similar but different issue with your answer because both @guffa and @driis are assuming that the "posts" need to be shared among users.
In my particular situation: not a single user datapoint can be shared for privacy reason with any other user not even for analytics.
We plan on using mysql or postgres and here are the three options our team is warring about:
N schema and 5 tables - some of our devs feel that this is the best direction to make to keep the data completely segregated.
Pros - less complexity if you think of schema as a folder and tables as files. We'll have one schema per user
Cons - most ORMs do connection pooling per schema
1 schema and nx5 tables - some devs like this because it allows for connection pooling but appears to make the issue more complex.
Pros - connection pooling in the ORM is possible
Cons - cannot find an ORM where Models are set up for this
1 schema and 5 tables - some devs like this because they think we benefit from caching.
Pros: ORMs are happy because this is what they are designed to do
Cons: every query requires the username table
I, personally, land in camp 1: n schemas.
My lead dev lands in camp 3: 1 schema 5 tables.
Caching:
If data is always 1:1, I cannot see how caching will ever help regardless of the solution we use because each user will be searching for different info.
Any thoughts?