A few database design questions related to a user content website
Designing a user content website (kind of similar to Yelp, but for a different market and with photo sharing) and had a few database questions:
1. Does each user get their own set of tables, or are we storing multiple users' data in common tables? Since this is also a social network, when the user base grows, databases are usually partitioned off for scalability, with different sets of users sent to separate partitions, so what is the best approach? I guess some data like user accounts can be in common tables, but for wall posts, photos etc., does each user get their own table? If so, then if we have 10 million users, does that mean 10 million x whatever number of tables per user? This is currently being designed in MySQL.
2. How do the user tables know what to create each time a user joins the site? I am assuming there may be a system table template from which it is pulling in the fields?
3. In addition to the above question, if tomorrow we modify tables or add/remove features, how do we roll the changes down to all the live user accounts/tables? I know from a page point of view we have the master template, but for the database, how will the user tables be updated? Is that something we do manually, or will the table keep checking, like every 24 hrs, with the system tables for updates to its structure?
4. If the above is all true, that means we are maintaining 1 master set of tables with system default values, and each user gets the same values copied to their tables? Take a field like the maximum failed login attempts before the system locks the account. We have a system default of 5 login attempts within 30 minutes, but I want to allow users to specify their own number to customize their own security. Does that mean they can overwrite the system default in their own table?
Thanks.
Generally, creating separate tables for each entity (in this case users) is not a good idea. If each table is separate, querying across them may be cumbersome.
If your table is large, you should optimize it with indexes. If it gets very large, you may also want to look into partitioning tables.
Partitioning allows you to see the table as 1 object, though it is physically split up - the DBMS handles most of the work and presents you with 1 object. This way you SELECT, INSERT, UPDATE, ALTER etc. as normal, and the DB figures out which partition the SQL refers to and performs the command.
Not splitting up the tables by user, and instead using indexes and partitions, would deal with scalability while maintaining performance. If you don't split up the tables manually, this also makes points 2, 3, and 4 moot.
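As a rough illustration of the shared-table approach, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for MySQL. The table, column, and index names (`users`, `posts`, `idx_posts_user`) are hypothetical, not taken from the question. Every user's posts live in one `posts` table, and an index on `user_id` lets the database find one user's rows without scanning the rest.

```python
import sqlite3


def build_shared_posts_db() -> sqlite3.Connection:
    """Create one common posts table for all users (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            body TEXT NOT NULL
        );
        -- The index is what keeps per-user lookups fast as the table grows.
        CREATE INDEX idx_posts_user ON posts(user_id);
        """
    )
    return conn


def posts_for(conn: sqlite3.Connection, user_id: int) -> list:
    """Return the bodies of one user's posts, oldest first."""
    rows = conn.execute(
        "SELECT body FROM posts WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [row[0] for row in rows]


def query_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's plan description for a per-user lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT body FROM posts WHERE user_id = ?", (1,)
    )
    # The last column of each plan row is the human-readable detail text.
    return " ".join(str(row[-1]) for row in rows)
```

Calling `query_plan(conn)` on this schema should mention `idx_posts_user`, which is a quick way to check that a lookup uses the index. MySQL offers `EXPLAIN` for the same purpose, though its output format differs.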
Here's a link to partitioning tables (SQL Server-specific):
http://databases.about.com/od/sqlserver/a/partitioning.htm
It doesn't make any kind of sense to me to create a set of tables for each user. If you have a common set of tables for all users then I think that avoids all the issues you are asking about.
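For instance, the per-user security setting from point 4 fits into a common table without copying defaults anywhere. The sketch below is only one possible layout, again using Python's `sqlite3` in place of MySQL. The `user_settings` table and the `max_failed_logins` column are made-up names. The default of 5 comes from the question. A `NULL` in the column means "use the system default", and `COALESCE` picks the override when there is one.

```python
import sqlite3

# System-wide default from the question: 5 failed attempts before lockout.
SYSTEM_DEFAULT_MAX_FAILED_LOGINS = 5


def build_settings_db() -> sqlite3.Connection:
    """Create a single settings table shared by all users (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE user_settings (
            user_id INTEGER PRIMARY KEY,
            max_failed_logins INTEGER  -- NULL means "use the system default"
        )
        """
    )
    return conn


def max_failed_logins(conn: sqlite3.Connection, user_id: int) -> int:
    """Return the user's override if set, otherwise the system default."""
    row = conn.execute(
        "SELECT COALESCE(max_failed_logins, ?) FROM user_settings WHERE user_id = ?",
        (SYSTEM_DEFAULT_MAX_FAILED_LOGINS, user_id),
    ).fetchone()
    # A user with no settings row at all also falls back to the default.
    return row[0] if row else SYSTEM_DEFAULT_MAX_FAILED_LOGINS
```

Because the default lives in one place, changing it later affects every user who has not set an override, with no per-user copies to update.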
It sounds like you need to locate a primer on relational database design basics. Regardless of the type of application you are designing, you should start there. Learn how joins work, indices, primary and foreign keys, and so on. Learn about basic database normalization.
It's not customary to create new tables on-the-fly in an application; it's usually unnecessary in a properly designed schema. Usually schema changes are done at deployment time. The only time "users" get their own tables is an artifact of a provisioning decision, wherein each "user" is effectively a tenant in a walled-off garden; this only makes sense if each "user" (more likely, a company or organization) never needs access to anything that other users in the system have stored.
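To make the "schema changes at deployment time" point concrete, here is a small sketch, assuming a shared `photos` table and a new `caption` column (both hypothetical), with Python's `sqlite3` standing in for MySQL. One `ALTER TABLE` statement run during deployment changes the structure for every user's rows at once, so nothing needs to poll a template table.

```python
import sqlite3


def create_schema_v1(conn: sqlite3.Connection) -> None:
    """Initial deployment: one photos table shared by all users."""
    conn.execute(
        """
        CREATE TABLE photos (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            url TEXT NOT NULL
        )
        """
    )


def migrate_to_v2(conn: sqlite3.Connection) -> None:
    """Later deployment: add a caption column for all users in one statement."""
    conn.execute("ALTER TABLE photos ADD COLUMN caption TEXT NOT NULL DEFAULT ''")


def column_names(conn: sqlite3.Connection, table: str) -> list:
    """List a table's column names (SQLite-specific PRAGMA)."""
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

In practice such migrations are usually kept as versioned scripts and applied once per release. The function names here are just placeholders for that idea.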
There are mechanisms for dealing with loosely structured types of information in databases, but if you find yourself reaching for them often (the most common method is called Entity-Attribute-Value), either your problem is not quite correctly modeled, or you may not actually need a relational database, in which case you might be better off with a document-oriented database like CouchDB/MongoDB.
Adding, based on your updated comments/notes:
Your concerns about the number of records in a particular table are most likely premature. Get something working first. Most modern DBMSes, including newer versions of MySQL, support mechanisms beyond indices and clustered indices that can help deal with large numbers of records. To wit, in MS SQL Server you can create a partition function on fields of a table; MySQL 5.1+ has a few similar partitioning options based on hash functions, ranges, or other mechanisms. Follow well-established conventions for database design, modeling your domain as sensibly as possible, then adjust when you run into problems. First adjust using the tools available within your choice of database, then consider more drastic measures only when you can prove they are needed. There are other kinds of denormalization that are more likely to make sense before you would even want to consider something as unidiomatic to database systems as a "table per user" model; even if I were to go down that route, I'd probably consider something like materialized views first.
I agree with the comments above that say that a table per user is a bad idea. Also, while it's a good idea to have strategies in mind now for how you can cope when things get really big, I'd concentrate on getting things right for a small number of users first - if no-one wants to / is able to use your service, then unfortunately you won't be faced with the problem of lots of users.
A common approach among very large sites is database sharding. The summary is: you have N instances of your database in parallel (on separate machines), and each holds 1/N of the total data. There's some shared way of knowing which instance holds a given bit of data. To access some data you have 2 steps, rather than the 1 you might expect: first work out which instance holds the data, then query that instance.
There are problems with this, such as: you set up e.g. 8 shards and they all fill up, so you want to spread the data over e.g. 20 shards, which means migrating data between shards.
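The sharding idea and its migration problem can be sketched in a few lines of Python. This is only an illustration under simple assumptions: user IDs are integers, the "shared way of knowing" which shard holds a user is plain modulo on the ID, and each in-memory dict stands in for a separate database instance. The names `shard_for`, `ShardedStore`, and `moved_fraction` are made up for this example.

```python
from typing import Dict, List


def shard_for(user_id: int, shard_count: int) -> int:
    """Work out which shard holds this user's data (assumed scheme: modulo)."""
    return user_id % shard_count


class ShardedStore:
    """Toy store where each dict stands in for one database instance."""

    def __init__(self, shard_count: int) -> None:
        self.shards: List[Dict[int, List[str]]] = [{} for _ in range(shard_count)]

    def add_post(self, user_id: int, body: str) -> None:
        shard = self.shards[shard_for(user_id, len(self.shards))]
        shard.setdefault(user_id, []).append(body)

    def posts_for(self, user_id: int) -> List[str]:
        # Locate the shard first, then read only from that shard.
        shard = self.shards[shard_for(user_id, len(self.shards))]
        return list(shard.get(user_id, []))


def moved_fraction(old_count: int, new_count: int, user_count: int) -> float:
    """Fraction of users whose shard changes when the shard count changes."""
    moved = sum(
        1
        for user_id in range(user_count)
        if shard_for(user_id, old_count) != shard_for(user_id, new_count)
    )
    return moved / user_count
```

With this naive modulo scheme, `moved_fraction(8, 20, 40_000)` shows that most users would land on a different shard after going from 8 to 20 shards, which is why resharding tends to be painful. Schemes such as a lookup table or consistent hashing are commonly used to reduce how much data has to move.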