规范化我的数据库会破坏可扩展性吗?
我有一个数据库,它将构成一个高流量网络应用程序的一部分。
我想知道是否应该规范化表格,以便诸如“question_type”之类的内容也应该放在单独的表格中,有关问题的所有基本信息(例如“title”和“question_body”)也应该放在单独的表格中?
我之所以这么问,是因为我需要这个数据库尽可能具有可扩展性,并且我被告知当您需要可扩展性时,规范化并不总是可行的方法。
谢谢
I have a database that will form part of a highly trafficked web app.
I'm wondering if I should normalise the tables so things such as (e.g.) 'question_type' should be in a separate table too all the basic information about the question such as 'title' and 'question_body'?
I'm only asking because I need this database to be as scalable as possible and I'm told normalisation isn't always the way to go when you need scalability.
Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
标准化成为扩展问题的原因在于它往往需要将多个表连接在一起。连接在小表上非常有用,但表越大,服务器需要的工作就越困难。
主要要注意的是避免连接。如果您可以通过向其中一个表添加字段来执行无需联接的查询,那么您只会加快该查询的性能。
The thing that makes normalization an issue with scaling is that it tends to need to have multiple tables join together. Joins are great on small tables but the larger the table grows the harder the server needs to work.
The main thing to look to is avoiding joins. If you can do the query without a join by adding a field to one of the tables, you just speed up the performance of that query.
如果您的表有
question_body
和question_type
,那么我不知道如何将其移动到另一个表来实现标准化。例如:将单个值拆分到单个列表中除了无用的连接之外不会实现任何其他目的。也就是说:
是一个等价但浪费的形式
另一方面,(使用上面的示例),将问题用户信息拆分到自己的表中是很有意义的:
因此,如果用户的排名发生了变化( ala SO),您只需在一个地方更改它,而不是在他提出问题的每一行中更改它。自从您将数百个更新更改为单个更新以来,您处理扩展的能力已经提高了。
If your table has a
question_body
andquestion_type
, then I don't see how moving it to another table achieves normalization. e.g.:Splitting out a single value into a single column table won't achieve anything other than useless joins. That is:
is an equivalent, but wasteful form of
On the other hand, (using the example above), it makes a lot of sense to split out the question user information into its own table:
So if a user has his rank changed (ala SO), you only have to change it in one place rather than in every row where he's asked a question. You've increased your ability to handling scaling since you've changed hundreds of updates into a single update.
现在这是一个沉重的问题。与其说标准化是一条硬性规则,不如说是一个指导方针。设计数据库由一系列关于规范化级别的决策组成,考虑到您对代码效率、性能和完整性等的需求,这些决策是有意义的。这太过于简单化了,但设计决策的范围涵盖了大量精心编写的书籍。
您能告诉我更多有关您的应用程序和目标平台的信息吗?如果我能更好地了解您的情况,我也许可以引导您找到一些非常有用的参考材料。
Now that's a loaded question. Normalization isn't a hard rule so much as a guideline. Designing a database is made up of a series of decisions regarding the level of normalization that makes sense given your need for code efficiency, performance and integrity, among other things. That's greatly oversimplifying it, but the spectrum of design decisions spans volumes of well-authored books.
Can you tell me a little bit more about your application and intended platform? I might be able to steer you in the direction of some very useful reference material if I can better understand your situation.
加盐会让我的食物味道更好吗?
同样的问题。没有人能回答。
主要问题在于,在应用程序中使用查找缓存而不是数据库连接取决于您的使用模式以及在某种程度上您作为程序员的能力。相当多的程序员从来没有超越过 SQL 的“炒鸡蛋,烧焦”的水平,以烹饪为例。
对于可扩展性应用程序设计和数据库技术还有很多话要说。 Oracle RAC 安装无与伦比。取决于您对 Exadata 平台的需求。我认为最小单位的成本约为 50 万美元。仍然确定您需要“尽可能可扩展”吗?这里不是开玩笑 - 我现在在一个 6000 GB 的数据仓库工作,我们刚刚订购了 3 个这样的怪物,而不是最小的一个。
那么,“尽可能可扩展”是什么意思?这就像“我的车需要跑得尽可能快,甚至更快”,然后你最终会得到一辆装有喷气发动机的特制汽车;)
一般规则:
* 将交易和报告分离到两个数据库中。第二个是数据仓库。
* 规范化事务数据库
* 在数据仓库上使用星型模式。
很大的机会是:你不知道你在说什么,从来没有做过可扩展性,所以有 80% 的机会你的“高可扩展性”需求对于一个像样的数据库服务器来说是一个笑话。现在,这并不意味着侮辱,但我看到很多人说“我的表中有大量数据”,这导致 ou 最多为 10.000 行。这没什么大不了的——这只是一个笑话。我们每天将 1 亿条数据加载到我们的数据仓库主表中(并且必须将它们保留很多年)。大多数人并没有真正获得像样的数据库服务器所能提供的速度。这意味着很多光盘。
Will adding salt make my food taste better?
Same question. Noone can answer.
The main proble mis that it depends on your USAGE patterns and to soem degree your competence as programmer, to use lookup caches in the application instead of database joins. Quite a lot of programmers never get above the "scrambled eggs, burned" level of SQL, to keep a cooking analogy.
For scalability application design AND database technology have a lot more to say. Hard to beat an Oracle RAC installation. Depending on what you need on an Exadata platform. Cost is I think around half a million USD for the smallest unit. Still sure you need "as scalable as possible"? Not joking here - I right now work on a 6000 gb data warehouse, we just ordered 3 of those monsters, and not the smallest one.
So, what do you mean with "as scalable as possible"? THis is like "my car needs to go as fast as a car ever has gone and more", then you end up with a special made car with a jet engine in it ;)
General rule:
* Separate transactions and reporting into two databases. The second being a data warehouse.
* Normalize transactional db
* Use star schema on data warehouse.
BIG chance is: you dont kno what you talk ab out, never did scalability, so there is a 80% chance your "high scalability" requirement is a joke for a decent database server. Now, that is not meant insulting, but i have seen SO many people say "I have a ton of data in a table" which turns ou to be 10.000 rows maximum. That is not a ton - it is a joke. We load 100 million daily into our data warehouse main table (and have to keep them many years). Most peope dont really get the speed a decent database server can provide. Which means many discs.