搜索非规范化在这里和 Google 中,关于这个主题有很多讨论。找到最适合您具体情况的方法,尽管从表面上看,非规范化对于您的情况来说是过早的优化。
This is called "denormalization" and is considered bad practice unless you get a significant performance boost when you denormalize.
The biggest issue with this, however, is concurrency. What happens if two people vote on the poll and they both try to increment the VoteCount column?
Search denormalization on here and in Google, there has been plenty of discussion on this topic. Find what fits your exact situation best, although, from the looks of it, denormalization would be premature optimization in your situation.
Guaranteed problems and data inconsistencies. The vote count is "derived data" and should not be stored (a duplicate). For stable data (that which does not change), summaries are fair enough.
Now if the data (no of votes) is large, and you need to count them often (in queries), then enhance that alone, the speed of the vote table from the main table, eg ensure there is an index on column being looked up for the count.
If the data is massive. Eg. a bank with millions of transactions per month, and you do not want to count them in order to produce the account balance on every query, enhance that alone. Eg. I calculate a month to date figure every night and store it at the account level; the days figure, needs to be counted, and added to the MTD figure, in order to produce the true up-to-the-minute figure. At the end of month, that month, when all the auditing processes are changing various rows across the month, the MTD figure (to yesterday) can be executed on demand.
The short answer is YES. But you should keep in mind that duplication may become a trouble or even nightmare of your system development and maintenance. If you want to store some pre-calculated cache values to improve performance, the calculation process of cache should be encapsulated and transparent to other processes.
In this case:
Solution 1: When one user votes on the poll, the detailed information will be recorded, and the vote count should be increased one automatically. (i.e. the cache calculation is encapsulated in data-writer process).
Solution 2: When the vote imformation is recoreded, nothing to do on the vote count, only a flag will be changed to mark the vote count value as dirty now. When the vote count is read, if its value is dirty, calculate it and update its value and the flag; if its value is latest (not dirty), read it directly. (i.e. the cache calculation is encapsulated in data-reader process).
Read Section 7 of the famous book The Pragmatic Programmer, you may get some ideas.
Actually, the Normal Forms used in database design is a special case of the DRY principle.
In short NO, there is no point to store data that can be fetched with a COUNT query and the second reason thet you have to manually manipulate the counter value - more work, bigger problem possibility, you have to maintain that code/algorithm. Really do NOT do it, it is a bad practice.
发布评论
评论(4)
这称为“非规范化”,被认为是不好的做法,除非您得到当您进行非规范化时,性能会显着提升。
然而,最大的问题是并发性。如果两个人在民意调查中投票并且都尝试增加
VoteCount
列,会发生什么情况?搜索非规范化在这里和 Google 中,关于这个主题有很多讨论。找到最适合您具体情况的方法,尽管从表面上看,非规范化对于您的情况来说是过早的优化。
This is called "denormalization" and is considered bad practice unless you get a significant performance boost when you denormalize.
The biggest issue with this, however, is concurrency. What happens if two people vote on the poll and they both try to increment the
VoteCount
column?Search denormalization on here and in Google, there has been plenty of discussion on this topic. Find what fits your exact situation best, although, from the looks of it, denormalization would be premature optimization in your situation.
坏的。
不正确。
保证出现问题和数据不一致。投票计数是“派生数据”,不应存储(重复)。对于稳定的数据(不会改变的数据),摘要就足够公平了。
现在,如果数据(投票数)很大,并且您需要经常对它们进行计数(在查询中),那么单独提高主表中投票表的速度,例如确保正在查找的列上有索引等待计数。
如果数据海量。例如。一家银行每月有数百万笔交易,并且您不想为了在每次查询时生成帐户余额而对它们进行计数,那么仅增强这一点即可。例如。我每晚计算一个月至今的数字并将其存储在帐户级别;需要计算天数,并将其添加到 MTD 数字中,才能生成真实的最新数字。在月底,即当月,当所有审计进程都在更改该月的各个行时,可以按需执行 MTD 数据(到昨天)。
Bad.
Incorrect.
Guaranteed problems and data inconsistencies. The vote count is "derived data" and should not be stored (a duplicate). For stable data (that which does not change), summaries are fair enough.
Now if the data (no of votes) is large, and you need to count them often (in queries), then enhance that alone, the speed of the vote table from the main table, eg ensure there is an index on column being looked up for the count.
If the data is massive. Eg. a bank with millions of transactions per month, and you do not want to count them in order to produce the account balance on every query, enhance that alone. Eg. I calculate a month to date figure every night and store it at the account level; the days figure, needs to be counted, and added to the MTD figure, in order to produce the true up-to-the-minute figure. At the end of month, that month, when all the auditing processes are changing various rows across the month, the MTD figure (to yesterday) can be executed on demand.
简短的回答是肯定的。但您应该记住,重复可能会成为您的系统开发和维护的麻烦甚至噩梦。如果想存储一些预先计算好的缓存值来提高性能,那么缓存的计算过程就应该被封装起来,并且对其他进程透明。
在这种情况下:
解决方案一:当一个用户对投票进行投票时,会记录详细信息,并且投票数会自动增加一。 (即缓存计算被封装在数据写入器进程中)。
解决方案2:当记录投票信息时,与计票无关,仅更改标志以将计票值标记为脏。当读取投票计数时,如果它的值是脏的,则计算它并更新它的值和标志;如果它的值是最新的(不是脏的),则直接读取它。 (即缓存计算被封装在数据读取器进程中)。
阅读名著The Pragmatic Programmer的第7节,你可能会得到一些想法。
实际上,数据库设计中使用的范式是DRY原则的一个特例。
The short answer is YES. But you should keep in mind that duplication may become a trouble or even nightmare of your system development and maintenance. If you want to store some pre-calculated cache values to improve performance, the calculation process of cache should be encapsulated and transparent to other processes.
In this case:
Solution 1: When one user votes on the poll, the detailed information will be recorded, and the vote count should be increased one automatically. (i.e. the cache calculation is encapsulated in data-writer process).
Solution 2: When the vote imformation is recoreded, nothing to do on the vote count, only a flag will be changed to mark the vote count value as dirty now. When the vote count is read, if its value is dirty, calculate it and update its value and the flag; if its value is latest (not dirty), read it directly. (i.e. the cache calculation is encapsulated in data-reader process).
Read Section 7 of the famous book The Pragmatic Programmer, you may get some ideas.
Actually, the Normal Forms used in database design is a special case of the DRY principle.
简而言之,没有意义存储可以通过 COUNT 查询获取的数据,第二个原因是您必须手动操作计数器值 - 更多的工作,更大的问题可能性,您必须维护该代码/算法。真的不要这样做,这是一种不好的做法。
In short NO, there is no point to store data that can be fetched with a COUNT query and the second reason thet you have to manually manipulate the counter value - more work, bigger problem possibility, you have to maintain that code/algorithm. Really do NOT do it, it is a bad practice.