Is it better to cache certain values in a database table, or to recompute them every time?
For example, I have a table of bank users (user id, user name), and a table for transactions (user id, account id, amount).
Accounts have the same properties across different users, but hold different amounts (for example, Alex -> Grocery is specific to Alex, but every other user also has a Grocery account).
The question is, would it be better to create a separate table of accounts (account id, user id, amount left), or to get this value by selecting all transactions with the needed user id and account id and summing the 'amount' values? It seems that the first approach would be faster, but more prone to error and database corruption, since I would need to update the account every time a transaction happens. The second approach seems cleaner, but would it lead to a significant slowdown?
What would you recommend?
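To make the two options concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names follow the question, but the data and details are illustrative assumptions, not part of the original setup.

```python
import sqlite3

# Illustrative sketch of the question's schema (SQLite, in-memory).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE transactions (
        user_id INTEGER, account_id INTEGER, amount INTEGER
    );
    -- Approach 1: a separate accounts table caching the balance.
    CREATE TABLE accounts (
        user_id INTEGER, account_id INTEGER, amount_left INTEGER,
        PRIMARY KEY (user_id, account_id)
    );
""")
con.execute("INSERT INTO users VALUES (1, 'Alex')")
con.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                [(1, 10, 500), (1, 10, -120), (1, 10, 30)])

# Approach 2: derive the balance on demand by summing transactions.
derived = con.execute(
    "SELECT COALESCE(SUM(amount), 0) FROM transactions "
    "WHERE user_id = ? AND account_id = ?", (1, 10)
).fetchone()[0]
print(derived)  # 410
```

Approach 1 stores `amount_left` redundantly and must update it on every write; approach 2 keeps a single source of truth at the cost of recomputing on every read.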
Comments (4)
Good question!
In my opinion you should always avoid duplicated data, so I would go with the "summing every time" option.
"It seems that the first approach would be faster, but more prone to error and database corruption - I would need to update accounts every time the transaction happens"
That quote says it all: you are subject to errors, and you'll have to build a mechanism to keep the data up to date.
Don't forget that the first approach would only be faster for selects. Inserts, updates, and deletes would be slower, because you would have to update the second table as well.
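One possible shape for such a mechanism (a hypothetical sketch, assuming SQLite; trigger syntax varies by engine, and all names are illustrative) is a trigger that updates the cached balance on every insert into the transactions table:

```python
import sqlite3

# Sketch: keep a cached balance in sync via a trigger (SQLite syntax).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE transactions (user_id INTEGER, account_id INTEGER, amount INTEGER);
    CREATE TABLE accounts (
        user_id INTEGER, account_id INTEGER, amount_left INTEGER,
        PRIMARY KEY (user_id, account_id)
    );
    -- After each inserted transaction, add its amount to the cached balance.
    CREATE TRIGGER sync_balance AFTER INSERT ON transactions
    BEGIN
        UPDATE accounts SET amount_left = amount_left + NEW.amount
        WHERE user_id = NEW.user_id AND account_id = NEW.account_id;
    END;
""")
# The account row must exist up front for the trigger's UPDATE to hit it.
con.execute("INSERT INTO accounts VALUES (1, 10, 0)")
con.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                [(1, 10, 500), (1, 10, -120)])
balance = con.execute(
    "SELECT amount_left FROM accounts WHERE user_id = 1 AND account_id = 10"
).fetchone()[0]
print(balance)  # 380
```

A trigger keeps the maintenance logic inside the database, but it is still duplicated state: deletes and updates of transactions would need matching triggers, which is exactly the error surface this answer warns about.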
This is an example of Denormalization.
In general, denormalization is discouraged, but there are certain exceptions - bank account balances are typically one such exception.
So if this is your exact situation, I would suggest going with the separate table of accounts solution - but if you have far fewer records than a bank typically would, then I recommend the derived approach instead.
To some extent, it depends.
With "small" data volumes, performance will more than likely be OK.
But as data volumes grow, having to SUM all transactions may become costlier to the point at which you start noticing a performance problem.
Also consider your data access/usage patterns. In a read-heavy system, where you "write once, read many", the SUM approach hits performance on every read - in this scenario, it may make sense to take a performance hit once on write, to improve subsequent read performance.
If you anticipate "large" data volumes, I'd definitely go with the extra table to hold the high-level totals. You need to ensure, though, that it is updated within a (SQL Server) transaction whenever a (monetary) transaction is made, so that it is an atomic operation.
With smaller data volumes, you could get away without it...personally, I'd probably still go down that path, to simplify the read scenario.
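The atomic-update point above can be sketched as follows (assuming SQLite via Python's sqlite3 module rather than SQL Server; `post_transaction` and all names are illustrative). The key idea is that the transaction-row insert and the totals update commit or roll back together:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE transactions (user_id INTEGER, account_id INTEGER, amount INTEGER);
    CREATE TABLE accounts (user_id INTEGER, account_id INTEGER, amount_left INTEGER);
""")
con.execute("INSERT INTO accounts VALUES (1, 10, 0)")

def post_transaction(con, user_id, account_id, amount):
    # 'with con:' commits on success and rolls back on an exception,
    # so both statements succeed or fail as one atomic unit.
    with con:
        con.execute("INSERT INTO transactions VALUES (?, ?, ?)",
                    (user_id, account_id, amount))
        con.execute(
            "UPDATE accounts SET amount_left = amount_left + ? "
            "WHERE user_id = ? AND account_id = ?",
            (amount, user_id, account_id))

post_transaction(con, 1, 10, 250)
post_transaction(con, 1, 10, -100)
balance = con.execute(
    "SELECT amount_left FROM accounts WHERE user_id = 1 AND account_id = 10"
).fetchone()[0]
print(balance)  # 150
```

If the balance update were outside the database transaction, a crash between the two statements would leave the cached total out of sync with the transaction log.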
It makes sense to go with the denormalized approach (the first solution) only if you face significant performance issues. Since you are doing just simple SUM (or group by and then sum) with proper indexes, your normalized solution will work really well and will be a lot easier to maintain (as you noted).
But depending on your queries, it can make sense to go with the denormalized solution... for example, if your database is read-only (you periodically load data from some other source and make inserts/updates rarely or not at all), then you can load the data in whatever shape makes querying easiest... and in that case, the denormalized solution might prove to be better.
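The "simple SUM with proper indexes" this answer recommends might look like the following sketch (assuming SQLite; the index and table names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE transactions (user_id INTEGER, account_id INTEGER, amount INTEGER)")
# A composite index lets the engine find one account's rows without a full scan.
con.execute(
    "CREATE INDEX idx_txn_user_account ON transactions (user_id, account_id)")
con.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                [(1, 10, 500), (1, 10, -120), (1, 11, 40), (2, 10, 75)])

# Balances for every account of user 1: group by account, then sum.
rows = con.execute(
    "SELECT account_id, SUM(amount) FROM transactions "
    "WHERE user_id = ? GROUP BY account_id ORDER BY account_id", (1,)
).fetchall()
print(rows)  # [(10, 380), (11, 40)]
```

With the `(user_id, account_id)` index, the query touches only the matching rows, which is why the normalized approach holds up well until data volumes get genuinely large.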