2 关于数据库开发理念和最佳实践的问题
关于 Web 应用程序的数据库的实现,哪一个最好:一个精简且非常小的数据库,只有裸露的信息,同时还有一个应用程序,根据需要,根据这些基本信息“重新计算”所有辅助信息,或者,一个数据库充满了之前已经计算出的所有辅助信息,但可能已经过时了?
显然,这是一个权衡,我认为任何人都会说这个问题的最佳答案是:“取决于”或“两者之间的混合”。但我真的不太舒服或没有足够的经验来单独推理这个主题。有人可以分享一些想法吗?
另外,另一个不同的问题: 数据库应该是特定时刻的“快照”,还是应该积累之前时间的所有信息,以便回溯所发生的事情?例如,假设我正在对银行帐户进行建模。我应该只保留该人当天的余额,还是应该保留该人的所有交易,并从这些交易中推断出余额?
关于这类在数据库设计方面更深入的东西有什么指示吗?
谢谢
Which one is best, regarding the implementation of a database for a web application: a lean and very small database with only the bare information, sided with a application that "recalculates" all the secondary information, on demand, based on those basic ones, OR, a database filled with all those secondary information already previously calculated, but possibly outdated?
Obviously, there is a trade-of there and I think that anyone would say that the best answer to this question is: "depends" or "is a mix between the two". But I'm really not to comfortable or experienced enough to reason alone about this subject. Could someone share some thoughts?
Also, another different question:
Should a database be the "snapshot" of a particular moment in time or should a database accumulate all the information from previous time, allowing the retrace of what happened? For instance, let's say that I'm modeling a Bank Account. Should I only keep the one's balance on that day, or should I keep all the one's transactions, and from those transactions infer the balance?
Any pointer on this kind of stuff that is, somehow, more deep in database design?
Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
我的快速回答是将所有内容存储在数据库中。在谈论超大规模应用程序时,存储成本远远低于处理成本。在小规模应用程序中,数据会少得多,因此存储仍然是一个合适的解决方案。
大多数RDMS都非常擅长处理海量数据,因此当有数百万/万亿条记录时,仍然可以相对较快地提取数据,这对于每次手动处理数据来说就不是那么容易了。
如果您选择计算数据而不是存储数据,则处理时间的增加速度不会与数据大小的增加速度相同 - 数据越多,用户就越多。这通常意味着处理时间将乘以数据大小和用户数量。
为了回答您的其他问题,我认为最好的做法是仅当数据达到如此高的值以至于处理时间将很长时才引入特定时刻的“快照”。
在计算大额金额(例如银行余额)时,最好将任何大量计算的结果及其日期戳存储到数据库中。这仅仅意味着他们不需要再次计算,直到它变得过时。
My quick answer would be to store everything in the database. The cost of storage is far lower than the cost of processing when talking about very large scale applications. On small scale applications, the data would be far less, so storage would still be an appropriate solution.
Most RDMSes are extremely good at handling vast amounts of data, so when there are millions/trillions of records, the data can still be extracted relatively quickly, which can't be said about processing the data manually each time.
If you choose to calculate data rather than store it, the processing time doesn't increase at the same rate as the size of data does - the more data ~ the more users. This would generally mean that processing times would multiply by the data's size and the number of users.
To answer your other question, I think it would be best practice to introduce a "snapshot" of a particular moment only when data amounts to such a high value that processing time will be significant.
When calculating large sums, such as bank balances, it would be good practice to store the result of any heavy calculations, along with their date stamp, to the database. This would simply mean that they will not need calculating again until it becomes out of date.
没有理由让预先计算的值过时。这就是触发器的用途(除其他外)。然而,对于大多数应用程序,除非您需要,否则我不会开始预先计算。可能计算速度一直都在。现在,在银行应用程序中,您需要几乎立即从数千甚至数百万条记录中进行预计算,是的,可以基于触发器设计预计算流程,以便在每次更改值时调整值。
至于是否只存储时间图片或历史值,这很大程度上取决于您存储的内容。如果与财务数据有关,请存储历史记录。当您接受审核时,您将需要它。顺便说一句,设计为存储截至操作日期的一些数据(这不是非规范化)。例如,您有一个订单,不要依赖客户地址表或产品表来获取有关产品运往何处或订购时的成本的数据。这些数据会随着时间的推移而变化,然后您的订单就不再准确。您不希望您的财务报告因 6 个月后价格发生变化而更改销售金额。
还有其他一些东西可能不需要历史存储。在大多数申请中,我们不需要知道您两年前是 Judy Jones,现在是 Judy Smith(HR 申请通常是一个例外)。
There is no reason to ever have out of date pre-calulated values. That's what trigger are for (among other things). However for most applications, I would not start precalculating until you need to. It may be that the calculation speed is always there. Now in a banking application, where you need to pre-calculate from thousands or even millions of records almost immediately, yes, design a precalulation process bases on triggers that adjust the values every time they are changed.
As to whether to store just a picture in time or historical values, that depends largely on what you are storing. If it has anything to do with financial data, store the history. You will need it when you are audited. Incidentally, design to store some data as of the date of the action (this is not denormalization). For instance, you have an order, do not rely onthe customer address table or the product table to get data about where the prodcts were shipped to or what they cost at the time of the order. This data changes over time and then you orders are no longer accurate. You don't want your financial reports to change the dollar amount sold because the price changed 6 months later.
There are other things that may not need to be stored historically. In most applications we don't need to know that you were Judy Jones 2 years ago and are Judy Smith now (HR application are usually an exception).
我想说,一开始只是跟踪您需要的数据并即时执行计算,但在整个设计过程以及软件的测试/生产过程中,请记住,您可能必须切换到存储预先计算的数据某个时刻的值。如果需要,设计时能够迁移到该模型。
添加预先计算的值是听起来不错的事情之一(因为在许多情况下它很好),但可能不需要。保持设计尽可能简单。如果性能成为动态计算的一个问题,那么您可以向数据库添加字段来存储计算,并在夜间运行批处理以赶上并填充旧数据。
至于银行业的比喻,肯定会存储所有交易的完整记录。存储任何相关的数据。数据库应该是过去和现在数据的存储。审计跟踪等。“当前状态”可以即时计算,也可以在平面表中维护,并在写入其他表期间重新计算(触发器对于此类事情很有用),如果性能需要的话。
I'd say start off just tracking the data you need and perform the calculations on the fly, but throughout the design process and well into the test/production of the software keep in mind that you may have to switch to storing the pre-calculated values at some point. Design with the ability to move to that model if the need arises.
Adding the pre-calculated values is one of those things that sounds good (because in many cases it is good) but might not be needed. Keep the design as simple as it needs to be. If performance becomes an issue in doing the calculations on the fly, then you can add fields to the database to store the calculations and run a batch overnight to catch up and fill in the legacy data.
As for the banking metaphor, definitely store a complete record of all transactions. Store any data that's relevant. A database should be a store of data, past and present. Audit trails, etc. The "current state" can either be calculated on the fly or it can be maintained in a flat table and re-calculated during writes to other tables (triggers are good for that sort of thing) if performance demands it.
这取决于:) 在数据库中保留派生数据可能很有用,因为它使您能够针对它实施约束和其他逻辑。它还可以被索引,或者您可以将计算放入视图中。无论如何,请尝试坚持使用 Boyce-Codd / 第五范式作为数据库设计的指南。与您有时听到的相反,规范化并不意味着您不能存储派生数据 - 它只是意味着数据不应从同一个表中的非键属性派生。
从根本上来说,任何数据库都是特定时间点已知事实的记录。大多数数据库都包含一些时间组件,并且一些数据被保留,而另一些则没有 - 需求应该规定这一点。
It depends :) Persisting derived data in the database can be useful because it enables you to implement constraints and other logic against it. Also it can be indexed or you may be able to put the calculations in a view. In any case, try to stick to Boyce-Codd / 5th Normal Form as a guide for your database design. Contrary to what you may sometimes hear, normalization does not mean you cannot store derived data - it just means data shouldn't be derived from nonkey attributes in the same table.
Fundamentally any database is a record of the known facts at a particular point in time. Most databases include some time component and some data is preserved whereas some is not - requirements should dictate this.
你已经回答了你自己的问题。
您所做的任何选择都取决于应用程序的要求。
有时速度获胜,有时空间获胜。有时数据准确性获胜,有时快照获胜。
虽然您可能无法判断什么是重要的,但您为其解决问题的人应该能够为您解答。
You've answered your own question.
Any choices that you make depend on the requirements of the application.
Sometimes speed wins, sometimes space wins. Sometime data accuracy wins, sometimes snapshots win.
While you may not have the ability to tell what's important, the person you're solving the problem for should be able to answer that for you.
我喜欢动态规划(不计算任何东西)。如果您不受空间限制并且可以接受一些过时的数据,那么可以预先计算它并存储在数据库中。这将为您带来额外的好处,即能够运行健全性检查并确保数据始终一致。
但正如其他人已经回答的那样,这取决于:)
I like dynamic programming(not calculate anything twise). If you're not limited with space and are fine with a bit outdated data, then precalculate it and store in the DB. This will give you additional benefit of being able to run sanity checks and ensure that data is always consistent.
But as others already replied, it depends :)