MySQL - Calculating fields on the fly vs. storing calculated data
I apologise if this has been asked before, but I can't seem to find an answer to a question I have about calculating values on the fly vs. storing them as fields in a database.
I read a few articles that suggested it was preferable to calculate when you can, but I would just like to know if that still applies to the following 2 examples.
Example 1. Say you are storing data relating to a car. You store the fuel tank size in litres, and how many litres it uses per 100km. You also want to know how many KMs it can travel, which can be calculated from the tank size and economy. I see 2 ways of doing this:
- When a car is added or updated, calculate the amount of KMs and store this as a static field in the database.
- Every time a car is accessed, calculate the amount of KMs on the fly.
Because the car's economy/tank size doesn't change (although it could be edited), the range in KMs is a pretty static value. I don't see why we would calculate it every single time the car is accessed. Wouldn't this waste CPU time, as opposed to simply storing it in a separate field in the database and calculating it only when a car is added or updated?
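A minimal Python sketch of the two options in Example 1, assuming range = tank size × 100 / litres per 100 km. The `Car` class and its field names are hypothetical; it only shows where the calculation happens in each approach.

```python
from dataclasses import dataclass, field


def range_km(tank_litres: float, litres_per_100km: float) -> float:
    """Distance a full tank covers: tank size x 100 / consumption per 100 km."""
    if litres_per_100km <= 0:
        raise ValueError("fuel consumption must be positive")
    return tank_litres * 100 / litres_per_100km


@dataclass
class Car:
    # Hypothetical record type; the field names are illustrative only.
    tank_litres: float
    litres_per_100km: float
    # Option 1: a stored value, recalculated only when the car is added or edited.
    stored_range_km: float = field(init=False)

    def __post_init__(self) -> None:
        self.stored_range_km = range_km(self.tank_litres, self.litres_per_100km)

    def update(self, tank_litres: float, litres_per_100km: float) -> None:
        # Every edit must go through here, or stored_range_km goes stale.
        self.tank_litres = tank_litres
        self.litres_per_100km = litres_per_100km
        self.stored_range_km = range_km(tank_litres, litres_per_100km)

    # Option 2: calculated on the fly, every time it is read.
    @property
    def computed_range_km(self) -> float:
        return range_km(self.tank_litres, self.litres_per_100km)


car = Car(tank_litres=50, litres_per_100km=5)
print(car.stored_range_km, car.computed_range_km)  # 1000.0 1000.0
car.update(tank_litres=60, litres_per_100km=8)
print(car.stored_range_km)  # 750.0
```

The calculation itself is a single multiply and divide, so its CPU cost is tiny either way; the real trade-off is that the stored value can go stale if some code path changes `tank_litres` directly instead of calling `update()`.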
My next example, which is almost an entirely different question (but on the same topic), relates to counting children.
Let's say we have an app which has categories and items. We have a view where we display all the categories and a count of all the items inside each category. Again, I'm wondering which is better: should I perform a MySQL query to count all the items in each category every single time the page is accessed, or store the count in a field in the categories table and update it when an item is added / deleted?
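The two options in Example 2 can be sketched with Python's built-in `sqlite3` module standing in for MySQL. The table and column names (`categories`, `items`, `item_count`) and the helpers `add_item` / `delete_item` are hypothetical; the point is the shape of the queries, not a tested MySQL schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        item_count INTEGER NOT NULL DEFAULT 0  -- denormalized counter (option 2)
    );
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES categories(id),
        name TEXT NOT NULL
    );
    CREATE INDEX idx_items_category ON items(category_id);
""")


def add_item(conn: sqlite3.Connection, category_id: int, name: str) -> None:
    # Insert the item and bump the stored counter in one transaction.
    with conn:
        conn.execute("INSERT INTO items (category_id, name) VALUES (?, ?)", (category_id, name))
        conn.execute("UPDATE categories SET item_count = item_count + 1 WHERE id = ?", (category_id,))


def delete_item(conn: sqlite3.Connection, item_id: int) -> None:
    # Look up the item's category first so the right counter is decremented.
    with conn:
        row = conn.execute("SELECT category_id FROM items WHERE id = ?", (item_id,)).fetchone()
        if row is None:
            return
        conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
        conn.execute("UPDATE categories SET item_count = item_count - 1 WHERE id = ?", (row[0],))


def counts_on_the_fly(conn: sqlite3.Connection) -> list:
    # Option 1: count at read time; LEFT JOIN keeps categories with no items.
    return conn.execute("""
        SELECT c.name, COUNT(i.id)
        FROM categories c LEFT JOIN items i ON i.category_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
    """).fetchall()


def counts_stored(conn: sqlite3.Connection) -> list:
    # Option 2: read the counter maintained by add_item / delete_item.
    return conn.execute("SELECT name, item_count FROM categories ORDER BY name").fetchall()


with conn:
    conn.executemany("INSERT INTO categories (name) VALUES (?)", [("books",), ("games",)])
add_item(conn, 1, "novel")
add_item(conn, 1, "atlas")
add_item(conn, 2, "chess")
delete_item(conn, 3)
print(counts_on_the_fly(conn))  # [('books', 2), ('games', 0)]
print(counts_stored(conn))      # [('books', 2), ('games', 0)]
```

Both reads return the same answer here, but option 2 is only correct as long as every code path that adds or deletes items also updates `item_count`.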
I know it is redundant to store anything that can be calculated, but I worry that calculating fields or counting records might be slow compared to storing the data in a field. If it's not, please let me know; I just want to learn when to use either method. On a small scale I guess it wouldn't matter either way, but for apps like Facebook, would they really count the number of friends you have every time someone views your profile, or would they just store it as a field?
I'd appreciate any responses to both of these scenarios, and any resource that might explain the benefits of calculating vs storing.
Thanks in advance,
Christian
Introducing redundancy into the database is a valid means of optimization. As with all optimizations, don't do it unless you have confirmed that this is where the bottleneck actually is.
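One way to confirm the bottleneck before adding redundancy is simply to time the query you are worried about. A minimal sketch, again using `sqlite3` as a stand-in; the workload size (50 categories, 20,000 items) and the helper `median_seconds` are made up for illustration. On MySQL itself you would more likely look at `EXPLAIN` output or the slow query log.

```python
import sqlite3
import statistics
import time


def median_seconds(conn: sqlite3.Connection, sql: str, runs: int = 25) -> float:
    """Run a read query several times and return the median wall-clock time."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch everything so the full cost is measured
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical workload: 20,000 items spread over 50 categories.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, category_id INTEGER NOT NULL);
    CREATE INDEX idx_items_category ON items(category_id);
""")
with conn:
    conn.executemany(
        "INSERT INTO items (category_id) VALUES (?)",
        ((n % 50,) for n in range(20_000)),
    )

count_sql = "SELECT category_id, COUNT(*) FROM items GROUP BY category_id"
print(f"median GROUP BY count: {median_seconds(conn, count_sql) * 1000:.2f} ms")
```

If the measured time is small next to everything else the page does, storing the count buys little.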
One thing to notice is the way you use your data.
If several applications, or several layers of your application (maybe old code and new code in the same app), are accessing your data, you'll reduce the risk of calculation errors by pre-calculating in the database. That way your calculated data will always be the same, no matter which application is requesting it.
For your first example, there is no reason anyone will ever need to change the way your KMs are computed. I would store it in the database (via triggers, or in PHP on insert/update, because MySQL triggers are... well, they are... not as good as some other databases' triggers).
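The trigger approach can be sketched with `sqlite3`, so the example runs anywhere; note this is SQLite trigger syntax, not MySQL's (in MySQL a `BEFORE INSERT` / `BEFORE UPDATE` trigger that does `SET NEW.range_km = ...` would be the usual equivalent). The `cars` table and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (
        id INTEGER PRIMARY KEY,
        tank_litres REAL NOT NULL,
        litres_per_100km REAL NOT NULL CHECK (litres_per_100km > 0),
        range_km REAL  -- stored, derived value maintained by the triggers below
    );

    -- Fill in range_km after every insert (100.0 forces real, not integer, division).
    CREATE TRIGGER cars_range_after_insert AFTER INSERT ON cars
    BEGIN
        UPDATE cars
        SET range_km = NEW.tank_litres * 100.0 / NEW.litres_per_100km
        WHERE id = NEW.id;
    END;

    -- Recalculate only when one of the inputs is updated.
    CREATE TRIGGER cars_range_after_update
    AFTER UPDATE OF tank_litres, litres_per_100km ON cars
    BEGIN
        UPDATE cars
        SET range_km = NEW.tank_litres * 100.0 / NEW.litres_per_100km
        WHERE id = NEW.id;
    END;
""")

with conn:
    conn.execute("INSERT INTO cars (tank_litres, litres_per_100km) VALUES (?, ?)", (50.0, 5.0))
print(conn.execute("SELECT range_km FROM cars WHERE id = 1").fetchone()[0])  # 1000.0

with conn:
    conn.execute("UPDATE cars SET tank_litres = ?, litres_per_100km = ? WHERE id = 1", (60.0, 8.0))
print(conn.execute("SELECT range_km FROM cars WHERE id = 1").fetchone()[0])  # 750.0
```

Because the calculation lives in the database, every application that writes to `cars` gets the same `range_km`, which is the consistency argument made above.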
Now, if we take your second example, it's quite possible that someone will one day want to add filters to that per-category count. For example, counting only children that are between 2 and 5. Then none of your pre-computed results would be of any use.
If you need to optimize and cache these things, what you need is probably more of an application-layer cache, something like memcache, or pre-computed results stored in a cache table. But this cache is an application cache, which is tied to your application's parameters (requests with different filters would use different records in the cache).
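A minimal sketch of such an application-layer cache, with one entry per distinct filter and a time-to-live. `FilteredCountCache` is a hypothetical helper standing in for something like memcache, and the `value` column used for the "between 2 and 5" filter is an assumption made for illustration.

```python
import sqlite3
import time
from typing import Callable, Dict, Tuple

Counts = Dict[int, int]  # category_id -> number of matching items


class FilteredCountCache:
    """Hypothetical in-process cache keyed by filter parameters, with a TTL."""

    def __init__(self, compute: Callable[[int, int], Counts], ttl_seconds: float = 60.0):
        self._compute = compute
        self._ttl = ttl_seconds
        self._entries: Dict[Tuple[int, int], Tuple[float, Counts]] = {}
        self.misses = 0  # how many times the real query had to run

    def get(self, low: int, high: int) -> Counts:
        key = (low, high)  # different filters -> different cache records
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        self.misses += 1
        value = self._compute(low, high)
        self._entries[key] = (now, value)
        return value

    def invalidate(self) -> None:
        # Call after items are added or deleted.
        self._entries.clear()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, category_id INTEGER NOT NULL, value INTEGER NOT NULL);
    INSERT INTO items (category_id, value) VALUES (1, 1), (1, 3), (1, 4), (2, 5), (2, 9);
""")


def count_in_range(low: int, high: int) -> Counts:
    rows = conn.execute(
        "SELECT category_id, COUNT(*) FROM items WHERE value BETWEEN ? AND ? "
        "GROUP BY category_id ORDER BY category_id",
        (low, high),
    ).fetchall()
    return dict(rows)


cache = FilteredCountCache(count_in_range)
print(cache.get(2, 5))  # {1: 2, 2: 1}
```

Unlike a counter column in the categories table, this cache can serve any filter combination, at the cost of deciding when entries expire or must be invalidated.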
Note that MySQL also has a nice query cache, which will prevent the same query from being computed over and over (in versions before MySQL 8.0, which removed the query cache).
Others have touched on the technical aspects, so let me give you another viewpoint to consider:
For every anomaly you introduce, you are making the development process slower.
Denormalized data, aggregates, prejoined data etcetera are all examples of stuff that greatly complicates development, because you have to:
In many cases, it's worth it and in some cases absolutely necessary, but it would be very stupid to sacrifice development speed if you don't have to.
In both examples, the values you're talking about are static, and recalculating static values makes no sense. Furthermore, if we assume that the tables are queried more often than they are updated, calculating the data on every read also costs performance.
Why you should avoid storing calculated fields in the table: