数据库内记忆化 - 好主意吗?有什么经验吗?
我有一个尚未实现的想法,因为我担心我可能会找错树……主要是因为谷歌搜索该主题返回的结果很少。
基本上我有一些 SQL 查询速度很慢,很大程度上是因为它们有子查询,很耗时。例如,他们可能会做“给我统计 10-15 岁男孩骑过的所有红色自行车的数量”之类的事情。这很昂贵,因为它会遍历所有自行车,但最终结果只是一个数字。而且,就我而言,我并不需要该数字 100% 是最新的。
此类问题的最终解决方案似乎是应用基于 OLAP 的引擎来预先缓存这些排列。然而,就我而言,我并不是真的试图围绕大量指标对数据进行切片和切块,而且我希望不必因为另一个进程/数据存储的运行而使我的架构变得复杂。
所以......我的想法基本上是在数据库中记住这些子查询。我可能有一个名为“BicycleStatistics”的表,它可能会将上面子查询的输出存储为其输入和输出的名称值对。
例如名称:“c_red_g_male_a_10-15”值:235
并具有一种机制,可以在运行查询时将这些值记忆到该表中。
有没有人遇到过这种情况并尝试过类似的事情?我认为这样的解决方案比“在数据库中投入大量 RAM 并让数据库处理它”更有价值的原因是 (A) 我的数据库大于我可以方便地投入的 RAM 量,并且 ( B) 数据库将确保我获得这些统计数据的准确数字,而我的最大胜利是,我可以接受这些数字过时一两天。
感谢您的任何想法/反馈。
汤姆
I have an idea I have yet to implement, because I have some fear I may be barking up the wrong tree... mainly because Googling on the topic returns so few results.
Basically I have some SQL queries that are slow, in large part because they have subqueries that are time-consuming. For example, they might do things like "give me a count of all bicycles that are red and ridden by boys between the ages of 10-15". This is expensive as it sloshes through all of the bicycles, but the end result is a single number. And, in my case, I don't really need that number to be 100% up to date.
The ultimate solution for problems of this sort seems to be to apply an OLAP-based engine to pre-cache these permutations. However, in my case I'm not really trying to slice and dice the data around a ton of metrics, and I'd love not to have to complicate my architecture with yet another process/datastore running.
So... my idea was basically memoizing these subqueries in the database. I might have a table called "BicycleStatistics" and it might store the output of that subquery above as a name value pair of it's inputs and outputs.
Ex name: "c_red_g_male_a_10-15" value: 235
And have a mechanism that memoizes those values to that table as the queries are run.
Has anyone been in this situation and tried anything similar? The reason I think a solution like this is valuable over the "throw a lot of RAM in your DB and let the database handle it" is (A) my database is bigger than the amount of RAM I can conveniently throw at it, and (B) the database is going to ensure I get the exact right number for these statistics, and my big win, above, is that I'm ok with the numbers being a day or two out of date.
Thanks for any thoughts/feedback.
Tom
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
物化视图是实现此要求的一种方法(如果您的 DBMS 支持的话)。
Materialized views are a way of achieving this requirement, if your DBMS supports them.