How to cache if you can't afford misses?
I'm developing an app that is processing incoming data and currently needs to hit the database for each incoming datapoint. The problem is twofold:
- the database can't keep up with the load
- the database returns results for less than 5% of the queries
The first idea is to cache the data from the relational database into something like Redis to improve lookup speed. But all the regular caching strategies rely on the fact that you can fall back to the database if needed and fetch data from there. This is problematic in my case because for 95% of the queries there is nothing in the database and I don't have anything to store in the cache. I can of course store the empty results in the cache but that would mean that 95% (or even more, depending on the composition of data) of my cache storage would be rubbish.
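To make the storage concern concrete, here is a minimal sketch of the regular cache-aside strategy with empty results stored as well. The `CacheAside` class, the `_MISSING` sentinel and the dictionary standing in for the relational table are all hypothetical names chosen for illustration, and the 5-in-100 hit rate is taken from the figures above.

```python
from typing import Callable, Dict, Optional

_MISSING = object()  # sentinel meaning "the database has no row for this key"


class CacheAside:
    """Cache-aside lookup that also remembers keys the database does not have."""

    def __init__(self, fetch_from_db: Callable[[str], Optional[str]]) -> None:
        self._fetch_from_db = fetch_from_db
        self._cache: Dict[str, object] = {}
        self.db_calls = 0  # counts how often we had to fall back to the database

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            value = self._cache[key]
            return None if value is _MISSING else value  # type: ignore[return-value]
        # Cache miss: fall back to the database, then remember the answer,
        # including the "nothing there" answer.
        self.db_calls += 1
        row = self._fetch_from_db(key)
        self._cache[key] = _MISSING if row is None else row
        return row

    def negative_fraction(self) -> float:
        """Share of cache entries that only record an absent row."""
        if not self._cache:
            return 0.0
        negatives = sum(1 for v in self._cache.values() if v is _MISSING)
        return negatives / len(self._cache)


# Hypothetical stand-in for the table: 5 of the 100 queried keys exist.
table = {f"k{i}": f"row-{i}" for i in range(5)}
cache = CacheAside(table.get)
for i in range(100):
    cache.get(f"k{i}")
```

With this made-up workload, `cache.negative_fraction()` shows how much of the cache is spent on keys that have no row, which is the waste described above.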
The preferred way to do it would be to implement a caching system that doesn't have any misses: everything from the database is always present in the cache and therefore if it's not in the cache, then it's not in the database. After looking around though I found that the consistency of Redis does not seem reliable enough to always make that assumption - if the key doesn't exist in Redis, how can I be 100% sure that it doesn't exist in the database (assuming that we're not in the midst of an update)? It is a strong requirement that if there is a row in the database about an incoming datapoint, then it needs to be found and can't just be missed out on.
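What I have in mind could be sketched roughly like this. It is only an illustration under strong assumptions: a single process, every write going through the same `put` method, and SQLite from the Python standard library standing in for the real relational database. The `FullMirror` class and the `datapoints` table are hypothetical names.

```python
import sqlite3
from typing import Dict, Optional


class FullMirror:
    """In-memory copy of every row; a key absent here is treated as absent in the table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._rows: Dict[str, str] = {}
        self.reload()

    def reload(self) -> None:
        # Read the full table, then swap the new snapshot in with one assignment.
        rows = self._conn.execute("SELECT key, payload FROM datapoints").fetchall()
        self._rows = dict(rows)

    def get(self, key: str) -> Optional[str]:
        # No database fallback: a miss here is the final answer.
        return self._rows.get(key)

    def put(self, key: str, payload: str) -> None:
        # Commit to the database first; update the mirror only after the commit succeeded.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO datapoints (key, payload) VALUES (?, ?)",
                (key, payload),
            )
        self._rows[key] = payload


# Usage with an in-memory SQLite database as a stand-in.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datapoints (key TEXT PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO datapoints VALUES ('a', 'row-a')")
conn.commit()

mirror = FullMirror(conn)
mirror.put("b", "row-b")
```

The guarantee only holds as long as nothing else writes to the table. Any write that bypasses `put` leaves the mirror stale until the next `reload`, and that is exactly the consistency gap I am worried about.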
How do I go about designing a caching system that will always have the same data as the relational database - without having a fallback to look the data up in the database? Redis might not be the tool but what would you recommend? Is there a pattern or a keyword that I should look up that I haven't thought of?
Comments (1)
There already is such a cache in the database: shared buffers. So all you have to do is set shared_buffers big enough to contain the whole database and restart. Soon the whole database will be cached, and reads will cause no more I/O and will be fast. That also works if you cannot cache the whole database, as long as you only need to access part of it: PostgreSQL will then just cache those 8kB pages that are in use.
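As a rough sketch of how one might pick the value: `pg_database_size()` and `ALTER SYSTEM` are real PostgreSQL features, but the helper `shared_buffers_statement` and its 20% headroom are assumptions made for illustration, not a tuning recommendation. The machine also needs enough RAM to hold the resulting setting.

```python
# Query that reports the on-disk size of the current database in bytes.
SIZE_QUERY = "SELECT pg_database_size(current_database())"

BYTES_PER_MB = 1024 * 1024


def shared_buffers_statement(db_size_bytes: int, headroom_percent: int = 120) -> str:
    """Build an ALTER SYSTEM statement sized to hold the whole database.

    headroom_percent is an assumed safety margin (120 means 20% extra).
    """
    if db_size_bytes < 0:
        raise ValueError("database size cannot be negative")
    divisor = 100 * BYTES_PER_MB
    # Integer ceiling division avoids floating-point rounding surprises.
    megabytes = max(1, (db_size_bytes * headroom_percent + divisor - 1) // divisor)
    return f"ALTER SYSTEM SET shared_buffers = '{megabytes}MB'"


# Example: a database that SIZE_QUERY reports as 10 GiB.
statement = shared_buffers_statement(10 * 1024**3)
```

Run the resulting statement as a superuser and restart the server, since a change to shared_buffers only takes effect after a restart.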
In my opinion, adding another external caching system can never do better than that. That is particularly true if data are ever modified: any external caching system would have to make sure that its data are not stale, which would introduce an additional overhead.