Patterns for caching related data
I'm currently developing the foundation of an application, and I'm looking for ways to optimize performance. My setup is based on the CakePHP framework, but I believe my question is relevant to any technology stack, as it relates to data caching.
Let's take a typical post-author relation, which is represented by 2 tables in my db. When I query the database for a specific blog post, at the same time the built-in ORM functionality in CakePHP also fetches the author of the post, comments on the post, etc. All of this is returned as one big-ass nested array, which I store in cache using a unique identifier for the concerned blog post.
When updating the blog post, it is child's play to destroy the cache entry for the post and have it regenerated on the next request.
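What the question describes is the cache-aside pattern: read from the cache, rebuild on a miss, and destroy the entry when the post changes. Below is a minimal Python sketch of it. A dict stands in for the real cache backend, and a hypothetical `load_post` function stands in for the CakePHP ORM query that returns the post together with its author and comments.

```python
from typing import Any, Callable, Dict


class PostCache:
    """Cache-aside store: each post's nested array lives under 'post:<id>'."""

    def __init__(self, load_post: Callable[[int], Dict[str, Any]]) -> None:
        self._load_post = load_post  # hypothetical ORM call: post + author + comments
        self._store: Dict[str, Dict[str, Any]] = {}

    @staticmethod
    def key(post_id: int) -> str:
        return f"post:{post_id}"

    def get(self, post_id: int) -> Dict[str, Any]:
        k = self.key(post_id)
        if k not in self._store:  # miss: regenerate from the database
            self._store[k] = self._load_post(post_id)
        return self._store[k]

    def on_post_updated(self, post_id: int) -> None:
        # Destroy the entry; the next get() rebuilds it.
        self._store.pop(self.key(post_id), None)


# Usage with a fake, hypothetical database.
db = {1: {"title": "Hello", "author": {"name": "Ann"}, "comments": []}}
loads = []


def load_post(post_id):
    loads.append(post_id)
    return dict(db[post_id])


cache = PostCache(load_post)
cache.get(1)
cache.get(1)                     # served from cache, no second load
db[1]["title"] = "Hello, world"  # the post itself is edited...
cache.on_post_updated(1)         # ...so its entry is destroyed
print(cache.get(1)["title"])     # Hello, world
```

The weak spot, as the question points out, is that only changes to the post itself call `on_post_updated`.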
But what happens when it is not the main entity (in this case the blog post) that gets updated, but rather some of the related data? For example, a comment could be deleted, or the author could update his avatar. Are there any approaches (patterns) I could consider for tracking updates to related data and applying those updates to my cache accordingly?
I'm curious to hear whether you've run into similar challenges, and how you managed to overcome them. If you're using another stack on your end, feel free to offer an abstract perspective. Either way, your views are much appreciated. Many thanks!
It is rather simple: cache entries can be
You should take care of destroying cache entries when related data changes. In the application layer, in addition to updating the data, you should destroy certain types of cache entries whenever you update certain tables. You keep track of the dependencies by hard-coding them.
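Below is a minimal Python sketch of the hard-coded variant. It assumes a dict stands in for the cache and a hypothetical `posts_by_author` lookup stands in for a database query. The table names and the `after_write` hook are illustrative, not a CakePHP API.

```python
from typing import Callable, Dict, Iterable, List

# In-memory stand-ins (hypothetical) for the cache backend and a DB lookup.
cache: Dict[str, dict] = {}
posts_by_author: Dict[int, List[int]] = {7: [1, 2]}


def post_key(post_id: int) -> str:
    return f"post:{post_id}"


# Hard-coded dependency map: for each table, which cache keys a changed row affects.
DEPENDENCIES: Dict[str, Callable[[dict], Iterable[str]]] = {
    "posts": lambda row: [post_key(row["id"])],
    "comments": lambda row: [post_key(row["post_id"])],
    "authors": lambda row: [post_key(pid) for pid in posts_by_author.get(row["id"], [])],
}


def after_write(table: str, row: dict) -> None:
    """Call from the application layer after every insert, update or delete."""
    keys_for = DEPENDENCIES.get(table, lambda _row: [])
    for key in keys_for(row):
        cache.pop(key, None)


cache.update({post_key(i): {"id": i} for i in (1, 2, 3)})
after_write("comments", {"id": 10, "post_id": 1})  # a comment on post 1 is deleted
after_write("authors", {"id": 7})                   # author 7 updates the avatar
print(sorted(cache))  # ['post:3']
```

In CakePHP, a natural place to call such a hook would likely be the models' `afterSave`/`afterDelete` callbacks. The cost of this approach is that every new relation means another hand-written entry in the map.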
If you'd like to be smart about it, you could have your cache objects declare their dependencies, and also cache the last update times of your DB tables.
Then you could
You could even integrate the above into your persistence layer.
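Below is a minimal Python sketch of the "smart" variant, under stated assumptions. Each cache entry declares the tables it depends on and records when it was built. A per-table "last updated" marker is bumped on every write, and an entry is served only if none of its tables changed after it was built. A logical counter stands in for real timestamps, which sidesteps clock skew between servers. All names are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet

_clock = itertools.count(1)            # logical clock standing in for timestamps
table_updated_at: Dict[str, int] = {}  # last update marker per DB table (itself cacheable)


def touch(table: str) -> None:
    """Record a write to `table`; would live in the persistence layer."""
    table_updated_at[table] = next(_clock)


@dataclass
class Entry:
    value: Any
    depends_on: FrozenSet[str]
    built_at: int


store: Dict[str, Entry] = {}


def get(key: str, depends_on: FrozenSet[str], build: Callable[[], Any]) -> Any:
    entry = store.get(key)
    fresh = entry is not None and all(
        table_updated_at.get(t, 0) < entry.built_at for t in entry.depends_on
    )
    if not fresh:
        built_at = next(_clock)  # stamp before building, so writes during build count
        entry = Entry(build(), depends_on, built_at)
        store[key] = entry
    return entry.value


builds = []


def build_post_1():
    builds.append(1)
    return {"id": 1, "comments": 3}


deps = frozenset({"posts", "comments", "authors"})
get("post:1", deps, build_post_1)
get("post:1", deps, build_post_1)  # served from cache
touch("comments")                  # a comment was deleted
get("post:1", deps, build_post_1)  # stale, so rebuilt
print(len(builds))  # 2
```

Each read costs a lookup of the dependency markers. In exchange, no code has to know which entries a given write affects.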
EDIT:
Of course, the above applies when you want a consistent cache. Sometimes, for some data, you can relax the consistency requirements, and there are scenarios where a simple TTL is good enough. As a trivial example, with a TTL of 1 second you will mostly stay out of trouble with users and still save some data processing. With longer TTLs you may still be fine. For example, if you are caching the list of country ISO codes, your application might be perfectly fine caching it for 86400 seconds.
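A TTL cache needs no invalidation logic at all. The Python sketch below uses an injectable clock so the expiry can be shown without waiting. Real backends such as memcached and APC accept an expiration time directly when storing an entry, so in practice you would just pass the TTL there. The class and loader names are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Entries expire `ttl` seconds after they are written; no invalidation logic."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_build(self, key: str, build: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = build()
        self._data[key] = (now + self.ttl, value)
        return value


now = [0.0]  # fake clock, in seconds
countries = TTLCache(ttl=86400, clock=lambda: now[0])
calls = []


def load_iso_codes():
    calls.append(1)
    return ["AT", "DE", "FR"]


countries.get_or_build("iso_codes", load_iso_codes)
now[0] = 3600.0
countries.get_or_build("iso_codes", load_iso_codes)  # still cached
now[0] = 86401.0
countries.get_or_build("iso_codes", load_iso_codes)  # expired, so reloaded
print(len(calls))  # 2
```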
Furthermore, you could also track when information was presented to the user, for example
This has the cost of an extra read of data A from the DB, but it occurs only on writes.
Also, such conflicts can occur not only because of the cache, but also because multiple users try to change the same data (i.e. this is related to locking strategies).
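The concrete example for tracking presentation times did not survive in this copy of the answer. One reading, offered here as an assumption, is optimistic concurrency control. You remember the last update time of data A when it is shown to the user. On write, you re-read A and reject the write if that time has changed. Below is a minimal Python sketch, with a dict as a hypothetical database and `updated_at` as a hypothetical version column.

```python
from dataclasses import dataclass
from typing import Dict


class ConflictError(Exception):
    """Raised when the data changed after it was presented to the user."""


@dataclass
class Row:
    value: str
    updated_at: int  # hypothetical version / last-update column


db: Dict[str, Row] = {"A": Row("first draft", updated_at=1)}


def read_for_user(key: str) -> Row:
    # The user (possibly via a cache) sees this value and its updated_at.
    row = db[key]
    return Row(row.value, row.updated_at)


def write(key: str, new_value: str, seen_updated_at: int) -> None:
    current = db[key]  # the extra read of A, which happens only on writes
    if current.updated_at != seen_updated_at:
        raise ConflictError(f"{key} changed since it was shown to the user")
    db[key] = Row(new_value, current.updated_at + 1)


alice = read_for_user("A")
bob = read_for_user("A")
write("A", "alice's edit", alice.updated_at)  # succeeds
try:
    write("A", "bob's edit", bob.updated_at)  # based on stale data
except ConflictError as e:
    print(e)  # A changed since it was shown to the user
print(db["A"].value)  # alice's edit
```

In a real database, the check and the write should happen atomically, for example as a single `UPDATE ... WHERE updated_at = ?` whose affected-row count tells you whether a conflict occurred. Otherwise two writers can both pass the check.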
One approach for memcached is to use tags (http://code.google.com/p/memcached-tag/). For example, take your post's "big-ass nested array". Let's say it includes the author's information and the post itself, and it is shown on the front page and in some box in the sidebar. So it gets the tags frontpage, {author-id}, sidebar and {post-id}. Now if someone changes the author information, you flush every cache entry tagged {author-id}. But that's only one solution, and it only works with cache backends that support tags; APC, for example, does not (AFAIK). Hope that gives you an example.
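The tagging idea can be sketched independently of any particular backend. The Python below keeps a tag-to-keys index next to the data. It does not reproduce the memcached-tag API, and the class and tag names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


class TaggedCache:
    """Each entry carries tags; flushing a tag destroys every entry that has it."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._keys_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def set(self, key: str, value: Any, tags: Iterable[str]) -> None:
        self._data[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def flush_tag(self, tag: str) -> None:
        for key in self._keys_by_tag.pop(tag, set()):
            self._data.pop(key, None)


cache = TaggedCache()
cache.set("post:42:full", {"title": "..."}, tags=["post-42", "author-7"])
cache.set("frontpage", ["post 42 teaser"], tags=["frontpage", "post-42", "author-7"])
cache.set("sidebar", ["recent comments"], tags=["sidebar"])
cache.flush_tag("author-7")  # author 7 changed their information
print(cache.get("post:42:full"), cache.get("frontpage"), cache.get("sidebar"))
# None None ['recent comments']
```

Keys destroyed through one tag may linger in the index under other tags. That is harmless here, because a later flush simply pops a missing key, but a production version would tidy the index.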