Patterns for caching related data

Posted on 2024-09-25 05:14:20

I'm currently developing the foundation of an application and looking for ways to optimize performance. My setup is based on the CakePHP framework, but I believe my question is relevant to any technology stack, as it relates to data caching.

Let's take a typical post-author relation, which is represented by two tables in my database. When I query the database for a specific blog post, the built-in ORM functionality in CakePHP also fetches the author of the post, the comments on the post, etc. All of this is returned as one big-ass nested array, which I store in the cache under a unique identifier for the blog post concerned.

When the blog post itself is updated, it is child's play to destroy the cache entry for the post and have it regenerated on the next request.

But what happens when it is not the main entity (in this case the blog post) that gets updated, but some of the related data? For example, a comment could be deleted, or the author could update his avatar. Are there any approaches (patterns) I could consider for tracking updates to related data and applying them to my cache accordingly?

I'm curious to hear whether you've run into similar challenges, and how you managed to overcome them. Feel free to provide an abstract perspective if you're using another stack on your end. Your views are much appreciated in any case, many thanks!

Comments (2)

开始看清了 2024-10-02 05:14:20

It is rather simple: cache entries can be

  • added
  • destroyed

You should take care of destroying cache entries when related data changes. So in the application layer, in addition to updating the data, you should destroy certain types of cache entries whenever you update certain tables; you keep track of the dependencies by hard-coding them.
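
A minimal sketch of this hard-coded approach, in Python rather than CakePHP to keep it stack-neutral. The in-memory `cache` dict, the key scheme, and the names `get_post`, `on_comment_changed`, `on_author_changed`, and the `post_ids_by_author` callback are all assumptions made for illustration, not part of any framework:

```python
# A plain dict stands in for the real cache backend (memcached, APC, ...).
cache = {}

def post_key(post_id):
    return f"post:{post_id}"

def get_post(post_id, load_from_db):
    """Return the cached nested post structure, loading it on a miss."""
    key = post_key(post_id)
    if key not in cache:
        cache[key] = load_from_db(post_id)
    return cache[key]

def on_comment_changed(post_id):
    # Hard-coded dependency: a comment is embedded in exactly one post entry.
    cache.pop(post_key(post_id), None)

def on_author_changed(author_id, post_ids_by_author):
    # Hard-coded dependency: author data is embedded in every post they wrote.
    for post_id in post_ids_by_author(author_id):
        cache.pop(post_key(post_id), None)
```

The write paths for comments and authors would call the matching `on_..._changed` hook right after the database update, so the next read of the post rebuilds the entry.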

If you'd like to be smart about it, you could have your cache objects state their dependencies, and cache the last update times of your DB tables as well.

Then you could

  • fetch cached data, examine dependencies,
  • get update times for relevant DB tables and
  • in case the record is stale (the update time of a table that your big-ass cache entry depends on is later than the time of the cache entry), drop it and get fresh data from the database.

You could even integrate the above into your persistence layer.
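
The steps above could be sketched roughly as follows. This is an illustration under assumptions: a logical counter replaces wall-clock update times, both dicts live in process memory, and `touch_table` and `cache_get` are hypothetical names that a persistence layer would have to call.

```python
import itertools

_clock = itertools.count(1)  # logical clock; real code might use timestamps
table_updated_at = {}        # table name -> tick of its last write
cache = {}                   # key -> (value, dependency tables, cached-at tick)

def touch_table(table):
    """Call from the persistence layer on every write to `table`."""
    table_updated_at[table] = next(_clock)

def cache_get(key, loader, depends_on):
    """Return a fresh cached value, reloading if any dependency changed."""
    entry = cache.get(key)
    if entry is not None:
        value, deps, cached_at = entry
        # Fresh only if every table it depends on was last written earlier.
        if all(table_updated_at.get(t, 0) < cached_at for t in deps):
            return value
        del cache[key]  # stale: drop it and fall through to a reload
    cached_at = next(_clock)  # take the tick before reading the database
    value = loader()
    cache[key] = (value, tuple(depends_on), cached_at)
    return value
```

Note that tracking at table granularity is coarse: any write to the comments table invalidates every cached post that depends on it, not just the post the comment belongs to.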

EDIT:
Of course, the above is for when you want a consistent cache. Sometimes, and for some data, you can relax the consistency requirements, and there are scenarios where a simple TTL will be good enough. For a trivial example, with a TTL of 1 sec you should mostly be out of trouble with users, and it can still help data processing; with longer times you might still be OK. For example, say you are caching the list of country ISO codes: your application might be perfectly fine if you cache it for 86400 sec.
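
A TTL cache of this kind might look like the sketch below. The `TTLCache` class and its injectable `clock` parameter are assumptions for illustration; most cache backends offer expiry natively, so you would rarely write this by hand.

```python
import time

class TTLCache:
    """Entries expire a fixed number of seconds after they are loaded."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (value, expires_at)

    def get(self, key, loader):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]
        # Missing or expired: reload and restart the TTL window.
        value = loader()
        self._store[key] = (value, now + self.ttl)
        return value
```

With `TTLCache(86400)` the country-code list would be reloaded at most once a day, and nothing has to be invalidated explicitly.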

Furthermore, you could also track the age of the information presented to the user, for example:

  • let's say the user has seen data A from the cache, and we know that this data was created/modified at time t1
  • the user makes changes to data A (turning it into data B) and commits the change
  • the application layer can then check whether data A is still what is in the DB (i.e. whether the cached data on which the user based decisions and/or changes was indeed fresh)
  • if it was not fresh, there is a conflict and the user should confirm the changes

This costs an extra read of data A from the DB, but it occurs only on writes.
Also, the conflict can arise not only because of the cache, but also because multiple users are trying to change the same data (i.e. it is related to locking strategies).
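
The check on write can be sketched as below. The `db` dict standing in for a table, the integer `modified_at` version, and the names `read_record`, `write_record`, and `StaleDataError` are hypothetical; in a real database the comparison and the update would need to happen atomically, for instance inside one transaction.

```python
class StaleDataError(Exception):
    """The record changed after the user last saw it."""

db = {}  # record id -> {"data": ..., "modified_at": int}; stands in for a table

def read_record(record_id):
    """Return the data plus the version the user is looking at (t1)."""
    row = db[record_id]
    return row["data"], row["modified_at"]

def write_record(record_id, new_data, seen_modified_at, force=False):
    """Write only if the record is unchanged since the user saw it."""
    row = db[record_id]  # the extra read, paid only on writes
    if row["modified_at"] != seen_modified_at and not force:
        raise StaleDataError(record_id)  # caller asks the user to confirm
    row["data"] = new_data
    row["modified_at"] += 1
```

On `StaleDataError` the application would show the user the current data and, if they confirm, retry with `force=True`.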

指尖微凉心微凉 2024-10-02 05:14:20

One approach for memcached is to use tags ( http://code.google.com/p/memcached-tag/ ). For example, say you have your post's "big-ass nested array": it includes the author's information and the post itself, and it is shown on the front page and in a box in the sidebar. So it gets the tags frontpage, {author-id}, sidebar, {post-id}. Now if someone changes the author information, you flush every cache entry with the tag {author-id}. But that's only one solution, and it only works for cache backends that support tags, which APC, for example, does not (afaik). Hope that gave you an example.
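
To show the idea independently of any backend, here is a small in-memory sketch of tag-based invalidation. The `TaggedCache` class and its methods are made up for illustration and are not the memcached-tag API.

```python
from collections import defaultdict

class TaggedCache:
    """In-memory cache where every entry can carry a set of tags."""

    def __init__(self):
        self._data = {}
        self._keys_by_tag = defaultdict(set)  # tag -> keys carrying it
        self._tags_by_key = {}                # key -> its tags

    def set(self, key, value, tags=()):
        self.delete(key)  # clear any old tag links for this key first
        self._data[key] = value
        self._tags_by_key[key] = set(tags)
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)
        for tag in self._tags_by_key.pop(key, ()):
            self._keys_by_tag[tag].discard(key)

    def flush_tag(self, tag):
        # Copy the key set first, because delete() mutates it.
        for key in list(self._keys_by_tag.get(tag, ())):
            self.delete(key)
```

A post entry stored with `tags=["frontpage", "author:7", "post:1"]` disappears on `flush_tag("author:7")`, while entries tagged with other authors stay cached.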
