Implementing Model-level caching

Posted on 2024-09-02 22:43:47

I was posting some comments in a related question about MVC caching and some questions about actual implementation came up. How does one implement a Model-level cache that works transparently without the developer needing to manually cache, yet still remains efficient?

I would keep my caching
responsibilities firmly within the
model. It is none of the controller's
or view's business where the model is
getting data. All they care about is
that when data is requested, data is
provided - this is how the MVC
paradigm is supposed to work.

(Source: Post by Jarrod)

The reason I am skeptical is because caching should usually not be done unless there is a real need, and shouldn't be done for things like search results. So somehow the Model itself has to know whether or not the SELECT statement being issued to it is worthy of being cached. Wouldn't the Model have to be astronomically smart, and/or store statistics of what is being most often queried over a long period of time in order to accurately make a decision? And wouldn't the overhead of all this make the caching useless anyway?

How would you uniquely identify a query from another query (or more accurately, a result set from another result set)? What about if you're using prepared statements, with only the parameters changing according to user input?

Another poster said this:

I would suggest using the md5 hash of
your query combined with a serialized
version of your input arguments.

Is the minuscule chance of collision worth worrying about?

Conceptually, caching in the Model seems like a good idea to me, but in practice, and due to the nature of caching, it seems the developer should have direct control over it and explicitly code it into the controller logic.

Update for Bounty

I am indeed using an extremely lightweight ORM, somewhat similar to ActiveRecord but capable of doing complex joins and subqueries without the n^2 problem. I built it myself, so it is flexible and isn't restrictive in terms of relations or column names, and I just want to understand how I should implement the caching mechanism.

Following the advice of the helpful people, I would take a hash (probably md5) of the query concatenated with a list of its parameters, and use this as the key for that particular data store. Should I implement the caching individually in the Model classes that require it, or should it be part of the ORM layer?

How do I know when it should be invalidated? Would I have to parse the UPDATE/DELETE/INSERT queries and sub in parameters manually to find out which records are being modified? Or worse, do additional queries whenever data is modified to keep track of which things have changed and what should be invalidated?

I will award the bounty to whoever can give me a clear conceptual explanation (whether or not this is really necessary/efficient to be done transparently), and if so, has some implementation details for the Model caching. I am using PHP and MySQL if that helps to narrow your focus.

倒带 2024-09-09 22:43:47

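Your post only makes sense if the model is a trivial ORM, and there are lots of reasons why that is a bad thing. Try thinking of the model as a web service.

Caching is the responsibility of the model.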

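How would you uniquely identify a query from another query (or more accurately, a result set from another result set)? What about if you're using prepared statements, with only the parameters changing according to user input?

But the input to the model uniquely defines its output.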

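If you are using the same model to retrieve the contents of a shopping basket and to run a search on your product catalogue, then there is something wrong with your code.

Even for the shopping basket, there may be merit in caching data with a TTL shorter than the time it takes to process a transaction that would change its contents. For the catalogue search, caching the list of matching products for a few hours will probably have no measurable impact on sales, but it pays off well in reduced database load.

The fact that you are using a trivial ORM out of the box does not preclude you from wrapping it in your own code.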

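Wouldn't the Model have to be astronomically smart, and/or store statistics

No. You decide whether to cache, and if you cannot ensure that the cache stays consistent, you enforce a TTL based on the type of request.

As a general rule of thumb, you should be able to predict an appropriate TTL from the SELECT query before any variables are bound, and this needs to be decided at design time. Obviously, though, the results should be indexed by the query after binding.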
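
A minimal sketch of that split, assuming a hand-written TTL table: the TTL is looked up from the unbound query template, while the cache key is derived from the template plus the bound values. The table contents, the `QUERY_TTLS` constant and both function names are hypothetical and only illustrate the idea.

```php
<?php
// Hypothetical TTL table, chosen at design time and keyed by the
// unbound query template. Queries and values are illustrative only.
const QUERY_TTLS = [
    'SELECT * FROM categories'                 => 3600, // rarely changes
    'SELECT * FROM products WHERE name LIKE ?' => 600,  // catalogue search
    'SELECT * FROM basket WHERE user_id = ?'   => 0,    // do not cache
];

// The TTL depends only on the template, so it is known before binding.
function ttlFor(string $template): int
{
    return QUERY_TTLS[$template] ?? 0; // unknown queries are not cached
}

// The cache key depends on the template plus the bound values.
function cacheKey(string $template, array $params): string
{
    return md5($template . serialize($params));
}
```

Two calls with the same template but different parameters share a TTL and get different keys, which is the behaviour described above.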

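Should I implement the caching individually in the Model classes that require it, or should it be part of the ORM layer?

As a first choice, I would implement it as a decorator on the model class. That way you can easily port it to models that implement factories rather than trivial ORMs.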
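
One way the decorator idea could look in PHP is sketched below. The `ProductSource` interface, the `ProductModel` stand-in and the `CachedProductSource` class are all hypothetical names, and the cache is a plain per-request array rather than a shared store, so treat it as an illustration of the shape only.

```php
<?php
// Hypothetical interface; the real model API is not shown in the thread.
interface ProductSource
{
    public function findByCategory(int $categoryId): array;
}

// Stand-in for the ORM-backed model; counts how often it is hit.
class ProductModel implements ProductSource
{
    public $calls = 0;

    public function findByCategory(int $categoryId): array
    {
        $this->calls++;
        return [['id' => 1, 'category_id' => $categoryId]];
    }
}

// Decorator: same interface, with a TTL cache in front of any ProductSource.
class CachedProductSource implements ProductSource
{
    private $inner;
    private $ttl;
    private $cache = []; // key => [expiresAt, rows]; lives for one request

    public function __construct(ProductSource $inner, int $ttl)
    {
        $this->inner = $inner;
        $this->ttl = $ttl;
    }

    public function findByCategory(int $categoryId): array
    {
        $key = md5(__FUNCTION__ . serialize([$categoryId]));
        if (isset($this->cache[$key]) && $this->cache[$key][0] > time()) {
            return $this->cache[$key][1]; // still fresh
        }
        $rows = $this->inner->findByCategory($categoryId);
        $this->cache[$key] = [time() + $this->ttl, $rows];
        return $rows;
    }
}
```

Callers depend only on `ProductSource`, so the cached and uncached versions are interchangeable, and the wrapped object could just as well be a factory-built model.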


束缚ｍ 2024-09-09 22:43:47

There are quite a few factors to consider with caching, such as hashing, invalidation, etc. But the goal of caching is always the same: to reduce response times and resource consumption.

Here are a couple of quick thoughts off the top of my head for systems that do not use ORM:

It never hurts to cache something using memcache if you have the memory for it
You should only ever cache SELECT queries since other types affect data
All cached queries should be parameterized
The cache key should be an md5 of the query concatenated with a serialize()'d version of the parameters; this identifies each unique query. Serializing parameters is not an issue, because the parameters passed to SELECT queries are usually small, and serializing isn't as expensive as you might think. Because you hash the static query concatenated with the dynamic params, the chance of an accidental collision is negligible in practice.
Modifications (INSERT/UPDATE/DELETE) to rows in a model should invalidate (or set a TTL on) all items cached for that model
The model should be extended to allow for cache TTL values to be sent along with a query
Your model should have support for skipping the cache (probably by passing TTL of 0 along with the query)
Even though a base query may be cached, it is generally more efficient to apply ORDER BY / LIMIT type operations in a new (modified) query rather than to pull an entire rowset from cache and manipulate it through PHP to achieve the same thing (unless there is very high latency between your web and database servers).
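
The points in the list above can be sketched roughly as follows. To keep the example runnable without a memcache server, the store is a plain in-memory array, and the query is executed by a caller-supplied callable; the `QueryCache` class, its method names and the `$run` callable are assumptions made for illustration, not part of any library.

```php
<?php
// In-memory stand-in for memcache so the sketch runs without a server.
class QueryCache
{
    private $items = [];   // key => [expiresAt, rows]
    private $byModel = []; // model name => list of keys, for invalidation

    public static function key(string $sql, array $params): string
    {
        return md5($sql . serialize($params));
    }

    // $run is a hypothetical callable that executes the SELECT.
    public function select(string $model, string $sql, array $params, int $ttl, callable $run): array
    {
        if ($ttl <= 0) {
            return $run($sql, $params); // a TTL of 0 skips the cache
        }
        $key = self::key($sql, $params);
        if (isset($this->items[$key]) && $this->items[$key][0] > time()) {
            return $this->items[$key][1];
        }
        $rows = $run($sql, $params);
        $this->items[$key] = [time() + $ttl, $rows];
        $this->byModel[$model][] = $key;
        return $rows;
    }

    // Call after any INSERT/UPDATE/DELETE on the model's rows.
    public function invalidateModel(string $model): void
    {
        foreach ($this->byModel[$model] ?? [] as $key) {
            unset($this->items[$key]);
        }
        unset($this->byModel[$model]);
    }
}
```

Invalidating everything cached for a model is deliberately coarse: it avoids parsing write queries to work out which rows changed, at the cost of some extra cache misses.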

Attempting to manage cache validation for an ORM system is a completely different beast (due to relations), and should probably be handled on a case-by-case basis (in the controller). But if you're truly concerned with performance, chances are you wouldn't be using an ORM to begin with.

UPDATE:

If you find yourself using multiple instances of the same model class within a single thread, I would suggest also potentially memcaching your instantiated model (depending on your constructor, deserializing and waking an object is sometimes more efficient than constructing an object). Once you have an initialized object (whether constructed or deserialized), it is worlds more efficient to clone a basic instance of an object (using PHP's clone operator) and set its new state rather than to reconstruct an object in PHP.
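
A small sketch of the clone approach, assuming a hypothetical `UserModel` whose constructor does some costly setup. Whether cloning actually beats constructing depends on the constructor, so it is worth measuring before relying on it.

```php
<?php
// Hypothetical model with a deliberately "expensive" constructor.
class UserModel
{
    public static $constructed = 0; // counts constructor calls
    public $schema;
    public $row = [];

    public function __construct()
    {
        self::$constructed++;
        // Stand-in for costly setup such as loading column metadata.
        $this->schema = ['id', 'name'];
    }
}

// Build one base instance, then clone it once per row.
function hydrate(UserModel $prototype, array $rows): array
{
    $models = [];
    foreach ($rows as $row) {
        $model = clone $prototype; // clone does not call __construct()
        $model->row = $row;        // set the new state on the copy
        $models[] = $model;
    }
    return $models;
}
```

Note that `clone` makes a shallow copy: properties that hold objects are shared between copies unless the class defines `__clone()` to copy them.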


一曲爱恨情仇 2024-09-09 22:43:47

The reason I am skeptical is because
caching should usually not be done
unless there is a real need, and
shouldn't be done for things like
search results. So somehow the Model
itself has to know whether or not the
SELECT statement being issued to it is
worthy of being cached. Wouldn't the
Model have to be astronomically smart,
and/or store statistics of what is
being most often queried over a long
period of time in order to accurately
make a decision? And wouldn't the
overhead of all this make the caching
useless anyway?

Who else is better suited to track any of that? Multiple controllers will be using the same model to fetch the data they need. So how in the world would a controller be able to make a rational decision?

There are no hard and fast rules -- a smart caching strategy is almost completely driven by context. The business logic (again, models!) is going to dictate what sorts of things ought to be in the cache, when the cache needs to be invalidated, etc.

You're absolutely right that caching search results seems like a bad idea. I'm sure it usually is. It's possible that if your search results are very expensive to generate, and you're doing something like pagination, you might want a per-user cache that holds the most recent results, along with the search parameters. But I think that's a fairly special case.

It's difficult to give more specific advice without the context, but here are a couple of scenarios:

1) You have business objects that can have a category assigned. The categories rarely change. Your Category model ought to cache the full set of categories for read operations. When the infrequent write operations occur, they can invalidate the cache. Every view script in the system can now query the model and get the current categories back (for rendering select boxes, let's say) without concerning itself with the cache. Any controller in the system can now add/update/delete categories without knowing about the cache.
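
Scenario 1 might be sketched like this. The `CategoryModel` class is hypothetical, and an in-memory array stands in for the categories table so the example runs on its own; a real model would issue the SELECT where the `reads` counter is incremented.

```php
<?php
class CategoryModel
{
    private $rows;          // stand-in for the categories table
    private $cached = null; // cached full set, null when invalid
    public $reads = 0;      // counts simulated database reads

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function all(): array
    {
        if ($this->cached === null) {
            $this->reads++; // a real model would run the SELECT here
            $this->cached = array_values($this->rows);
        }
        return $this->cached;
    }

    public function add(int $id, string $name): void
    {
        $this->rows[$id] = ['id' => $id, 'name' => $name];
        $this->cached = null; // a write invalidates the cached set
    }

    public function delete(int $id): void
    {
        unset($this->rows[$id]);
        $this->cached = null;
    }
}
```

Views call `all()` and controllers call `add()` or `delete()`; neither side needs to know the cache exists.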

2) You have some complex formula that consumes multiple inputs and creates a popularity rating for some kind of "products". Some widget in your page layout shows the 5 most popular objects in summary form. Your Product model would provide a getPopular() method, which would rely on the cache. The model could invalidate the cache every X minutes, or some background process could run at regular intervals to invalidate/rebuild. No matter what part of the system wants the popular products, they request it via the model, which transparently manages the cache.
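
A rough sketch of scenario 2 with time-based expiry. The `Product` class layout, the injectable clock and the fixed result list are assumptions made so the example is self-contained; the real popularity formula is not given in the thread and is replaced by a placeholder.

```php
<?php
class Product
{
    private $ttl;
    private $clock;          // callable returning the current Unix time
    private $popular = null; // cached result, null until first computed
    private $expiresAt = 0;
    public $computed = 0;    // how often the expensive ranking ran

    public function __construct(int $ttlSeconds, ?callable $clock = null)
    {
        $this->ttl = $ttlSeconds;
        $this->clock = $clock ?? 'time'; // default to PHP's time()
    }

    public function getPopular(): array
    {
        $now = ($this->clock)();
        if ($this->popular === null || $now >= $this->expiresAt) {
            $this->popular = $this->computePopular();
            $this->expiresAt = $now + $this->ttl;
        }
        return $this->popular;
    }

    // Placeholder for the "complex formula"; returns fixed ids here.
    private function computePopular(): array
    {
        $this->computed++;
        return [42, 7, 19, 3, 88];
    }
}
```

The clock is injected only so that expiry can be exercised without waiting; callers that omit it get the system time.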

The exact caching implementation is highly dependent on the sort of data you're manipulating, combined with the typical use cases.

The caveat here is that if you're abusing ActiveRecord, and/or composing SQL queries (or equivalents) in your controllers, you're probably going to have issues. Doing smart caching is a lot easier if you've got a nice, rich, model layer that accurately models your domain, instead of flimsy models that just wrap database tables.

It's not about the Models being smart, it's about the developer being smart.
