Implementing a Model-level cache

Posted 2024-09-02 22:43:47

I was posting some comments in a related question about MVC caching and some questions about actual implementation came up. How does one implement a Model-level cache that works transparently without the developer needing to manually cache, yet still remains efficient?

I would keep my caching
responsibilities firmly within the
model. It is none of the controller's
or view's business where the model is
getting data. All they care about is
that when data is requested, data is
provided - this is how the MVC
paradigm is supposed to work.

(Source: Post by Jarrod)

I am skeptical because caching should usually not be done unless there is a real need, and shouldn't be done for things like search results. So somehow the Model itself has to know whether or not the SELECT statement being issued to it is worthy of being cached. Wouldn't the Model have to be astronomically smart, and/or store statistics of what is being queried most often over a long period of time, in order to make an accurate decision? And wouldn't the overhead of all this make the caching useless anyway?

How would you uniquely identify a query from another query (or more accurately, a result set from another result set)? What about if you're using prepared statements, with only the parameters changing according to user input?

Another poster said this:

I would suggest using the md5 hash of
your query combined with a serialized
version of your input arguments.

Is the minuscule chance of collision worth worrying about?

Conceptually, caching in the Model seems like a good idea to me, but in practice, and due to the nature of caching, it seems the developer should have direct control over it and explicitly code it into the controller logic.


Update for Bounty

I am indeed using an extremely lightweight ORM, somewhat similar to ActiveRecord but capable of doing complex joins and subqueries without the n^2 problem. I built it myself, so it is flexible and isn't restrictive in terms of relations or column names, and I just want to understand how I should implement the caching mechanism.

Following the advice of the helpful people, I would take a hash (probably md5) of the query concatenated with a list of its parameters, and use this as the key for that particular data store. Should I implement the caching individually in the Model classes that require it, or should it be part of the ORM layer?

How do I know when it should be invalidated? Would I have to parse the UPDATE/DELETE/INSERT queries and sub in parameters manually to find out which records are being modified? Or worse, do additional queries whenever data is modified to keep track of which things have changed and what should be invalidated?

I will award the bounty to whoever can give me a clear conceptual explanation (whether or not this is really necessary/efficient to be done transparently), and if so, has some implementation details for the Model caching. I am using PHP and MySQL if that helps to narrow your focus.


倒带 2024-09-09 22:43:47

Your post only makes any sense if the model is a trivial ORM. And there are lots of reasons why that's a bad thing. Try thinking about the model as if it were a web service.

Caching is the responsibility of the model.

How would you uniquely identify a query from another query (or more accurately, a result set from another result set)? What about if you're using prepared statements, with only the parameters changing according to user input?

But the inputs to the model uniquely define its output.

If you're using the same model to retrieve the contents of a shopping basket and to run a search on your product catalog then there's something wrong with your code.

Even in the case of the shopping basket, there may be merit in caching data with a TTL shorter than the time taken to process a transaction that would change its contents. In the case of the catalog search, caching the list of matching products for a few hours will probably have no measurable impact on sales, and it is a good trade-off for the reduced database load.

The fact that you are using a trivial ORM out of the box does not exclude you from wrapping it in your own code.

Wouldn't the Model have to be astronomically smart, and/or store statistics

No. You make the determination on whether to cache, and if you can't ensure that the cache is consistent then enforce a TTL based on the type of request.

As a general rule of thumb, you should be able to predict appropriate TTLs based on the SELECT query before binding any variables and this needs to be implemented at design time - but obviously the results should be indexed based on the query after binding.

Should I implement the caching individually in the Model classes that require it, or should it be part of the ORM layer?

For preference I would implement this as a decorator on the model class - that way you can easily port it to models which implement a factory rather than trivial ORM.
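To make the decorator idea concrete, here is a minimal sketch. It is written in Python rather than PHP purely for brevity, and every name in it (`CachingModel`, `FakeCatalogModel`, the `query(template, params)` interface, the injected `clock`) is hypothetical. The TTL is looked up from the unbound query text, as decided at design time, while the cached result is indexed by the query after binding.

```python
import hashlib
import json
import time


class CachingModel:
    """Decorator (wrapper) that adds TTL caching around a model's query()."""

    def __init__(self, model, ttl_by_template, clock=time.monotonic):
        self._model = model                      # wrapped object exposing query(template, params)
        self._ttl_by_template = ttl_by_template  # TTL per unbound query, chosen at design time
        self._clock = clock                      # injectable so expiry can be tested
        self._store = {}                         # key -> (expires_at, rows)

    def query(self, template, params=()):
        ttl = self._ttl_by_template.get(template, 0)
        if ttl <= 0:
            # No TTL configured for this query: pass straight through.
            return self._model.query(template, params)
        # Results are indexed by the bound query: template plus parameters.
        bound = template + "|" + json.dumps(list(params), default=str)
        key = hashlib.md5(bound.encode("utf-8")).hexdigest()
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        rows = self._model.query(template, params)
        self._store[key] = (now + ttl, rows)
        return rows


class FakeCatalogModel:
    """Stand-in for a real model; counts how often the database would be hit."""

    def __init__(self):
        self.calls = 0

    def query(self, template, params=()):
        self.calls += 1
        return [("row for", template, tuple(params))]
```

Because the wrapper only relies on `query()`, the same decorator could sit in front of a model built by a factory instead of a trivial ORM, which is the portability argument made above.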

C.

束缚m 2024-09-09 22:43:47

There are quite a few factors to consider with caching, such as hashing, invalidation, etc. But the goal of caching is always the same: to reduce response times and resource consumption.

Here are a couple of quick thoughts off the top of my head for systems that do not use ORM:

  • It never hurts to cache something using memcache if you have the memory for it
  • You should only ever cache SELECT queries since other types affect data
  • All cached queries should be parameterized
  • The cache key should be an md5 of the query concatenated with a serialize()'d version of the parameters (this identifies unique queries). Serializing parameters is not an issue, because the parameters passed to select queries are usually quite small, and serializing isn't as expensive as you think. And because you hash your static query concatenated with your dynamic params, collisions are not something you should have to worry about in practice.
  • Modifications (INSERT/UPDATE/DELETE) to rows in a model should invalidate (or set a TTL) on all items cached for that model
  • The model should be extended to allow for cache TTL values to be sent along with a query
  • Your model should have support for skipping the cache (probably by passing TTL of 0 along with the query)
  • Even though a base query may be cached, it is generally more efficient to apply ORDER BY / LIMIT type operations in a new (modified) query rather than to pull an entire rowset from cache and manipulate it through PHP to achieve the same thing (unless there is very high latency between your web and database servers).
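
The key, TTL, skip-cache and invalidation points from the list can be sketched together as follows. This is an illustration under assumptions, not a finished implementation: it uses Python instead of PHP, an in-process dict instead of memcache, and JSON as the stand-in for `serialize()`. `cache_key`, `QueryCache`, `fetch`, `invalidate_model` and the `loader` callback are all hypothetical names.

```python
import hashlib
import json
import time


def cache_key(sql, params):
    """md5 of the static SQL text plus a serialized form of the bound parameters."""
    serialized = json.dumps(params, sort_keys=True, default=str)
    return hashlib.md5((sql + "|" + serialized).encode("utf-8")).hexdigest()


class QueryCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}        # key -> (expires_at, rows)
        self._keys_by_model = {}  # model name -> keys cached for that model

    def fetch(self, model, sql, params, ttl, loader):
        """Return rows for a SELECT, using the cache unless ttl is 0."""
        if ttl <= 0:
            return loader(sql, params)          # TTL 0 means "skip the cache"
        key = cache_key(sql, params)
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]
        rows = loader(sql, params)
        self._entries[key] = (self._clock() + ttl, rows)
        self._keys_by_model.setdefault(model, set()).add(key)
        return rows

    def invalidate_model(self, model):
        """Call after any INSERT/UPDATE/DELETE on the model's rows."""
        for key in self._keys_by_model.pop(model, set()):
            self._entries.pop(key, None)
```

Dropping everything cached for a model on each write is coarse, but it avoids having to parse the modifying statement to work out which rows changed.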

Attempting to manage cache validation for an ORM system is a completely different beast (due to relations), and should probably be handled on a case-by-case basis (in the controller). But if you're truly concerned with performance, chances are you wouldn't be using an ORM to begin with.

UPDATE:

If you find yourself using multiple instances of the same model class within a single thread, I would suggest also potentially memcaching your instantiated model (depending on your constructor, deserializing and waking an object is sometimes more efficient than constructing an object). Once you have an initialized object (whether constructed or deserialized), it is worlds more efficient to clone() a basic instance of the object and set its new state than to reconstruct the object in PHP.

一曲爱恨情仇 2024-09-09 22:43:47

The reason I am skeptical is because
caching should usually not be done
unless there is a real need, and
shouldn't be done for things like
search results. So somehow the Model
itself has to know whether or not the
SELECT statement being issued to it is
worthy of being cached. Wouldn't the
Model have to be astronomically smart,
and/or store statistics of what is
being most often queried over a long
period of time in order to accurately
make a decision? And wouldn't the
overhead of all this make the caching
useless anyway?

Who else is better suited to track any of that? Multiple controllers will be using the same model to fetch the data they need. So how in the world would a controller be able to make a rational decision?

There are no hard and fast rules -- a smart caching strategy is almost completely driven by context. The business logic (again, models!) is going to dictate what sorts of things ought to be in the cache, when the cache needs to be invalidated, etc.

You're absolutely right that caching search results seems like a bad idea. I'm sure it usually is. It's possible that if your search results are very expensive to generate, and you're doing something like pagination, you might want a per-user cache that holds the most recent results, along with the search parameters. But I think that's a fairly special case.

It's difficult to give more specific advice without the context, but here are a couple of scenarios:

1) You have business objects that can have a category assigned. The categories rarely change. Your Category model ought to cache the full set of categories for read operations. When the infrequent write operations occur, they can invalidate the cache. Every view script in the system can now query the model and get the current categories back (for rendering select boxes, let's say) without concerning itself with the cache. Any controller in the system can now add/update/delete categories without knowing about the cache.

2) You have some complex formula that consumes multiple inputs and creates a popularity rating for some kind of "products". Some widget in your page layout shows the 5 most popular objects in summary form. Your Product model would provide a getPopular() method, which would rely on the cache. The model could invalidate the cache every X minutes, or some background process could run at regular intervals to invalidate/rebuild. No matter what part of the system wants the popular products, they request it via the model, which transparently manages the cache.
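
Both scenarios can be sketched in a few lines. This is an illustration only, in Python rather than PHP, with hypothetical names throughout (`CategoryModel`, `ProductModel`, `get_popular`, a plain dict standing in for the database, and an injected `clock`). The first class invalidates on write, the second expires after a fixed interval.

```python
import time


class CategoryModel:
    """Scenario 1: reads come from a cached full set; writes invalidate it."""

    def __init__(self, db):
        self._db = db          # hypothetical storage: dict of id -> name
        self._cached = None
        self.db_reads = 0      # counts how often the "database" is read

    def all(self):
        if self._cached is None:
            self.db_reads += 1
            self._cached = sorted(self._db.items())
        return list(self._cached)

    def save(self, category_id, name):
        self._db[category_id] = name
        self._cached = None    # infrequent write: drop the cached set

    def delete(self, category_id):
        self._db.pop(category_id, None)
        self._cached = None


class ProductModel:
    """Scenario 2: an expensive popularity ranking, rebuilt every ttl_seconds."""

    def __init__(self, compute_popular, ttl_seconds=300, clock=time.monotonic):
        self._compute = compute_popular
        self._ttl = ttl_seconds
        self._clock = clock
        self._popular = None
        self._expires_at = 0.0

    def get_popular(self):
        now = self._clock()
        if self._popular is None or now >= self._expires_at:
            self._popular = self._compute()
            self._expires_at = now + self._ttl
        return self._popular
```

In both cases the callers (controllers, view scripts) only ever see `all()`, `save()`, `delete()` and `get_popular()`; none of them knows a cache exists.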

The exact caching implementation is highly dependent on the sort of data you're manipulating, combined with the typical use cases.

The caveat here is that if you're abusing ActiveRecord, and/or composing SQL queries (or equivalents) in your controllers, you're probably going to have issues. Doing smart caching is a lot easier if you've got a nice, rich, model layer that accurately models your domain, instead of flimsy models that just wrap database tables.

It's not about the Models being smart, it's about the developer being smart.

岁月蹉跎了容颜 2024-09-09 22:43:47

What we did was build a cache layer as a replacement for the loading function of the MVC framework. This way, only the model calls that we actually want cached are cached. If caching is unnecessary or unwanted, the model is called from the controller in the normal way.

If a model is called through the cache layer, together with its parameters (if any), the cache layer first checks the requested data against the cache pool. If a still-valid entry exists, the actual model is not loaded and the cached data is simply returned to the controller. If not, the model is called as it normally would be.

It's really great to have the possibility of doing this in a layer above the model, since it becomes very easy to introduce the usage of semaphore locks on a per-query / per-model level, to reduce server loads even further.
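
A rough sketch of such a layer, assuming Python instead of PHP and an in-process dict as the cache pool. `CacheLayer`, `call` and `ReportModel` are hypothetical names, and the per-key lock is only one possible reading of "semaphore locks on a per-query / per-model level": it ensures that concurrent requests for the same uncached data trigger a single model call.

```python
import threading
import time


class CacheLayer:
    """Sits between controller and model; the model itself stays cache-free."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._pool = {}                 # key -> (expires_at, value)
        self._locks = {}                # key -> lock guarding rebuilds of that key
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def call(self, model, method, args=(), ttl=60):
        # args must be hashable, since they are part of the cache key.
        key = (type(model).__name__, method, tuple(args))
        with self._lock_for(key):       # only one rebuild per key at a time
            entry = self._pool.get(key)
            if entry is not None and entry[0] > self._clock():
                return entry[1]         # still valid: the model is never touched
            value = getattr(model, method)(*args)
            self._pool[key] = (self._clock() + ttl, value)
            return value


class ReportModel:
    """Plain model with nothing but data access; counts its own invocations."""

    def __init__(self):
        self.calls = 0

    def monthly_totals(self, year):
        self.calls += 1
        return {"year": year}
```

A controller that wants caching would go through `layer.call(model, "monthly_totals", (2024,))`; one that does not would call `model.monthly_totals(2024)` directly.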

The biggest advantage to me, though, is that the models stay as they were designed and contain nothing but pure database queries. This way, it is possible to modify a model in production without end users even noticing (assuming, of course, that the data the model delivers does not need to be recreated during the update).

Update: We have also implemented namespacing inside our cache layer on two levels: a per-model basis and an optional group basis. Thanks to that, we can easily invalidate all previously cached data that comes from a model upon an update or deletion in the database.
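
The post does not say how the namespacing is implemented. One common way to get this behaviour, shown here purely as an assumption, is to embed a version number for the model (and optionally for a group) in every cache key and bump that number to invalidate. The sketch is in Python with an in-process dict; `NamespacedCache` and its methods are hypothetical names.

```python
class NamespacedCache:
    """Two-level namespacing: per model, plus an optional group within a model."""

    def __init__(self):
        self._data = {}
        self._versions = {}   # namespace -> current version number

    def _version(self, namespace):
        return self._versions.get(namespace, 0)

    def _bump(self, namespace):
        self._versions[namespace] = self._version(namespace) + 1

    def _full_key(self, model, group, key):
        # The current versions are part of the key, so bumping one of them
        # makes every older entry in that namespace unreachable.
        return (model, self._version(("model", model)),
                group, self._version(("group", model, group)), key)

    def set(self, model, key, value, group=None):
        self._data[self._full_key(model, group, key)] = value

    def get(self, model, key, group=None, default=None):
        return self._data.get(self._full_key(model, group, key), default)

    def invalidate_group(self, model, group):
        self._bump(("group", model, group))

    def invalidate_model(self, model):
        self._bump(("model", model))
```

With a store such as memcache, the stale entries would simply age out. In this in-memory sketch they stay in the dict, so a real version would also need eviction.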

客…行舟 2024-09-09 22:43:47

If you are interested in a more transparent caching system for an active record library, you could assign an id to each query and then create an associative array of the results. You can store this relationship statically or, ironically, in a database. (That is the trade-off of caching: sometimes you have to spend more computing power in order to use less.)

Every time the query is run, keep track of a hash of the result. If the result hash differs from the stored one, store the new hash. If the hash is the same, increment the count of repeated results. Once the desired number of repeated results comes up, cache the results and stop checking the table for an allotted amount of time and/or number of subsequent runs of the query.

You would have a class that controls all of these goings-on. Its functions could include things like:

-start cache checking
-set threshold
-always cache
-cache lifetime
-force clear all caches
-clear the cache for this query
-we have been hit with the death laser and need to cache everything (the "I hate you WordPress, I'm never using you again, I shouldn't have been so lazy and should have made my own website" function)

This would help to automate much of your process. Cache rules can also be implemented on a model-by-model basis or for the application as a whole.

This might carry slightly more overhead than some cache systems, but if you just want caching to do its own thing, I think it would work well without running too much amok.
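
As a rough illustration of the repeat-counting idea, here is a sketch in Python (not PHP) with hypothetical names: `AdaptiveQueryCache`, `run`, `clear`, `clear_all`, and an `execute` callback that runs the real query. It assumes the result rows can be serialized to JSON for hashing.

```python
import hashlib
import json
import time


class AdaptiveQueryCache:
    """Starts caching a query only after its result has repeated `threshold` times."""

    def __init__(self, threshold=3, ttl=300, clock=time.monotonic):
        self._threshold = threshold
        self._ttl = ttl
        self._clock = clock
        self._stats = {}    # query_id -> (last_result_hash, repeat_count)
        self._cached = {}   # query_id -> (expires_at, rows)

    def run(self, query_id, execute):
        now = self._clock()
        entry = self._cached.get(query_id)
        if entry is not None:
            if entry[0] > now:
                return entry[1]              # within the allotted time: skip the table
            del self._cached[query_id]       # expired: go back to observing
            self._stats.pop(query_id, None)
        rows = execute()
        digest = hashlib.md5(
            json.dumps(rows, sort_keys=True, default=str).encode("utf-8")
        ).hexdigest()
        last_hash, repeats = self._stats.get(query_id, (None, 0))
        repeats = repeats + 1 if digest == last_hash else 0
        self._stats[query_id] = (digest, repeats)
        if repeats >= self._threshold:
            self._cached[query_id] = (now + self._ttl, rows)
        return rows

    def clear(self, query_id):
        self._cached.pop(query_id, None)
        self._stats.pop(query_id, None)

    def clear_all(self):
        self._cached.clear()
        self._stats.clear()
```

Note the cost mentioned above: until the threshold is reached, every run pays for the real query plus hashing its result.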

九公里浅绿 2024-09-09 22:43:47

This isn't really an answer, but your question reminded me I had seen this chapter which describes, I think, how to do what you want to do using the Doctrine ORM with Symfony. You might want to compare with that approach/implementation.

Basically, the approach there doesn't try for "astronomically smart" but allows the programmer to manually specify result sets to cache based on the volatility of the data and its performance impact... I suppose you could approximate that decision and recalculate it nightly based on actual metrics or something.

清晨说晚安 2024-09-09 22:43:47

I would recommend that you look here for a comprehensive treatment of caching in ORMs, including the problems and the solutions that can be applied.

When dealing with caching data in an ORM, you generally have the following 3 problems to solve:

  1. Many ORM implementations store either the database resource or a non-serializable result set or both in the actual ORM objects. Since caching requires that all objects be serialized, this puts a serious road block in our way.
  2. How do you track one set of data versus another in the cache?
  3. How do you notify the cache that a particular data set has changed?
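
The first problem can be illustrated with a small sketch. It is in Python rather than PHP and uses hypothetical names (`Record`, `attach`). The object drops its live connection when serialized and has it re-attached after being read back from the cache. In PHP, the `__sleep()` and `__wakeup()` magic methods serve a similar purpose.

```python
import pickle
import sqlite3


class Record:
    """ORM-style object that holds a live connection next to its row data."""

    def __init__(self, connection, table, row):
        self.connection = connection
        self.table = table
        self.row = dict(row)

    def __getstate__(self):
        # A live database connection cannot be serialized, so leave it out.
        state = self.__dict__.copy()
        state["connection"] = None
        return state

    def attach(self, connection):
        """Re-attach a connection after the object comes back from the cache."""
        self.connection = connection
        return self
```

Only the row data and table name end up in the cache, so the cached blob stays small and never contains a database handle.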

忆依然 2024-09-09 22:43:47

You should have a separate Model which does the SQL interfacing directly, e.g. for a Customers table: $CustomerModel->GetCustomers($parameter); et cetera. Then, in those models, you can implement caching transparently without having to edit any of your existing MVC code.
