NHibernate caching dilemma

Posted 2024-08-23 21:46:21


My application includes a client, web tier (load balanced), application tier (load balanced), and database tier. The web tier exposes services to clients, and forwards calls onto the application tier. The application tier then executes queries against the database (using NHibernate) and returns the results.

Data is mostly read, but writes occur fairly frequently, particularly as new data enters the system. Much more often than not, data is aggregated and those aggregations are returned to the client - not the original data.

Typically, users will be interested in the aggregation of recent data - say, from the past week. Thus, to me it makes sense to introduce a cache that includes all data from the past 7 days. I cannot just cache entities as and when they are loaded because I need to aggregate over a range of entities, and that range is dictated by the client, along with other complications, such as filters. I need to know whether - for a given range of time - all data within that range is in the cache or not.

In my ideal fantasy world, my services would not have to change at all:

public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    // execute HQL/criteria call and have it automatically use the cache where possible
}

There would be a separate filtering layer that would hook into NHibernate and intelligently and transparently determine whether the HQL/criteria query could be executed against the cache or not, and would only go to the database if necessary. If all the data was in the cache, it would query the cached data itself, kind of like an in-memory database.

However, on first inspection, NHibernate's second level cache mechanism does not seem appropriate for my needs. What I'd like to be able to do is:

  1. Configure it to always have the last 7 days' worth of data in the cache, e.g. "For this table, cache all records where this field is between 7 days ago and now."
  2. Have the ability to manually maintain the cache. As new data enters the system, it would be nice if I could just throw it straight into the cache rather than waiting until the cache is invalidated. Similarly, as data falls out of the time period, I'd like to be able to pull it from the cache.
  3. Have NHibernate intelligently understand when it can serve a query directly from the cache rather than hitting the database at all, e.g. if the user asks for an aggregate of data over the past 3 days, that aggregation should be calculated directly from the cache rather than touching the DB.
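Requirements #1 and #2, plus the coverage question raised earlier ("is all data for this range in the cache?"), can be approximated outside NHibernate. The sketch below is a single-process, non-thread-safe illustration; the `Reading` entity and the `RecentWindowCache` class with its members are hypothetical, not NHibernate APIs. In a load-balanced application tier, each node would hold its own copy unless a distributed cache is used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, used only for this sketch.
public record Reading(DateTime Timestamp, string Source, decimal Value);

// Hypothetical cache holding a sliding window (e.g. 7 days) of rows.
// Single-process and not thread-safe.
public class RecentWindowCache
{
    private readonly TimeSpan _window;
    private readonly List<Reading> _items = new List<Reading>();
    private DateTime _coveredFrom; // every row at or after this instant is in the cache

    // Requirement #1: the caller loads all rows from the last `window` once at startup.
    public RecentWindowCache(TimeSpan window, DateTime now, IEnumerable<Reading> initialLoad)
    {
        _window = window;
        _coveredFrom = now - window;
        _items.AddRange(initialLoad.Where(r => r.Timestamp >= _coveredFrom));
    }

    // Requirement #2: new data goes straight into the cache.
    public void Put(Reading reading)
    {
        if (reading.Timestamp >= _coveredFrom)
            _items.Add(reading);
    }

    // Requirement #2: rows that fall out of the window are dropped.
    public void Trim(DateTime now)
    {
        _coveredFrom = now - _window;
        _items.RemoveAll(r => r.Timestamp < _coveredFrom);
    }

    // A range can be served from the cache only if it starts inside the window.
    public bool Covers(DateTime starting, DateTime ending) =>
        starting >= _coveredFrom && starting <= ending;

    // Rows in [starting, ending); this is a linear scan over the window.
    public IEnumerable<Reading> Range(DateTime starting, DateTime ending) =>
        _items.Where(r => r.Timestamp >= starting && r.Timestamp < ending);
}
```

`CanBeServicedFromCache` in the service would then reduce to a call to `Covers`, with the filter applied to the rows returned by `Range`.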

Now, I'm pretty sure #3 is asking too much. Even if I can get the cache populated with all the data required, NHibernate has no idea how to efficiently query that data. It would literally have to loop over all entities in order to discriminate which are relevant to the query (which might be fine, to be honest). Also, it would require an implementation of NHibernate's query engine that executed against objects rather than a database. But I can dream, right?
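For the time-range part of a query, the full loop is avoidable: if cached rows are kept sorted by timestamp, two binary searches locate the slice for `[starting, ending)` in O(log n), and only that slice has to be checked against the remaining filter criteria. A minimal sketch; `SortedRange` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: with cached rows kept sorted by timestamp, the slice for a
// time range is found with two binary searches instead of a scan of every entity.
public static class SortedRange
{
    // Index of the first timestamp >= key (Count if there is none).
    public static int LowerBound(IReadOnlyList<DateTime> sorted, DateTime key)
    {
        int lo = 0, hi = sorted.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] < key) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Half-open index range [Start, End) of timestamps in [starting, ending).
    public static (int Start, int End) Find(IReadOnlyList<DateTime> sorted, DateTime starting, DateTime ending) =>
        (LowerBound(sorted, starting), LowerBound(sorted, ending));
}
```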

Assuming #3 is asking too much, I would require some logic in my services like this:

public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    if (CanBeServicedFromCache(starting, ending, filter))
    {
        // execute some LINQ to object code or whatever to determine the aggregation results
    }
    else
    {
        // execute HQL/criteria call to determine the aggregation results
    }
}

This isn't ideal because each service must be cache-aware, and must duplicate the aggregation logic: once for querying the database via NHibernate, and once for querying the cache.
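One way to reduce the duplication, assuming the aggregation can be expressed in LINQ and that NHibernate's LINQ provider (`session.Query<T>()`) can translate that expression, is to write it once against `IQueryable<T>` and make only the choice of source cache-aware: the cached list via `AsQueryable()`, or the database query. `Reading`, `SourceTotal`, `Aggregations` and `AggregationService` are hypothetical, and the `Filter` parameter is omitted for brevity. Whether a given expression translates, and whether LINQ to Objects and the database agree on details such as null handling or string collation, has to be checked per query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity and result types for this sketch.
public record Reading(DateTime Timestamp, string Source, decimal Value);
public record SourceTotal(string Source, decimal Total, int Count);

public static class Aggregations
{
    // The aggregation is written once, against IQueryable<Reading>.
    public static List<SourceTotal> TotalsBySource(
        IQueryable<Reading> rows, DateTime starting, DateTime ending) =>
        rows.Where(r => r.Timestamp >= starting && r.Timestamp < ending)
            .GroupBy(r => r.Source)
            .Select(g => new SourceTotal(g.Key, g.Sum(r => r.Value), g.Count()))
            .OrderBy(t => t.Source)
            .ToList();
}

public class AggregationService
{
    private readonly Func<DateTime, DateTime, bool> _canBeServicedFromCache;
    private readonly Func<IQueryable<Reading>> _cachedRows;   // e.g. cachedList.AsQueryable()
    private readonly Func<IQueryable<Reading>> _databaseRows; // e.g. session.Query<Reading>() (assumed)

    public AggregationService(
        Func<DateTime, DateTime, bool> canBeServicedFromCache,
        Func<IQueryable<Reading>> cachedRows,
        Func<IQueryable<Reading>> databaseRows)
    {
        _canBeServicedFromCache = canBeServicedFromCache;
        _cachedRows = cachedRows;
        _databaseRows = databaseRows;
    }

    // Only the choice of source is cache-aware; the aggregation logic is shared.
    public List<SourceTotal> DoIt(DateTime starting, DateTime ending)
    {
        var rows = _canBeServicedFromCache(starting, ending) ? _cachedRows() : _databaseRows();
        return Aggregations.TotalsBySource(rows, starting, ending);
    }
}
```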

That said, it would be nice if I could at least store the relevant data in NHibernate's second level cache. Doing so would allow other services (that don't do aggregation) to transparently benefit from the cache. It would also ensure that I'm not doubling up on cached entities (once in the second level cache, and once in my own separate cache) if I ever decide the second level cache is required elsewhere in the system.

I suspect if I can get a hold of the implementation of ICache at runtime, all I would need to do is call the Put() method to stick my data into the cache. But this might be treading on dangerous ground...

Can anyone provide any insight as to whether any of my requirements can be met by NHibernate's second level cache mechanism? Or should I just roll my own solution and forgo NHibernate's second level cache altogether?

Thanks

PS. I've already considered a cube to do the aggregation calculations much more quickly, but that still leaves me with the database as the bottleneck. I may well use a cube in addition to the cache, but the lack of a cache is my primary concern right now.


Comments (3)

笑红尘 2024-08-30 21:46:21


Stop using your transactional (OLTP) data source for analytical (OLAP) queries and the problem goes away.

When a domain-significant event occurs (e.g. a new entity enters the system or is updated), fire an event (à la domain events). Wire up a handler for the event that takes the details of the created or updated entity and stores the data in a denormalised reporting store specifically designed for reporting the aggregates you want (most likely by pushing the data into a star schema). Your reporting then becomes simply querying aggregates (which may even be precalculated) along predefined axes, requiring nothing more than a simple select and a few joins. Querying can be carried out using something like L2SQL (LINQ to SQL) or even simple parameterised queries and data readers.
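A minimal in-memory sketch of that flow. The `ReadingRecorded` event is hypothetical: it carries the new value and, for updates, the previous one, and it assumes an update never changes a row's timestamp or source. The dictionary keyed by (day, source) stands in for a pre-aggregated fact table in the reporting database; a real handler would write to that store instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain event: raised when a reading is created (PreviousValue null)
// or updated (PreviousValue holds the old value). Assumes updates never change
// the timestamp or source.
public record ReadingRecorded(DateTime Timestamp, string Source, decimal Value, decimal? PreviousValue);

// Pre-aggregated fact row, one per (day, source), like a row in a star-schema fact table.
public class DailyFact
{
    public decimal Total { get; set; }
    public int Count { get; set; }
}

// In-memory stand-in for the denormalised reporting store.
public class ReportingStore
{
    private readonly Dictionary<(DateTime Day, string Source), DailyFact> _facts = new();

    public DailyFact GetOrAdd(DateTime timestamp, string source)
    {
        var key = (timestamp.Date, source);
        if (!_facts.TryGetValue(key, out var fact))
            _facts[key] = fact = new DailyFact();
        return fact;
    }

    // Read side: a simple lookup along predefined axes (day, source); null if absent.
    public DailyFact Find(DateTime day, string source) =>
        _facts.TryGetValue((day.Date, source), out var fact) ? fact : null;
}

// Write side: keeps the reporting store in step with the transactional model.
public class ReadingRecordedHandler
{
    private readonly ReportingStore _store;
    public ReadingRecordedHandler(ReportingStore store) => _store = store;

    public void Handle(ReadingRecorded e)
    {
        var fact = _store.GetOrAdd(e.Timestamp, e.Source);
        if (e.PreviousValue is decimal previous)
        {
            fact.Total += e.Value - previous; // update: apply the difference only
        }
        else
        {
            fact.Total += e.Value;            // new reading
            fact.Count++;
        }
    }
}
```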

Performance gains should be significant as you can optimise the read side for fast lookups across many criteria while optimising the write side for fast lookups by id and reduced index load on write.

Additional performance and scalability are also gained: once you have migrated to this approach, you can physically separate your read and write stores, so you can run n read stores for every write store. This lets your solution scale out to meet increased read demand while write demand grows at a lower rate.
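The read/write split can be sketched as a router that sends every write to the single write store and spreads reads round-robin over the n read stores. `StoreRouter` is hypothetical and the store names are placeholders (in practice they would be connection strings). Read stores that are fed asynchronously lag behind the write store, so reports become eventually consistent.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical router: writes go to the one write store, reads rotate over n read stores.
public class StoreRouter
{
    private readonly string _writeStore;
    private readonly IReadOnlyList<string> _readStores;
    private int _next = -1;

    public StoreRouter(string writeStore, IReadOnlyList<string> readStores)
    {
        if (readStores.Count == 0)
            throw new ArgumentException("At least one read store is required.", nameof(readStores));
        _writeStore = writeStore;
        _readStores = readStores;
    }

    public string ForWrite() => _writeStore;

    // Thread-safe round-robin; the uint cast keeps the index valid after int overflow.
    public string ForRead()
    {
        int i = Interlocked.Increment(ref _next);
        return _readStores[(int)((uint)i % (uint)_readStores.Count)];
    }
}
```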

孤云独去闲 2024-08-30 21:46:21


Define 2 cache regions, "aggregation" and "aggregation.today", with a long expiry time. Use these for your aggregation queries for previous days and today respectively.

In DoIt(), make one NHibernate query per day in the requested range, using cacheable queries, and combine the query results in C#.
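A sketch of the per-day split and merge. Here the `queryOneDay` delegate stands in for one cacheable NHibernate query per day (for example an `IQuery` with `SetCacheable(true)` and `SetCacheRegion("aggregation")`), and a dictionary stands in for NHibernate's query cache; `DayAggregate` and `PerDayAggregator` are hypothetical, and day boundaries are assumed to fall on midnight. Merging per-day results only works for aggregates that combine by addition, such as sums and counts; an average has to be rebuilt from the merged sum and count rather than averaged across days.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mergeable aggregate: sums and counts combine by addition.
public record DayAggregate(decimal Total, int Count)
{
    public static DayAggregate operator +(DayAggregate a, DayAggregate b) =>
        new DayAggregate(a.Total + b.Total, a.Count + b.Count);

    // An average is derived from the merged sum and count, never averaged per day.
    public decimal Average => Count == 0 ? 0m : Total / Count;
}

public class PerDayAggregator
{
    // Stands in for one cacheable NHibernate query covering a single day.
    private readonly Func<DateTime, DayAggregate> _queryOneDay;
    // Stands in for NHibernate's query cache (keyed here by day only).
    private readonly Dictionary<DateTime, DayAggregate> _cache = new();

    public int QueriesExecuted { get; private set; }

    public PerDayAggregator(Func<DateTime, DayAggregate> queryOneDay) => _queryOneDay = queryOneDay;

    // Splits [starting, ending) into calendar days (assumes midnight boundaries)
    // and merges the per-day results.
    public DayAggregate DoIt(DateTime starting, DateTime ending)
    {
        var total = new DayAggregate(0m, 0);
        for (var day = starting.Date; day < ending; day = day.AddDays(1))
        {
            if (!_cache.TryGetValue(day, out var perDay))
            {
                perDay = _queryOneDay(day);
                _cache[day] = perDay;
                QueriesExecuted++;
            }
            total += perDay;
        }
        return total;
    }
}
```

Overlapping ranges only query the days not seen before, which is what makes the per-day granularity worthwhile.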

Prime the cache with a background process that calls DoIt() periodically with the date range you need cached. The process must run more often than the aggregation cache regions expire, i.e. its interval must be shorter than their expiry time.

When today's data changes, clear cache region "aggregation.today". If you want to reload this cache region quickly, either do so immediately or have another more frequent background process which calls DoIt() for today.
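The two regions, the priming job and the "clear today only" step can be modelled in memory as below. `RegionCache` and `CachePrimer` are hypothetical stand-ins; in NHibernate itself, a query-cache region is cleared with `ISessionFactory.EvictQueries(string cacheRegion)`, and expiry is configured in the cache provider. The sketch shows that clearing `aggregation.today` leaves earlier days untouched, and that entries disappear unless the primer runs again within the expiry time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for named cache regions whose entries expire.
public class RegionCache
{
    private readonly Dictionary<string, Dictionary<DateTime, (decimal Value, DateTime ExpiresAt)>> _regions = new();
    private readonly TimeSpan _expiry;

    public RegionCache(TimeSpan expiry) => _expiry = expiry;

    // Today's aggregates live in their own region so they can be cleared alone.
    public static string RegionFor(DateTime day, DateTime now) =>
        day.Date == now.Date ? "aggregation.today" : "aggregation";

    public void Put(string region, DateTime day, decimal value, DateTime now)
    {
        if (!_regions.TryGetValue(region, out var entries))
            _regions[region] = entries = new();
        entries[day.Date] = (value, now + _expiry);
    }

    public bool TryGet(string region, DateTime day, DateTime now, out decimal value)
    {
        value = 0m;
        if (_regions.TryGetValue(region, out var entries)
            && entries.TryGetValue(day.Date, out var entry)
            && entry.ExpiresAt > now)
        {
            value = entry.Value;
            return true;
        }
        return false;
    }

    // Called when today's data changes; earlier days stay cached.
    public void ClearRegion(string region) => _regions.Remove(region);
}

public static class CachePrimer
{
    // Run by a background job whose interval is shorter than the expiry time,
    // so entries are refreshed before they expire.
    public static void Prime(RegionCache cache, DateTime from, DateTime to, DateTime now,
        Func<DateTime, decimal> loadDay)
    {
        for (var day = from.Date; day <= to.Date; day = day.AddDays(1))
            cache.Put(RegionCache.RegionFor(day, now), day, loadDay(day), now);
    }
}
```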

When you have query caching enabled, NHibernate will pull the results from the cache if possible. This is based on the query and its parameter values.
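To illustrate what "based on the query and its parameter values" means, the sketch keys cached results on the query text plus its parameter values: repeating a query with the same dates is a hit, changing a date is a miss. `QueryKey` and `QueryResultCache` here are hypothetical simplifications (NHibernate's own key also includes things such as paging), and the HQL string in the usage is invented. As far as I understand NHibernate's standard query cache, it also compares a cached result against an update-timestamps cache for the tables involved, so writes to those tables can make cached results stale in any region; that is worth checking against the write pattern described in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified query-cache key: query text plus parameter values.
public sealed class QueryKey : IEquatable<QueryKey>
{
    public string Query { get; }
    public IReadOnlyList<object> Parameters { get; }

    public QueryKey(string query, params object[] parameters)
    {
        Query = query;
        Parameters = parameters;
    }

    public bool Equals(QueryKey other) =>
        other is not null && Query == other.Query && Parameters.SequenceEqual(other.Parameters);

    public override bool Equals(object obj) => Equals(obj as QueryKey);

    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(Query);
        foreach (var p in Parameters) hash.Add(p);
        return hash.ToHashCode();
    }
}

// Results are reused only when both the query and every parameter value match.
public class QueryResultCache
{
    private readonly Dictionary<QueryKey, object> _results = new();
    public int Misses { get; private set; }

    public T GetOrExecute<T>(QueryKey key, Func<T> execute)
    {
        if (_results.TryGetValue(key, out var cached))
            return (T)cached;
        Misses++;
        var result = execute();
        _results[key] = result;
        return result;
    }
}
```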

风和你 2024-08-30 21:46:21


When analyzing the details of NHibernate's cache, I remember reading that you should not rely on the cache being there, which seems like good advice.

Instead of trying to make your O/R mapper cover your application's needs, I think rolling your own data/cache management strategy might be more reasonable.

Also, the 7-day caching rule you describe sounds business-related, which is something the O/R mapper should not know about.

In conclusion, make your app work without any caching at all, then use a profiler (or several: .NET, SQL, NHibernate profiler) to see where the bottlenecks are, and start improving the "red" parts, eventually by adding caching or other optimizations.

PS: About caching in general: in my experience, one caching point is fine, two caches are in the gray zone and you should have a strong reason for the separation, and more than two is asking for trouble.

Hope it helps.
