NHibernate 缓存困境

发布于 2024-08-23 21:46:21 字数 2153 浏览 13 评论 0原文

我的应用程序包括客户端、Web 层（负载平衡）、应用程序层（负载平衡）和数据库层。 Web 层向客户端公开服务，并将调用转发到应用程序层。然后应用程序层对数据库执行查询（使用 NHibernate）并返回结果。

数据主要是读取，但写入相当频繁，特别是当新数据进入系统时。通常情况下，数据会被聚合，并且这些聚合会返回给客户端，而不是原始数据。

通常，用户会对最近数据的汇总感兴趣 - 例如，过去一周的数据。因此，对我来说，引入一个包含过去 7 天的所有数据的缓存是有意义的。我不能只在实体加载时对其进行缓存，因为我需要聚合范围的实体，而该范围是由客户端以及其他复杂性（例如过滤器）决定的。我需要知道在给定的时间范围内该范围内的所有数据是否都在缓存中。

在我理想的幻想世界中，我的服务根本不需要改变：

public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    // execute HQL/criteria call and have it automatically use the cache where possible
}

将有一个单独的过滤层，它会挂接到 NHibernate 中，并智能且透明地确定是否可以针对缓存执行 HQL/条件查询，并且将仅在必要时才访问数据库。如果所有数据都在缓存中，它将查询缓存的数据本身，有点像内存数据库。

然而，乍一看，NHibernate 的二级缓存机制似乎不适合我的需求。我希望能够做的是：

将其配置为始终在缓存中保留最近 7 天的数据。例如。 “对于这个表，缓存该字段在 7 天前到现在的所有记录。”
具有手动维护缓存的能力。当新数据进入系统时，如果我可以将其直接放入缓存而不是等到缓存失效，那就太好了。同样，当数据超出该时间段时，我希望能够从缓存中提取它。
让 NHibernate 智能地了解何时可以直接从缓存提供查询而不是访问数据库。例如。如果用户请求过去 3 天的数据聚合，则应直接从缓存计算该聚合，而不是接触数据库。

现在，我很确定#3 要求太多了。即使我可以使用所需的所有数据填充缓存，NHibernate 也不知道如何有效地查询该数据。它实际上必须循环所有实体，以便区分哪些实体与查询相关（老实说，这可能没问题）。此外，它还需要 NHibernate 查询引擎的实现，该引擎针对对象而不是数据库执行。但我可以做梦，对吗？

假设 #3 要求太多，我会在我的服务中需要一些逻辑，如下所示：

public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    if (CanBeServicedFromCache(starting, ending, filter))
    {
        // execute some LINQ to object code or whatever to determine the aggregation results
    }
    else
    {
        // execute HQL/criteria call to determine the aggregation results
    }
}

这并不理想，因为每个服务都必须具有缓存感知能力，并且必须复制聚合逻辑：一次用于通过 NHibernate 查询数据库，一次用于通过 NHibernate 查询数据库用于查询缓存。

也就是说，如果我至少能够将相关数据存储在 NHibernate 的二级缓存中，那就太好了。这样做将允许其他服务（不进行聚合）透明地从缓存中受益。如果我决定系统中的其他地方需要二级缓存，它还可以确保我不会在缓存实体上加倍（一次在二级缓存中，一次在我自己的单独缓存中）。

我怀疑如果我可以在运行时掌握 ICache 的实现，我所需要做的就是调用 Put() 方法将我的数据粘贴到缓存中。但这可能会踏入危险的境地......

任何人都可以提供关于 NHibernate 的二级缓存机制是否可以满足我的任何要求的任何见解吗？或者我应该推出自己的解决方案并完全放弃 NHibernate 的二级缓存？

谢谢

。我已经考虑使用多维数据集来更快地进行聚合计算，但这仍然让我将数据库作为瓶颈。除了缓存之外，我很可能会使用多维数据集，但缺少缓存是我现在最关心的问题。

原文

My application includes a client, web tier (load balanced), application tier (load balanced), and database tier. The web tier exposes services to clients, and forwards calls onto the application tier. The application tier then executes queries against the database (using NHibernate) and returns the results.

Data is mostly read, but writes occur fairly frequently, particularly as new data enters the system. Much more often than not, data is aggregated and those aggregations are returned to the client - not the original data.

Typically, users will be interested in the aggregation of recent data - say, from the past week. Thus, to me it makes sense to introduce a cache that includes all data from the past 7 days. I cannot just cache entities as and when they are loaded because I need to aggregate over a range of entities, and that range is dictated by the client, along with other complications, such as filters. I need to know whether - for a given range of time - all data within that range is in the cache or not.

In my ideal fantasy world, my services would not have to change at all:

public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    // execute HQL/criteria call and have it automatically use the cache where possible
}

There would be a separate filtering layer that would hook into NHibernate and intelligently and transparently determine whether the HQL/criteria query could be executed against the cache or not, and would only go to the database if necessary. If all the data was in the cache, it would query the cached data itself, kind of like an in-memory database.

However, on first inspection, NHibernate's second level cache mechanism does not seem appropriate for my needs. What I'd like to be able to do is:

Configure it to always have the last 7 days worth of data in the cache. eg. "For this table, cache all records where this field is between 7 days ago and now."
Have the ability to manually maintain the cache. As new data enters the system, it would be nice if I could just throw it straight into the cache rather than waiting until the cache is invalidated. Similarly, as data falls out of the time period, I'd like to be able to pull it from the cache.
Have NHibernate intelligently understand when it can serve a query directly from the cache rather than hitting the database at all. eg. If the user asks for an aggregate of data over the past 3 days, that aggregation should be calculated directly from the cache rather than touching the DB.

Now, I'm pretty sure #3 is asking too much. Even if I can get the cache populated with all the data required, NHibernate has no idea how to efficiently query that data. It would literally have to loop over all entities in order to discriminate which are relevant to the query (which might be fine, to be honest). Also, it would require an implementation of NHibernate's query engine that executed against objects rather than a database. But I can dream, right?

Assuming #3 is asking too much, I would require some logic in my services like this:

public AggregationResults DoIt(DateTime starting, DateTime ending, Filter filter)
{
    if (CanBeServicedFromCache(starting, ending, filter))
    {
        // execute some LINQ to object code or whatever to determine the aggregation results
    }
    else
    {
        // execute HQL/criteria call to determine the aggregation results
    }
}

This isn't ideal because each service must be cache-aware, and must duplicate the aggregation logic: once for querying the database via NHibernate, and once for querying the cache.

That said, it would be nice if I could at least store the relevant data in NHibernate's second level cache. Doing so would allow other services (that don't do aggregation) to transparently benefit from the cache. It would also ensure that I'm not doubling up on cached entities (once in the second level cache, and once in my own separate cache) if I ever decide the second level cache is required elsewhere in the system.

I suspect if I can get a hold of the implementation of ICache at runtime, all I would need to do is call the Put() method to stick my data into the cache. But this might be treading on dangerous ground...

Can anyone provide any insight as to whether any of my requirements can be met by NHibernate's second level cache mechanism? Or should I just roll my own solution and forgo NHibernate's second level cache altogether?

Thanks

PS. I've already considered a cube to do the aggregation calculations much more quickly, but that still leaves me with the database as the bottleneck. I may well use a cube in addition to the cache, but the lack of a cache is my primary concern right now.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

笑红尘 2024-08-30 21:46:21

停止使用事务 (OLTP) 数据源进行分析 (OLAP) 查询，问题就会消失。

当发生域重大事件时（例如，新实体进入系统或更新），触发事件（la 域事件）。为事件连接一个处理程序，该处理程序获取创建或更新的实体的详细信息，并将数据存储在非规范化的报告存储中，该存储专门设计用于允许报告您想要的聚合（最有可能将数据推送到星型模式中）。现在，您的报告只是沿着预定义轴查询聚合（甚至可以预先计算），只需要简单的选择和一些连接即可。可以使用 L2SQL 之类的工具甚至简单的参数化查询和数据读取器来执行查询。

性能提升应该是显着的，因为您可以优化读取端以跨多个标准进行快速查找，同时优化写入端以通过 id 进行快速查找并减少写入时的索引负载。

还可以获得额外的性能和可扩展性，因为一旦迁移到这种方法，您就可以在物理上分离读取和写入存储，这样您就可以为每个写入存储运行 n 个读取存储，从而允许您的解决方案横向扩展以满足增加的读取需求而写入需求以较低的速度增加。

回复收藏 0 原文