建筑“孤立”和“自动更新” Java 中的缓存 (java.util.List)

发布于 2024-08-28 13:08:26 字数 1003 浏览 15 评论 0原文

对于任何感兴趣的人:我已经实现了我正在寻找的行为的代码,并在谷歌代码上将其开源。在这里得到它! pojo-mvcc

--

大家好,

我正在尝试编写一个框架,其中包含许多短期缓存是从长期缓存创建的。这些短期缓存需要能够返回其完整内容,这是原始长期缓存的克隆。

实际上,我正在尝试为短期缓存构建一定程度的事务隔离。用户应该能够修改短期缓存的内容,但不应传播对长期缓存的更改(也存在应推送更改的情况,具体取决于缓存类型)。

我会尽力尝试解释:

主缓存包含:[A,B,C,D,E,F] 使用状态 [A,B,C,D,E,F] 创建临时缓存

1) 临时缓存添加项目 G: [A,B,C,D,E,F] 2) 临时缓存删除项目 B: [A,C,D,E,F]

主缓存包含:[A,B,C,D,E,F]

3) 主缓存添加项目 [X,Y, Z]: [A,B,C,D,E,F,X,Y,Z]

临时缓存包含:[A,C,D,E,F]

当项目中的值可以更改时,事情会变得更加困难并且不应该总是更新(所以我什至不能共享底层对象实例,我需要使用克隆)。

我已经实现了一种简单的方法,即使用 ArrayList 上的标准 Collection 构造函数创建一个新的 List 实例,但是当您达到大约 200,000 个项目时,系统就会耗尽内存。我知道 200,000 的值对于迭代来说太大了,但我试图稍微强调一下我的代码。

我原以为它可能能够以某种方式“代理”列表,因此临时缓存使用主缓存,并存储所有更改(实际上是更改的备忘录),但是当您想要迭代临时缓存,或检索特定索引处的项目。另外,考虑到我希望对列表的内容进行一些修改(取决于临时缓存的类型,无论它是否是“自动更新”),我完全超出了我的理解范围。

任何指向技术或数据结构或只是尝试和研究的一般概念的指示都将不胜感激。

干杯,

艾多斯

FOR ANYONE INTERESTED: I have implemented the code for the behaviour I am looking for and open-sourced it on google-code. Get it here! pojo-mvcc

--

Hi Guys,

I am trying to write a framework which contains a lot of short-lived caches created from a long-living cache. These short-lived caches need to be able to return their entier contents, which is a clone from the original long-living cache.

Effectively what I am trying to build is a level of transaction isolation for the short-lived caches. The user should be able to modify the contents of the short-lived cache, but changes to the long-living cache should not be propogated through (there is also a case where the changes should be pushed through, depending on the Cache type).

I will do my best to try and explain:

master-cache contains: [A,B,C,D,E,F]
temporary-cache created with state [A,B,C,D,E,F]

1) temporary-cache adds item G: [A,B,C,D,E,F]
2) temporary-cache removes item B: [A,C,D,E,F]

master-cache contains: [A,B,C,D,E,F]

3) master-cache adds items [X,Y,Z]: [A,B,C,D,E,F,X,Y,Z]

temporary-cache contains: [A,C,D,E,F]

Things get even harder when the values in the items can change and shouldn't always be updated (so I can't even share the underlying object instances, I need to use clones).

I have implemented the simple approach of just creating a new instance of the List using the standard Collection constructor on ArrayList, however when you get out to about 200,000 items the system just runs out of memory. I know the value of 200,000 is excessive to iterate, but I am trying to stress my code a bit.

I had thought that it might be able to somehow "proxy" the list, so the temporary-cache uses the master-cache, and stores all of it's changes (effectively a Memento for the change), however that quickly becomes a nightmare when you want to iterate the temporary-cache, or retrieve an item at a specific index. Also given that I want some modifications to the contents of the list to come through (depending on the type of the temporary-cache, whether it is "auto-update" or not) and I get completly out of my depth.

Any pointers to techniques or data-structures or just general concepts to try and research will be greatly appreciated.

Cheers,

Aidos

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

熊抱啵儿 2024-09-04 13:08:26

这就是您想要做的。类似于所谓的 MVCC(多版本货币控制)。

简而言之,您需要将事务 ID 与缓存元素相关联。

因此,缓存条目将如下所示:

public class CacheEntry {
    long transactionId;
    boolean deleted;
    Object value;
}

您的缓存条目以相反的 transactionid 顺序存储在列表中。

当您获取缓存元素时,您会查找列表(在哈希映射中)。然后,您搜索具有小于或等于您的交易的交易 ID 的最高交易 ID 的值。

因此,让我们考虑一下 DELETE 问题。

您有事务 10,正在寻找“ABC”。我们假设 ABC 已经在缓存中,并且是由事务 5 放入的。

因此,T10 获取 ABC 的条目列表,向下搜索列表并发现在 T5 处有值“123”。 T5是小于或等于T10的最高交易。 T10 将 ABC 的值从 123 更改为 456。

现在 T12 出现,它会查找 ABC。它会从 T10 中找到该值为 456。 T12决定删除ABC,因此T12的“已删除”缓存条目被放置在缓存条目列表中。如果T10再次尝试查找ABC,它将找到456,因为12>4。 10,而最高的交易<=10是T10,所以它看不到删除。

T14 出现,搜索 ABC,找不到它(因为它已“删除”),并插入一个新值 789。如果 T12 查找,它仍然会被删除,如果 T10 查找,它仍然是 456。

所以,最后,您的缓存列表如下所示:

{tid: 14 deleted: false value: 789}
{tid: 12 deleted: true  value: nul}
{tid: 10 deleted: false value: 456}
{tid:  5 deleted: false value: 123}

您遇到的下一个问题是处理打开事务之间的可见性。即另一个事务可以看到来自另一个未提交的打开事务的数据。但这并不太难,因为它只是在扫描版本列表以寻找合适的候选者时调整标准。您还可以保留事务 ID 及其状态(打开、已提交、回滚)的列表。

最后你必须想出一个机制来清理未解决的问题。提交两个事务后,如果没有其他打开的事务,则可以删除较旧的记录。

例如,如果您拥有来自 T5 和 T10 的数据,如果两者都已提交,则没有人能够再次“看到”T5 的数据,因为 T10 的数据现在是“当前”状态。所以,T5行可以删除。

这可能最好通过简单地迭代缓存并删除过时的事务条目来完成。

这就是要点,显然细节决定成败。

Here's what you want to do. Something akin to what's known as MVCC, Multi Version Currency Control.

Simply put you need to associate a transaction ID to your cache elements.

So a cache entry will look something like this:

public class CacheEntry {
    long transactionId;
    boolean deleted;
    Object value;
}

Your cache entries are stored in a list, in reverse transactionid order.

When you go to get a cache element, you look up the list (in your hash map). Then you search for the value that has the highest transaction ID that is LESS THAN or EQUAL to the transaction ID of YOUR transaction.

So, let's take the DELETE issue in to account.

You have Transaction 10, looking for "ABC". We'll assume ABC is already in the cache, and it was put in by Transaction 5.

So, T10 gets the list of entries for ABC, searches down the list and finds that at T5, there's the value "123". T5 is the highest transaction less than or equal to T10. T10 changes the value for ABC from 123 to 456.

Now T12 comes along and it looks for ABC. It will find the value to be 456 from T10. T12 decides to delete ABC, and so a "deleted" cache entry for T12 is placed in the cache entry list. If T10 tries to look up ABC again, it will find 456, because 12 > 10, and the highest transaction <= 10 is T10, so it does not see the deletion.

T14 comes along, searches for ABC, CAN'T find it (because it's "deleted"), and inserts a new value of 789. If T12 were to look it would still be deleted, if T10 were, it's still 456.

So, in the end, you cache list looks like this:

{tid: 14 deleted: false value: 789}
{tid: 12 deleted: true  value: nul}
{tid: 10 deleted: false value: 456}
{tid:  5 deleted: false value: 123}

The next problem you have is dealing with visibility across open transactions. i.e. can another transaction see data from another open transaction that's not committed. But that's not too hard, as it just adjusts the criteria when scanning the list of versions for an appropriate candidate. And you can keep a list of transaction IDs, with their status (open, committed, rolledback).

Finally you have to come up with a mechanism to clean up the loose ends. After you commit two transaction, with no other open transactions, the older records can be removed.

For example, if you have the data from T5 and T10, if both of them are committed, no one will ever be able to "see" T5's data again as T10's is now the "current" state. So, the T5 row can be deleted.

This is probably best done by simply iterating over the cache and removing obsolete transaction entries.

That's the gist of it, obviously the devil is in the details.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文