Key groups with the APC cache
APC lets you store data inside keys, but you cannot group these keys.
So if I want to have a group called "articles", with keys inside that group taking the form of the article ID, I can't do this easily:
articles -> 5  -> cached data
         -> 10 -> cached data
         -> 17 -> cached data
...
I could prefix the key with the "group" name like:
article_5 -> cached data
article_10 -> cached data
article_17 -> cached data
...
But this makes it impossible to delete the entire group if I want to :(
A working solution would be to store multidimensional arrays (this is what I'm doing now), but I don't think it's good because when I want to access or delete cached data, I need to fetch the entire group first. So if the group has one zillion articles in it, you can imagine what kind of array I will be iterating over and searching.
Do you have better ideas on how I could achieve this grouping?
Edit: I found another solution; I'm not sure if it's much better because I don't know how reliable it is yet. I'm adding a special key called __paths, which is basically a multidimensional array containing the full prefixed key paths for all the other entries in the cache. When I fetch or delete from the cache I use this array as a reference to quickly find the key (or group of keys) I need to remove, so I don't have to store arrays and iterate through all keys...
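For reference, here is a rough sketch of what this __paths approach can look like with the plain apc_* user-cache functions (the helper names are only illustrative, not the exact code in use):

    <?php
    // Illustrative sketch of the "__paths" index idea.
    // __paths maps group name => list of full, prefixed cache keys in that group.
    // Note: the read-modify-write of __paths below is not atomic.

    function cache_set($group, $id, $data, $ttl = 0) {
        $key = $group . '_' . $id;                    // e.g. "article_5"
        apc_store($key, $data, $ttl);

        $paths = apc_fetch('__paths') ?: array();     // load (or initialise) the index
        $paths[$group][$id] = $key;
        apc_store('__paths', $paths);
    }

    function cache_delete_group($group) {
        $paths = apc_fetch('__paths') ?: array();
        if (!empty($paths[$group])) {
            foreach ($paths[$group] as $key) {        // only this group's keys are touched
                apc_delete($key);
            }
            unset($paths[$group]);
            apc_store('__paths', $paths);
        }
    }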
4 Answers
Based upon your observations, I looked at the underlying C implementation of APC's caching model (apc_cache.c) to see what I could find. The source corroborates your observation that no grouping structure exists in the backing data store, so any loosely-grouped collection of objects will need to be built on some namespace constraint or on a modification to the cache layer itself. I'd hoped to find some backdoor relying on key chaining by way of a linked list, but unfortunately it seems collisions are reconciled by a direct reallocation of the colliding slot instead of chaining.
Further confounding this problem, APC appears to use an explicit cache model for user entries, preventing them from aging off. So, the solution Emil Vikström provided that relies on the LRU model of memcached will, unfortunately, not work.
Without modifying the source code of APC itself, here's what I would do:
1. Define a namespace constraint that your entries conform to. As you originally defined above, this would be something like article_ prepended to each of your entries.
2. Define a separate list of the elements in this set. Effectively, this would be the 5, 10, and 17 scheme you described above, but in this case you could use a numeric type to make it more efficient than storing a whole lot of string values.
3. Define an interface for updating this set of pointers and reconciling them with the backing memory cache, including (at minimum) the methods insert, delete, and clear. When clear is called, walk each of your pointers, reconstruct the key you used in the backing data store, and flush each from your cache.
What I'm advocating for here is a well-defined object that performs the operations you seek efficiently (see the sketch below). This scales linearly with the number of entries in your sub-cache, but because you're using a numeric type for each element, you'd need over 100 million entries or so before you started to experience real memory pain at a constraint of, for example, a few hundred megabytes.
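A minimal sketch of what such an object could look like, assuming the plain apc_* user-cache functions; the class and method names here are illustrative, not from any existing library:

    <?php
    // Hypothetical "virtual group" object along the lines described above.
    // The group index stores only numeric IDs, not full key strings.

    class ApcGroup
    {
        private $group;

        public function __construct($group) { $this->group = $group; }

        private function key($id)   { return $this->group . '_' . $id; }
        private function indexKey() { return '__group_' . $this->group; }

        public function insert($id, $data, $ttl = 0) {
            apc_store($this->key($id), $data, $ttl);
            $ids = apc_fetch($this->indexKey()) ?: array();
            $ids[(int) $id] = true;                       // numeric IDs only
            apc_store($this->indexKey(), $ids);
        }

        public function delete($id) {
            apc_delete($this->key($id));
            $ids = apc_fetch($this->indexKey()) ?: array();
            unset($ids[(int) $id]);
            apc_store($this->indexKey(), $ids);
        }

        public function clear() {
            $ids = apc_fetch($this->indexKey()) ?: array();
            foreach (array_keys($ids) as $id) {           // linear in this group's size only
                apc_delete($this->key($id));
            }
            apc_delete($this->indexKey());
        }
    }

Usage would be along the lines of $articles = new ApcGroup('article'); $articles->insert(5, $data); ... $articles->clear();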
Tamas Imrei beat me to suggesting an alternate strategy I was already in the process of documenting, but this has some major flaws I'd like to discuss.
As defined in the backing C code, APCIterator is a linear-time operation over the full data set when performing searches (via its constructor, public __construct ( string $cache [, mixed $search = null ...]] )). This is flatly undesirable in the case where the elements you're searching for represent a small percentage of your total data, because it would walk every single element in your cache to find the ones you desire (see apc_cache.c).
:Therefore, I would most strongly recommend using an efficient, pointer-based virtual grouping solution to your problem as I've sketched out above. Although, in the case where you're severely memory-restricted, the iterator approach may be most correct to conserve as much memory as possible at the expense of computation.
Best of luck with your application.
I have had this problem once with memcached and I solved it by using a version number in my keys, like this:
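Something along these lines, as a minimal sketch using the PHP Memcached extension (the key layout is only illustrative):

    <?php
    // Sketch: "delete" a whole group by bumping its version number.
    $m = new Memcached();
    $m->addServer('127.0.0.1', 11211);

    // Current version of the "articles" group (initialise it once).
    $version = $m->get('articles_version');
    if ($version === false) {
        $version = 1;
        $m->set('articles_version', $version);
    }

    // Keys embed the group version, e.g. "articles_1_5".
    $articleData = array('title' => 'Example');
    $m->set("articles_{$version}_5", $articleData);

    // To "clear" the group, just bump the version; the old keys are never
    // read again and will eventually be evicted by memcached's LRU policy.
    $m->increment('articles_version');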
Just change the version number and the group will be effectively "gone"!
memcached uses a least-recently-used policy to remove old data, so the old-versioned group will be removed from the cache when the space is needed. I don't know if APC has the same feature.
According to MrGomez this does NOT work for APC. Please read his post, and keep my post in mind only for other cache systems which use a least-recently-used policy (not APC).
You may use the APCIterator class, which seems to exist especially for tasks like this:
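For example, something like this (a sketch; the article_ prefix matches the scheme from the question):

    <?php
    // Delete every user-cache entry whose key starts with "article_".
    $iterator = new APCIterator('user', '/^article_/', APC_ITER_KEY);
    apc_delete($iterator);   // apc_delete() also accepts an APCIterator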
Unfortunately, APC can't do this. I've wished often enough that it could. So I looked for alternatives.
Zend_Cache has an interesting way of doing it, but it simply uses caches to cache the tagging information. It's a component that can in turn use backends (like apc).
If you want to go a step further, you could install Redis. It has all of that natively included, plus some other really interesting features. This would probably be the cleanest solution to go with. If you are able to use APC, you should also be able to use Redis.
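As a rough sketch of what the Redis route can look like, using the phpredis extension (the key names are only illustrative):

    <?php
    // Group members are tracked in a Redis set, so the whole group
    // can be deleted without scanning the keyspace.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    // Store an article and register it in the "articles" group.
    $redis->set('article_5', 'serialized article data');
    $redis->sAdd('group:articles', 'article_5');

    // Delete the whole group.
    $keys = $redis->sMembers('group:articles');
    if ($keys) {
        $redis->del($keys);           // del() accepts an array of keys
    }
    $redis->del('group:articles');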