How many bytes does one member of a Redis Set take?

Posted on 2024-12-24 17:38:47

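I am using Redis as an in-memory hash set. After inserting 1M 8-byte (binary) keys into a Set, I found that Redis used_memory is about 100M, which means a single member takes about 100 bytes. Why?

Alternatively, how can I configure Redis to reduce memory usage?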

一笔一画续写前缘 2024-12-31 17:38:47

First, you should always describe your setup for this kind of question, since the memory layout depends on the OS, memory allocator, platform and Redis version.

On a 64-bit Linux box with Redis 2.4, a set of 1M items with 8-byte keys uses 87 MB.

That seems like a lot compared to the size of the keys, but any dynamic data structure that supports efficient access to its items involves some overhead. The smaller your items, the larger the relative overhead.

In Redis, large sets are implemented as hash tables with separate chaining. Each entry is represented by the following structure:

typedef struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;
} dictEntry;

This structure holds three 8-byte pointers, that is 24 bytes, but because the memory allocator (jemalloc) has no 24-byte class, 32 bytes are used. In this structure, val is set to NULL (this is a set), and key points to an object defined as follows:

typedef struct redisObject {
    unsigned type:4;
    unsigned storage:2;     /* REDIS_VM_MEMORY or REDIS_VM_SWAPPING */
    unsigned encoding:4;
    unsigned lru:22;        /* lru time (relative to server.lruclock) */
    int refcount;
    void *ptr;
} robj;

This structure takes only 16 bytes. It points to the key data itself, represented by this variable-length structure:

struct sdshdr {
    int len;
    int free;
    char buf[];
};

The keys are 8 bytes long. Adding the 8-byte header (len and free) and a terminating nul char gives 17 bytes per key. The next allocation class with jemalloc is 32 bytes, so this structure takes 32 bytes.

All in all, each item costs 32+16+32 = 80 bytes, and there are 1M of them. Add some space for the hash table itself (containing at least 1M pointers to dictEntry structs), and you get a result very close to the 87 MB measured on this platform.
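The arithmetic above can be reproduced with a small sketch. It copies the three structures quoted in this answer and models the allocator as rounding every request up to a multiple of 16 bytes; that rounding rule and the power-of-two bucket array are simplifying assumptions that happen to match the 24 -> 32 and 17 -> 32 steps described here, not a description of jemalloc or of the Redis dict internals. The function names (alloc_class, member_bytes, table_bytes, set_bytes) are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copies of the Redis 2.4 structures quoted above. */
typedef struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;
} dictEntry;

typedef struct redisObject {
    unsigned type:4;
    unsigned storage:2;
    unsigned encoding:4;
    unsigned lru:22;
    int refcount;
    void *ptr;
} robj;

struct sdshdr {
    int len;
    int free;
    char buf[];
};

/* ASSUMPTION: the allocator rounds each request up to a multiple of 16.
 * This is a simplified model, not the real jemalloc size-class table. */
static size_t alloc_class(size_t n) {
    assert(n > 0);
    return (n + 15) / 16 * 16;
}

/* Estimated bytes for one set member whose key is key_len bytes long. */
static size_t member_bytes(size_t key_len) {
    size_t entry  = alloc_class(sizeof(dictEntry));
    size_t object = alloc_class(sizeof(robj));
    /* header + key bytes + terminating nul */
    size_t string = alloc_class(sizeof(struct sdshdr) + key_len + 1);
    return entry + object + string;
}

/* ASSUMPTION: the bucket array has the smallest power-of-two number of
 * slots that is >= the member count, one dictEntry pointer per slot. */
static size_t table_bytes(size_t members) {
    size_t slots = 4;
    while (slots < members) slots *= 2;
    return slots * sizeof(dictEntry *);
}

/* Estimated total for the whole set. */
static size_t set_bytes(size_t members, size_t key_len) {
    return members * member_bytes(key_len) + table_bytes(members);
}
```

On a typical 64-bit Linux build, member_bytes(8) evaluates to 80, and set_bytes(1000000, 8) to roughly 88 million bytes, which is in the same range as the measured 87 MB. On other platforms the sizeof values, and so the totals, can differ.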

Optimizing the memory footprint of a large set is not trivial. Redis applies an optimization when a set is small (by default no more than 512 items) and its members are actually integers. See more information here.

One possible optimization is to increase the set-max-intset-entries parameter and split the set into several pieces. For instance, item keys can be hashed to distribute the items over several sets. Instead of just myset, you have myset:0, myset:1, myset:2 ... myset:n. To check whether a given item is in the set, a hash value is calculated on the key to find the correct myset:X entry, and then that specific entry is checked. The purpose is to keep the size of each of those sets below the set-max-intset-entries parameter in order to benefit from the memory optimization. Of course, this makes all operations on the set more complex, so it is really a tradeoff between complexity and memory footprint.
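A minimal client-side sketch of the sharding idea might look like the following. The answer does not name a hash function, so FNV-1a is used here purely as an example; fnv1a64, shard_count and shard_key are hypothetical helper names, and the per-shard target is a value you would pick yourself, comfortably below set-max-intset-entries, because hashing does not fill shards perfectly evenly. Keep in mind that the compact encoding only applies when the members are integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a hash. Chosen for illustration only; any well-distributed
 * hash would do. */
static uint64_t fnv1a64(const unsigned char *data, size_t len) {
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;               /* FNV prime */
    }
    return h;
}

/* Number of shards needed so that the AVERAGE shard holds at most
 * target_per_shard items. Individual shards can exceed the average, so
 * target_per_shard should leave headroom below set-max-intset-entries. */
static size_t shard_count(size_t total_items, size_t target_per_shard) {
    assert(target_per_shard > 0);
    return (total_items + target_per_shard - 1) / target_per_shard;
}

/* Writes the name of the shard that owns `item` (for example "myset:7")
 * into out. Returns the snprintf result: the length the name needs. */
static int shard_key(char *out, size_t out_size, const char *base,
                     const unsigned char *item, size_t item_len,
                     size_t shards) {
    assert(shards > 0);
    size_t idx = (size_t)(fnv1a64(item, item_len) % shards);
    return snprintf(out, out_size, "%s:%zu", base, idx);
}
```

With 1M items and a target of 400 per shard, shard_count gives 2,500 shards. To add or test a member, the client would compute shard_key for the item and then issue SADD or SISMEMBER against the returned name instead of against myset.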

悲喜皆因你 2024-12-31 17:38:47

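Without knowing the underlying structure of each member of the set, it is impossible to say. However, if you are storing key/value pairs, each member stores both the key and the value (even if the value is empty, a reference to it still has to be kept).

For fast key lookup, the underlying structure is most likely a tree, which means it needs to store, for each member, left and right pointers to the descendant nodes in the tree (or red/black). On a 64-bit system, these pointers are 8 bytes each.

To allocate and free key/value pairs efficiently, each member node may carry data members indicating its size and availability (allocated, deleted), so that nodes can be allocated from a memory pool and then either garbage collected or marked as deleted and reused. A typical pool allocator doubles the pool size each time the previous pool fills up in order to minimize heap contention, which matters a great deal for the performance of multithreaded applications. Your 100M of memory usage may include 50M of unused (but allocated) key holders.

Why do you need to reduce memory usage? Are you planning to store billions of hash keys?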
