有人用 ANSI C 写过字典(哈希图)吗?

发布于 2024-09-30 01:24:51 字数 197 浏览 0 评论 0原文

我只是想知道是否有人可以给我一些指示(没有双关语)如何做到这一点?

我想留出 4GB 内存,以便将数字映射到内存,这样我就不用遍历链表来检查它们是否存在。

因此,我希望能够在 O( 中查找键“2343”,而不是使用 (1,2,3,4,8,34,543,2343) 并遍历 8 个元素来验证“2343”是否在列表中1)时间?

提前致谢

I just wondered if someone could give me some pointers (no pun intended) how to do this?

I want to set aside 4GB of ram in order to map numbers to memory which saves me traversing a linked list checking if they are there.

So instead of having (1,2,3,4,8,34,543,2343) and traversing 8 elements to verify that '2343' is in the list, i want to be able to look up the key '2343' in O(1) time?

Thanks in advance

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

梅窗月明清似水 2024-10-07 01:24:51

我建议在您的项目中嵌入 Lua 。易于嵌入且完全符合 ANSI C,具有一种非常灵活的垃圾收集数据结构(Lua 表/又名哈希图)。你总是可以去掉不需要的部分,但即使你不这样做,Lua 也是很小的。

Lua 有一个基于堆栈的 API,这并不难理解:

  lua_State *L = luaL_newstate();  // make a new lua state
  lua_newtable(L);  // pushes a new table to the top of the stack (position 1)

  // storing values
  lua_pushinteger(2343); // key: 2343
  lua_pushboolean(1);    // value: true
  lua_settable(L, 1);   // pop key/value, store in table at position 1

  // retrieving values
  lua_pushinteger(2343); // key we're looking for
  lua_gettable(L, 1);   // get from table at top of stack - 2; pops key
  if (lua_toboolean(L, -1))  // is it a true value?
  {
    // executes; we know 2343 is true as we pushed it just above
  }
  lua_pop(L, 1);  // pop it off the stack; only our table remains

您也可以迭代这些值,可能不需要链表(但迭代的顺序是不确定的)。完整手册此处

I would advise embedding Lua in your project. Easy to embed and completely ANSI C with one very flexible garbage collected data structure (a Lua table/aka hashmap). You can always strip out the bits that you don't need, but even if you don't Lua is tiny.

Lua has a stack based API which isn't too hard to follow:

  lua_State *L = luaL_newstate();  // make a new lua state
  lua_newtable(L);  // pushes a new table to the top of the stack (position 1)

  // storing values
  lua_pushinteger(2343); // key: 2343
  lua_pushboolean(1);    // value: true
  lua_settable(L, 1);   // pop key/value, store in table at position 1

  // retrieving values
  lua_pushinteger(2343); // key we're looking for
  lua_gettable(L, 1);   // get from table at top of stack - 2; pops key
  if (lua_toboolean(L, -1))  // is it a true value?
  {
    // executes; we know 2343 is true as we pushed it just above
  }
  lua_pop(L, 1);  // pop it off the stack; only our table remains

And you can iterate over the values as well, possibly doing away with the need of your linked list (but the order of the iteration is non-determinate). Full manual here.

风蛊 2024-10-07 01:24:51

如果您只需要检查列表中是否存在该数字,则可以尝试制作位图。

如果数字稀疏地分布在一个大范围内,例如 0-40 亿范围内的 100,000 个值,那么哈希表会更快。对于 Hashtable 的 C 实现看看 GLib 的 Hashtable。

位图仅使用 512 MB 的 RAM 即可保存数字 0-4,294,967,295。

#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

#define BITMAP_TEST 1

#define BITMAP_32_WORD 1

typedef struct Bitmap Bitmap;
#if BITMAP_32_WORD
#define BITWORD_BITS_SHIFT 5
typedef uint32_t Bitword;
#else
#define BITWORD_BITS_SHIFT 6
typedef uint64_t Bitword;
#endif
#define BITWORD_BITS (sizeof(Bitword) * 8)
#define BITWORD_BITS_MASK (BITWORD_BITS - 1)
#define BITWORD_MULT(bit)  ((bit + (BITWORD_BITS_MASK)) & ~(BITWORD_BITS_MASK))
#define BITWORD_TEST(bword, bit) ((bword >> bit) & 1)

#define BITMAP_WORD_COUNT(bit) (BITWORD_MULT(bit) >> BITWORD_BITS_SHIFT)


struct Bitmap {
    size_t  length;
    Bitword *bitmap;
};

extern Bitmap *bitmap_new(size_t len) {
    Bitmap *bitmap = malloc(sizeof(Bitmap));
    bitmap->length = len;
    bitmap->bitmap = calloc(BITMAP_WORD_COUNT(len),sizeof(Bitword));
    return bitmap;
}

extern void bitmap_free(Bitmap *bitmap) {
    free(bitmap->bitmap);
    free(bitmap);
}

extern void bitmap_set(Bitmap *bitmap, size_t bit) {
    assert(bit < bitmap->length);
    bitmap->bitmap[(bit >> BITWORD_BITS_SHIFT)] |= ((Bitword)1 << (bit & BITWORD_BITS_MASK));
}

extern void bitmap_unset(Bitmap *bitmap, size_t bit) {
    assert(bit < bitmap->length);
    bitmap->bitmap[(bit >> BITWORD_BITS_SHIFT)] &= ~((Bitword)1 << (bit & BITWORD_BITS_MASK));
}

extern bool bitmap_test(Bitmap *bitmap, size_t bit) {
    assert(bit < bitmap->length);
    Bitword bword = bitmap->bitmap[(bit >> BITWORD_BITS_SHIFT)];
    return BITWORD_TEST(bword, (bit & BITWORD_BITS_MASK));
}

#ifdef BITMAP_TEST
#include <stdio.h>

#define MAX_VALUE (2343 + 1)
static const uint32_t test_values[] = { 1,2,3,4,8,34,543,2343 };
#define test_values_len (sizeof(test_values)/sizeof(uint32_t))

static void set_values(Bitmap *bitmap, const uint32_t *values, int len) {
    int i;
    for(i=0; i < len; i++) {
        bitmap_set(bitmap, values[i]);
    }
}

static void unset_values(Bitmap *bitmap, const uint32_t *values, int len) {
    int i;
    for(i=0; i < len; i++) {
        bitmap_unset(bitmap, values[i]);
    }
}

static void check_values(Bitmap *bitmap, const uint32_t *values, int len, bool is_set) {
    int i;
    for(i=0; i < len; i++) {
        assert(bitmap_test(bitmap, values[i]) == is_set);
    }
}

int main(int argc, char *argv[]) {
    Bitmap *bitmap = bitmap_new(MAX_VALUE);

    set_values(bitmap, test_values, test_values_len);

    check_values(bitmap, test_values, test_values_len, true);

    unset_values(bitmap, test_values, test_values_len);

    check_values(bitmap, test_values, test_values_len, false);

    bitmap_free(bitmap);
    return 0;
}

#endif

If you only need to check if the number exists in the list, the you can try to make a Bitmap.

If the numbers are going to be sparsely spread out over a large range like 100,000 values in the range 0-4billion then a Hashtable would be faster. For a C implementation of a Hashtable take a look at GLib's Hashtable.

A Bitmap could hold numbers 0-4,294,967,295 using only 512Mbytes of ram.

#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

#define BITMAP_TEST 1

#define BITMAP_32_WORD 1

typedef struct Bitmap Bitmap;
#if BITMAP_32_WORD
#define BITWORD_BITS_SHIFT 5
typedef uint32_t Bitword;
#else
#define BITWORD_BITS_SHIFT 6
typedef uint64_t Bitword;
#endif
#define BITWORD_BITS (sizeof(Bitword) * 8)
#define BITWORD_BITS_MASK (BITWORD_BITS - 1)
#define BITWORD_MULT(bit)  ((bit + (BITWORD_BITS_MASK)) & ~(BITWORD_BITS_MASK))
#define BITWORD_TEST(bword, bit) ((bword >> bit) & 1)

#define BITMAP_WORD_COUNT(bit) (BITWORD_MULT(bit) >> BITWORD_BITS_SHIFT)


struct Bitmap {
    size_t  length;
    Bitword *bitmap;
};

extern Bitmap *bitmap_new(size_t len) {
    Bitmap *bitmap = malloc(sizeof(Bitmap));
    bitmap->length = len;
    bitmap->bitmap = calloc(BITMAP_WORD_COUNT(len),sizeof(Bitword));
    return bitmap;
}

extern void bitmap_free(Bitmap *bitmap) {
    free(bitmap->bitmap);
    free(bitmap);
}

extern void bitmap_set(Bitmap *bitmap, size_t bit) {
    assert(bit < bitmap->length);
    bitmap->bitmap[(bit >> BITWORD_BITS_SHIFT)] |= ((Bitword)1 << (bit & BITWORD_BITS_MASK));
}

extern void bitmap_unset(Bitmap *bitmap, size_t bit) {
    assert(bit < bitmap->length);
    bitmap->bitmap[(bit >> BITWORD_BITS_SHIFT)] &= ~((Bitword)1 << (bit & BITWORD_BITS_MASK));
}

extern bool bitmap_test(Bitmap *bitmap, size_t bit) {
    assert(bit < bitmap->length);
    Bitword bword = bitmap->bitmap[(bit >> BITWORD_BITS_SHIFT)];
    return BITWORD_TEST(bword, (bit & BITWORD_BITS_MASK));
}

#ifdef BITMAP_TEST
#include <stdio.h>

#define MAX_VALUE (2343 + 1)
static const uint32_t test_values[] = { 1,2,3,4,8,34,543,2343 };
#define test_values_len (sizeof(test_values)/sizeof(uint32_t))

static void set_values(Bitmap *bitmap, const uint32_t *values, int len) {
    int i;
    for(i=0; i < len; i++) {
        bitmap_set(bitmap, values[i]);
    }
}

static void unset_values(Bitmap *bitmap, const uint32_t *values, int len) {
    int i;
    for(i=0; i < len; i++) {
        bitmap_unset(bitmap, values[i]);
    }
}

static void check_values(Bitmap *bitmap, const uint32_t *values, int len, bool is_set) {
    int i;
    for(i=0; i < len; i++) {
        assert(bitmap_test(bitmap, values[i]) == is_set);
    }
}

int main(int argc, char *argv[]) {
    Bitmap *bitmap = bitmap_new(MAX_VALUE);

    set_values(bitmap, test_values, test_values_len);

    check_values(bitmap, test_values, test_values_len, true);

    unset_values(bitmap, test_values, test_values_len);

    check_values(bitmap, test_values, test_values_len, false);

    bitmap_free(bitmap);
    return 0;
}

#endif
笨死的猪 2024-10-07 01:24:51

如果数字是 32 位,则甚至不需要散列,只需使用数组即可。

If the numbers are 32 bits you don't even need hashing, just use an array.

花开雨落又逢春i 2024-10-07 01:24:51

当没有具有相同哈希值的键时,哈希表实际上只有 O(1)。

有关 C 中哈希表的简单简短版本,请参见此处:
http://pokristensson.com/strmap.html

A hashtable is actually only O(1) when there are no keys that have the same hash.

For an easy short version of a hashtable in C look here:
http://pokristensson.com/strmap.html

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文