
A structure with very fast insertion time

Posted 2024-12-20 04:10:51, 379 words, 4 views, 0 comments

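I'm looking for an ordered data structure that allows very fast insertion. That is the only property required. Data will only ever be accessed and removed from the top element.

More precisely, I need 2 structures:

1) The first structure should allow ordered insertion using an int value. On completing the insertion, it should report the rank of the inserted element.

2) The second structure should allow insertion at a specified rank.

The number of elements to be stored is likely to be in the thousands or tens of thousands.

[Edit] I must amend the volume assumption: even though, at any moment, the size of the ordered structure is likely to be in the range of tens of thousands, the total number of insertions is likely to be in the tens of millions per run.

Insertion time in O(1) would be nice, although O(log(log(n))) is very acceptable too. Currently I have some interesting candidates for the first structure only, but they are either in log(n) or lack the ability to report the insertion rank (which is mandatory).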


一直在等你来 2024-12-27 04:10:51

What about a form of skip-list, specifically the "indexed skiplist" in the linked article? That should give O(lg N) insert and lookup, and O(1) access to the first node, for both your use cases.
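The linked article is not reproduced here, so the following is only a minimal sketch of one common way to build an indexed skip list: every link stores a span (how many bottom-level steps it covers), and summing spans during the search yields the rank. The names `SkipList`, `sl_insert`, `sl_pop_min` and the constant `MAX_LEVEL` are assumptions made for illustration, not taken from the article.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LEVEL 12   /* assumption: plenty for tens of thousands of items */

typedef struct Node {
    int key;
    int level;                      /* number of levels this node takes part in */
    struct Node *next[MAX_LEVEL];
    int span[MAX_LEVEL];            /* bottom-level steps covered by next[i];
                                       never read while next[i] is NULL */
} Node;

typedef struct { Node head; } SkipList;   /* head is a zero-initialized sentinel */

static int random_level(void) {
    int lvl = 1;
    while (lvl < MAX_LEVEL && (rand() & 1)) lvl++;
    return lvl;
}

/* Inserts key and returns its 0-based rank (the number of smaller keys),
   or -1 if allocation fails. */
int sl_insert(SkipList *sl, int key) {
    assert(sl != NULL);
    Node *update[MAX_LEVEL];   /* last node before the insertion point, per level */
    int rank[MAX_LEVEL];       /* position of update[i], counting head as 0 */
    Node *x = &sl->head;
    for (int i = MAX_LEVEL - 1; i >= 0; i--) {
        rank[i] = (i == MAX_LEVEL - 1) ? 0 : rank[i + 1];
        while (x->next[i] && x->next[i]->key < key) {
            rank[i] += x->span[i];
            x = x->next[i];
        }
        update[i] = x;
    }
    Node *n = calloc(1, sizeof *n);
    if (!n) return -1;
    n->key = key;
    n->level = random_level();
    for (int i = 0; i < MAX_LEVEL; i++) {
        int before = rank[0] - rank[i];   /* nodes between update[i] and the new node */
        if (i < n->level) {
            n->next[i] = update[i]->next[i];
            n->span[i] = update[i]->span[i] - before;
            update[i]->next[i] = n;
            update[i]->span[i] = before + 1;
        } else {
            update[i]->span[i]++;         /* this link now jumps over one more node */
        }
    }
    return rank[0];
}

/* Removes the smallest key in O(MAX_LEVEL) steps; returns 0 if the list is empty. */
int sl_pop_min(SkipList *sl, int *key) {
    assert(sl != NULL && key != NULL);
    Node *first = sl->head.next[0];
    if (!first) return 0;
    for (int i = 0; i < MAX_LEVEL; i++) {
        if (i < first->level) {           /* head pointed directly at first here */
            sl->head.next[i] = first->next[i];
            sl->head.span[i] = first->span[i];
        } else {
            sl->head.span[i]--;
        }
    }
    *key = first->key;
    free(first);
    return 1;
}
```

Inserting 50, 20, 70, 60 in that order reports ranks 0, 0, 2, 2. The reported rank does not depend on the random level choices; those only affect how long the search takes.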

--Edit--

When I think of O(1) algorithms, I think of radix-based methods. Here is an O(1) insert with rank returned. The idea is to break the key up into nibbles, and keep count of all the inserted items which have that prefix. Unfortunately, the constant is high (<=64 dereferences and additions), and the storage is O(2 x 2^INT_BITS), which is awful. This is the version for 16 bit ints; expanding to 32 bits should be straightforward.
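As a small worked illustration of the nibble decomposition, the sketch below computes which counter each level would touch for one 16-bit key, and how many additions the rank computation needs. The type `PrefixIndex` and the functions `prefixes` and `additions` are hypothetical helpers introduced only for this example.

```c
#include <assert.h>

/* Counter indices touched by one 16-bit key, one per prefix length. */
typedef struct { unsigned i1, i2, i3, i4; } PrefixIndex;

PrefixIndex prefixes(unsigned key) {
    assert(key <= 0xFFFF);
    PrefixIndex p;
    p.i4 = key;         /* all four nibbles: one of 65536 counters */
    p.i3 = key >> 4;    /* top three nibbles: one of 4096 counters */
    p.i2 = key >> 8;    /* top two nibbles: one of 256 counters */
    p.i1 = key >> 12;   /* top nibble: one of 16 counters */
    return p;
}

/* Number of counters summed to compute the rank: at each level, the
   sibling slots that sort before the key's own nibble. */
unsigned additions(unsigned key) {
    assert(key <= 0xFFFF);
    return ((key >> 12) & 0xF) + ((key >> 8) & 0xF)
         + ((key >> 4) & 0xF) + (key & 0xF);
}
```

For the key 0x1A2B the indices are 0x1, 0x1A, 0x1A2 and 0x1A2B, and the rank needs 1 + 10 + 2 + 11 = 24 additions. The worst case, 0xFFFF, needs 15 per level, so 60 additions plus the four counter increments, which matches the bound of 64 quoted above.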

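#include <stdlib.h>

// p1..p4 count the stored items per 1-, 2-, 3- and 4-nibble prefix
int *p1; int *p2; int *p3; int *p4;
void **records;              // payload pointer for each 16-bit key
unsigned int min = 0xFFFF;   // smallest stored key; 0xFFFF while empty

int init(void) {
   p1 = (int*)calloc(16, sizeof(int));
   p2 = (int*)calloc(256, sizeof(int));
   p3 = (int*)calloc(4096, sizeof(int));
   p4 = (int*)calloc(65536, sizeof(int));
   records = (void**)calloc(65536, sizeof(void*));
   if (!p1 || !p2 || !p3 || !p4 || !records) return -1;   // allocation failed
   return 0;
}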

// Records one more item in slot p[offset + a] and returns the number of
// existing items in the smaller sibling slots p[offset] .. p[offset + a - 1].
int Add1ReturnRank(int* p, int offset, int a) {
   int i, sum=0;
   p+=offset;
   for (i=0;i<a;i++)
      sum += p[i];
   p[i]++;   // here i == a
   return sum;
}

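// Inserts key (which must fit in 16 bits) and returns its rank,
// i.e. the number of stored items with a strictly smaller key.
int insert(int key, void* data) {
   unsigned int i4 = (unsigned int)key;
   unsigned int i3= (i4>> 4);
   unsigned int i2= (i3>> 4);
   unsigned int i1= (i2>> 4);
   int rank = Add1ReturnRank(p1,0, i1&0xF);
   rank += Add1ReturnRank(p2,i2&0xF0,i2&0xF);
   rank += Add1ReturnRank(p3,i3&0xFF0,i3&0xF);
   rank += Add1ReturnRank(p4,i4&0xFFF0,i4&0xF);
   if (min>i4) {min = i4;}   // compare unsigned with unsigned
   records[i4] = data;       // a repeated key overwrites the earlier payload
   return rank;
}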

This structure also supports O(1) GetMin and RemoveMin. (GetMin is instant, Remove has a constant similar to Insert.)

void* getMin(int* key) {
    *key = (int)min;
    return p4[min] ? records[min] : NULL;   // NULL while nothing is stored
}

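void* removeMin(int* key)  {
   void* data = records[min];
   unsigned int i4 = min;
   unsigned int i3= (i4>> 4);
   unsigned int i2= (i3>> 4);
   unsigned int i1= (i2>> 4);

   if (!p4[i4]) return NULL;   // nothing stored

   p4[i4]--;
   p3[i3]--;
   p2[i2]--;
   p1[i1]--;
   *key = min;
   // Find the next minimum. Whenever a level advances, the lower levels
   // restart at the first slot of the new prefix, so each loop runs at
   // most 16 times.
   while (!p1[i1]) {
      if (i1==15) { min = 0xFFFF; return data; }   // that was the last item
      i2 = (++i1)<<4; i3 = i2<<4; i4 = i3<<4;
   }
   while (!p2[i2]) {
      i3 = (++i2)<<4; i4 = i3<<4;
   }
   while (!p3[i3])
      i4 = (++i3)<<4;
   while (!p4[i4])
      ++i4;
   min = i4;
   return data;
}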
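To show the insert and remove calls working together in something that fits on one screen, here is a scaled-down sketch of the same counting idea for 8-bit keys, with two counter levels instead of four. It is an illustration under assumptions, not the code above: the names `c1`, `c2`, `insert8` and `remove_min8` are hypothetical, payloads are omitted, and the minimum is found by scanning from the start (at most 32 steps) rather than being cached.

```c
#include <assert.h>

static int c1[16];    /* c1[h]: number of stored keys whose high nibble is h */
static int c2[256];   /* c2[k]: number of stored copies of key k */

/* Adds one key and returns how many stored keys are strictly smaller. */
int insert8(unsigned key) {
    assert(key <= 0xFF);
    unsigned hi = key >> 4;
    int rank = 0;
    for (unsigned h = 0; h < hi; h++)          /* smaller high nibbles */
        rank += c1[h];
    for (unsigned k = hi << 4; k < key; k++)   /* same high nibble, smaller key */
        rank += c2[k];
    c1[hi]++;
    c2[key]++;
    return rank;
}

/* Removes the smallest stored key and returns it, or -1 when empty. */
int remove_min8(void) {
    unsigned hi = 0;
    while (hi < 16 && !c1[hi]) hi++;   /* first non-empty high nibble */
    if (hi == 16) return -1;
    unsigned key = hi << 4;
    while (!c2[key]) key++;            /* first non-empty key inside it */
    c1[hi]--;
    c2[key]--;
    return (int)key;
}
```

Inserting 0x52, 0x17, 0x5A and then 0x52 again reports ranks 0, 0, 2 and 1; the second 0x52 gets rank 1 because only 0x17 is strictly smaller. Removing the minimum then yields 0x17 first.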

If your data is sparse and well distributed, you could remove the p4 counter, and instead do an insertion sort into the p3 level. That would reduce storage costs by a factor of 16, at the cost of a higher worst case insert when there are many similar values.
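One possible reading of that variant, again scaled down to 8-bit keys to keep it short: the lowest counter level is dropped, and each remaining bucket keeps its keys in a small sorted array, so the position found by the insertion sort supplies the last part of the rank. The names `Bucket`, `buckets`, `insert_sorted8` and the fixed `BUCKET_CAP` are assumptions made for this sketch; a real version would need growable buckets.

```c
#include <assert.h>

#define BUCKET_CAP 64   /* assumption: fixed capacity keeps the sketch short */

typedef struct {
    int count;                        /* doubles as the per-prefix counter */
    unsigned char keys[BUCKET_CAP];   /* kept sorted in ascending order */
} Bucket;

static Bucket buckets[16];            /* one bucket per high nibble */

/* Adds one key and returns how many stored keys are strictly smaller,
   or -1 if the key's bucket is full. */
int insert_sorted8(unsigned key) {
    assert(key <= 0xFF);
    unsigned hi = key >> 4;
    Bucket *b = &buckets[hi];
    if (b->count == BUCKET_CAP) return -1;
    int rank = 0;
    for (unsigned h = 0; h < hi; h++)   /* items in smaller buckets */
        rank += buckets[h].count;
    int pos = b->count;
    while (pos > 0 && b->keys[pos - 1] >= key) {   /* shift larger-or-equal keys right */
        b->keys[pos] = b->keys[pos - 1];
        pos--;
    }
    b->keys[pos] = (unsigned char)key;
    b->count++;
    return rank + pos;
}
```

The shifting loop is where the higher worst case comes from: when many keys share a prefix, one insert may move every key already in that bucket.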

Another idea to improve the storage would be to combine this idea with something like an Extendable Hash. Use the integer key as the hash value, and keep count of the inserted nodes in the directory. Doing a sum over the relevant directory entries on an insert (as above) should still be O(1) with a large constant, but the storage would reduce to O(N).
