What is a simple C library for a set of integer sets?
I've got to modify a C program and I need to include a set of sets of unsigned integers. That is, I have millions of sets of integers (each containing between 3 and 100 integers), and I need to store these in some structure, let's call it the directory, that can tell me in logarithmic time whether a given integer set already exists in the directory. The only operations that need to be defined on the directory are lookup and insert.
This would be easy in languages with built-in support for useful data structures, but I'm a foreigner to C, and looking around on Google (surprisingly) did not answer my question satisfactorily. This project looks about right:
http://uthash.sourceforge.net/
but I would need to come up with my own hash key generator.
This is a standard, simple problem, so I hope there is a standard and simple solution.
It depends on what you are going to do with the data. But maybe tsearch already does what you want. You could also build a sorted array for each set and look up the values with bsearch, although performance may suffer during insertion.
EDIT: If you are looking for an (external) library, you'll find a comparison of some C and C++ hash table implementations here. The author of the article has written a generic header-only implementation called khash, so your compiled binary doesn't have any additional dependencies.
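To make the tsearch suggestion a bit more concrete, here is a minimal sketch, assuming a POSIX system that provides <search.h> and assuming each set is stored as a sorted array without duplicate values. The names int_set, int_set_cmp, directory_insert and directory_contains are made up for illustration and are not part of any library.

```c
#include <search.h>   /* tsearch, tfind (POSIX) */
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation: a sorted array of unsigned ints plus its length. */
typedef struct {
    size_t    n;
    unsigned *elems;   /* sorted ascending; assumed to hold no duplicates */
} int_set;

/* Total order on sets: shorter sets first, then element by element. */
static int int_set_cmp(const void *a, const void *b)
{
    const int_set *x = a, *y = b;
    if (x->n != y->n)
        return x->n < y->n ? -1 : 1;
    for (size_t i = 0; i < x->n; i++)
        if (x->elems[i] != y->elems[i])
            return x->elems[i] < y->elems[i] ? -1 : 1;
    return 0;
}

static int cmp_unsigned(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Returns 1 if the set was inserted, 0 if it was already present,
   -1 on allocation failure. The directory keeps its own sorted copy. */
static int directory_insert(void **root, const unsigned *vals, size_t n)
{
    int_set *s = malloc(sizeof *s);
    if (!s)
        return -1;
    s->n = n;
    s->elems = malloc((n ? n : 1) * sizeof *s->elems);
    if (!s->elems) {
        free(s);
        return -1;
    }
    memcpy(s->elems, vals, n * sizeof *vals);
    qsort(s->elems, n, sizeof *s->elems, cmp_unsigned);

    /* tsearch returns a pointer to the tree item, whose first field is the key pointer. */
    void *node = tsearch(s, root, int_set_cmp);
    if (node && *(int_set **)node == s)
        return 1;                 /* our copy is now owned by the tree */
    free(s->elems);
    free(s);
    return node ? 0 : -1;
}

/* Lookup only; sorted_vals must already be in ascending order. */
static int directory_contains(void *const *root, const unsigned *sorted_vals, size_t n)
{
    int_set key = { n, (unsigned *)sorted_vals };  /* the comparison never writes to it */
    return tfind(&key, root, int_set_cmp) != NULL;
}
```

The directory itself is just a `void *root = NULL;` that gets passed by address. Whether lookups are really logarithmic depends on the C library: glibc uses a balanced tree, but POSIX does not require balancing.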
EDIT: Sorry, I started answering as if it were C++ and not C. Yes, then you should find a hash function and code it yourself. Since you already know the average size of a set it's not so difficult, just choose a good hash function! But you'll need to encode a whole set in a single number if you want to check whether a set is already in the directory.
You can try iteratively hashing the single numbers of the set, in such a way that the hash function depends on its previous value, the current number and the current index.
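One possible shape for such an iterative hash is sketched below. It is only an illustration, not the code from the original answer: the function name hash_int_set is hypothetical, and the mixing step borrows the 64-bit FNV-1a constants but applies them to whole integers rather than bytes. It assumes the elements are passed in a canonical (for example sorted) order, so that equal sets produce equal hashes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: folds each element and its index into a running hash. */
static uint64_t hash_int_set(const unsigned *elems, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= (uint64_t)elems[i];             /* mix in the current number */
        h *= 0x100000001b3ULL;               /* FNV-1a 64-bit prime */
        h ^= (uint64_t)i;                    /* mix in the current index */
        h *= 0x100000001b3ULL;
    }
    return h;                                /* depends on every previous value */
}
```

Two different sets can still collide, so the hash only narrows the search; the final check has to compare the elements themselves.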
What about STL sets?
Using this data structure you can easily store all your sets, but you also need a way to check whether a set is already included in the directory. It's not clear: do you want to know whether a set that has all the SAME elements already exists in the directory?
You can do it manually by checking all the elements, but since you have millions of them you should find a way to hash the elements of the set into a unique number and use a map of sets.
If I understand you correctly, you want to represent a set of sets of integers, which I don't think is particularly trivial.
The first point is to represent a set of integers. The simplest way would be to use a variable-size array; you can then create a new set (with a fixed number of elements) and store the elements with
set->elems[0]=i1; ...
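A variable-size array of this kind could look roughly like the sketch below. This is an assumption about the layout, not the answer's original code: the type name intset, the field n and the helper intset_new are hypothetical, and only the elems field name comes from the line above. It relies on a C99 flexible array member.

```c
#include <stdlib.h>

typedef struct {
    size_t   n;        /* number of elements the set can hold */
    unsigned elems[];  /* C99 flexible array member, allocated together with the header */
} intset;

/* Allocates a set with room for n elements; returns NULL on failure.
   The elements themselves are left uninitialized. */
static intset *intset_new(size_t n)
{
    intset *set = malloc(sizeof *set + n * sizeof set->elems[0]);
    if (set)
        set->n = n;
    return set;
}
```

After `intset *set = intset_new(3);` the elements are stored with `set->elems[0] = i1;` and so on, and the whole set is released with a single `free(set);`.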
Another option would be to use bit arrays, but the implementation would depend on the nature of the integers to store (e.g. are they within a fixed range? Do they usually appear in groups in a set?).
Once you have your set of integers you will need a compare function (to determine whether two sets have the same elements). If you opted for an array to represent a set and you keep that array sorted, it is quite simple to check whether two sets are identical; if it's a bitmap, it will depend on how you implemented it.
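For the sorted-array case, the comparison could be sketched as follows. The intset layout and the names intset_sort and intset_equal are hypothetical, and the code assumes a set holds no duplicate values and that unsigned has no padding bits (true on common platforms), so memcmp is a valid element-wise comparison.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t   n;        /* number of elements */
    unsigned elems[];  /* kept sorted ascending */
} intset;

static int cmp_unsigned(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Puts the elements into canonical (ascending) order. */
static void intset_sort(intset *s)
{
    qsort(s->elems, s->n, sizeof s->elems[0], cmp_unsigned);
}

/* Nonzero if both sorted sets hold exactly the same elements. */
static int intset_equal(const intset *a, const intset *b)
{
    return a->n == b->n &&
           memcmp(a->elems, b->elems, a->n * sizeof a->elems[0]) == 0;
}
```

Sorting once at creation time means every later equality check is a length test plus one linear pass.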
Now, for the set of sets you can choose a (sorted) vector, which you might need to resize from time to time while inserting elements, or a hash table. In the latter case you'll need to write a hash function for your sets of integers (possibly using existing functions!).
As I said, it doesn't seem trivial to me; I'm not surprised Google didn't help.
It's not extremely complicated, though; you just have to make some decisions before proceeding.
Implement a simple hash table yourself. Knowing how to implement one on your own will make you a better programmer.
http://en.wikipedia.org/wiki/Hash_table
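As a rough idea of how small such a hand-rolled table can be, here is a sketch of a chained hash table keyed by whole integer sets. Everything in it is an assumption made for illustration: the names (entry, directory, dir_init, dir_contains, dir_insert), the fixed bucket count NBUCKETS, and the deliberately simple placeholder hash. Sets must be passed as sorted arrays so that equal sets hash and compare equal.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS (1u << 20)   /* assumed fixed size; a real table would grow */

typedef struct entry {
    struct entry *next;       /* next entry in the same bucket */
    size_t        n;          /* number of elements */
    unsigned      elems[];    /* sorted copy of the set */
} entry;

typedef struct {
    entry **buckets;          /* NBUCKETS chain heads */
} directory;

/* Placeholder hash; swap in a better-mixing function if collisions hurt. */
static uint64_t hash_elems(const unsigned *e, size_t n)
{
    uint64_t h = 0;
    for (size_t i = 0; i < n; i++)
        h = h * 31 + e[i];
    return h;
}

static int dir_init(directory *d)
{
    d->buckets = calloc(NBUCKETS, sizeof *d->buckets);
    return d->buckets ? 0 : -1;
}

static entry *dir_find(const directory *d, const unsigned *e, size_t n, uint64_t h)
{
    for (entry *p = d->buckets[h % NBUCKETS]; p; p = p->next)
        if (p->n == n && memcmp(p->elems, e, n * sizeof *e) == 0)
            return p;
    return NULL;
}

/* Nonzero if the sorted set is already in the directory. */
static int dir_contains(const directory *d, const unsigned *sorted, size_t n)
{
    return dir_find(d, sorted, n, hash_elems(sorted, n)) != NULL;
}

/* Returns 1 if inserted, 0 if already present, -1 if out of memory. */
static int dir_insert(directory *d, const unsigned *sorted, size_t n)
{
    uint64_t h = hash_elems(sorted, n);
    if (dir_find(d, sorted, n, h))
        return 0;
    entry *p = malloc(sizeof *p + n * sizeof *sorted);
    if (!p)
        return -1;
    p->n = n;
    memcpy(p->elems, sorted, n * sizeof *sorted);
    p->next = d->buckets[h % NBUCKETS];   /* push onto the front of the chain */
    d->buckets[h % NBUCKETS] = p;
    return 1;
}
```

With millions of sets, a fixed table of about a million buckets keeps chains at a few entries each on average, so lookup and insert are close to constant time rather than logarithmic. Freeing the entries and resizing are left out to keep the sketch short.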