Best data structure for storing a large amount of one-bit data
I want to store a large number of values so that
- they can be accessed by index,
- each value is just yes or no (so one bit per value is probably enough).
I am looking for the data structure that gives the highest performance and occupies the least space.
Storing the data in flat memory, one bit per value, is probably not a good choice. On the other hand, tree structures still use a lot of memory (e.g. every node needs pointers to build the tree, even though each node holds just one bit of data).
Does anyone have any ideas?
Comments (5)
What's wrong with using a single block of memory and either storing 1 bit per byte (easy indexing, but wastes 7 bits per byte) or packing the data (slightly trickier indexing, but more memory-efficient)?
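To put rough numbers on that trade-off, here is a minimal sketch in C that computes the storage each layout needs. The function names are made up for illustration, and it assumes 8-bit bytes.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; assumes a byte has 8 bits. */

/* Bytes needed when every flag gets a whole byte to itself. */
static size_t bytes_unpacked(size_t n_flags)
{
    return n_flags;
}

/* Bytes needed when 8 flags share one byte, rounded up. */
static size_t bytes_packed(size_t n_flags)
{
    return n_flags / 8 + (n_flags % 8 != 0);
}
```

For one million flags this gives 1,000,000 bytes unpacked against 125,000 bytes packed.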
Well, in Java, BitSet might be a good choice: http://download.oracle.com/javase/6/docs/api/java/util/BitSet.html
If I understand your question correctly, you should store them in an unsigned integer, assigning each value to one bit of the integer (a flag).
Say you represent 3 values, each of which can be on or off. You assign the first to 1, the second to 2 and the third to 4. Your unsigned int can then be 0, 1, 2, 3, 4, 5, 6 or 7 depending on which values are on or off, and you check individual values using a bitwise AND.
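A minimal sketch of the three-flag scheme in C might look like this. The flag and function names are hypothetical, chosen only for illustration.

```c
/* Hypothetical flag names: one bit each, with values 1, 2 and 4. */
enum { FLAG_FIRST = 1, FLAG_SECOND = 2, FLAG_THIRD = 4 };

/* Turn on the bits in mask. */
static unsigned flags_set(unsigned flags, unsigned mask)
{
    return flags | mask;
}

/* Turn off the bits in mask. */
static unsigned flags_clear(unsigned flags, unsigned mask)
{
    return flags & ~mask;
}

/* Return 1 if any bit in mask is on, 0 otherwise. */
static int flags_test(unsigned flags, unsigned mask)
{
    return (flags & mask) != 0;
}
```

Starting from 0 and setting FLAG_FIRST and FLAG_THIRD gives 5 (1 + 4). A single unsigned int only holds as many flags as it has bits, so this form suits a small, fixed set of values.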
Depends on the language and how you define 'index'. If you mean that the index operator must work, then your language will need to be able to overload the index operator. If you don't mind using an index macro or function, you can access the nth element by dividing the given index by the number of bits in your type (say 8 for char, 32 for uint32_t and variants) to find the array element, then return the result of
arr[n / n_bits] & (1u << (n % n_bits))
which is nonzero if bit n is set.
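As a sketch of that indexing scheme, the helpers below wrap it in get, set and clear functions over an array of uint32_t. The names bit_get, bit_set, bit_clear and N_BITS are assumptions for illustration, and the caller is assumed to have allocated at least n / 32 + 1 elements.

```c
#include <stddef.h>
#include <stdint.h>

#define N_BITS 32u /* bits per array element (uint32_t) */

/* Set bit n to 1. */
static void bit_set(uint32_t *arr, size_t n)
{
    arr[n / N_BITS] |= UINT32_C(1) << (n % N_BITS);
}

/* Set bit n to 0. */
static void bit_clear(uint32_t *arr, size_t n)
{
    arr[n / N_BITS] &= ~(UINT32_C(1) << (n % N_BITS));
}

/* Return bit n as 0 or 1. */
static int bit_get(const uint32_t *arr, size_t n)
{
    return (int)((arr[n / N_BITS] >> (n % N_BITS)) & 1u);
}
```

For example, bit 37 lives in element 1 (37 / 32) at bit position 5 (37 % 32). Access is still constant time, at the cost of one division and one shift per lookup.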
Have a look at a Bloom filter: http://en.wikipedia.org/wiki/Bloom_filter
It performs very well and is space-efficient. But make sure you read the fine print on the wiki page above ;-)
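For a feel of what that fine print involves, here is a rough sketch of a Bloom filter in C for 64-bit keys. Everything in it (the bloom type, the mixing function, three probes per key) is an assumption made for illustration, not a tuned implementation. Note that a Bloom filter answers "is this key possibly in the set?" and does not store one independent bit per index: it can report false positives, though never false negatives.

```c
#include <stdint.h>
#include <stdlib.h>

#define BLOOM_PROBES 3 /* bits set per key; an arbitrary choice here */

typedef struct {
    uint8_t *bits;  /* packed bit array */
    size_t   nbits; /* number of bits in the filter, must be > 0 */
} bloom;

/* Hypothetical 64-bit mixing function used to derive probe positions. */
static uint64_t bloom_mix(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Allocate a zeroed filter; returns 0 on success, -1 on failure. */
static int bloom_init(bloom *b, size_t nbits)
{
    if (nbits == 0) return -1;
    b->bits = calloc((nbits + 7) / 8, 1);
    b->nbits = nbits;
    return b->bits ? 0 : -1;
}

static void bloom_free(bloom *b)
{
    free(b->bits);
    b->bits = NULL;
}

/* Position of the i-th probe for key (double hashing). */
static size_t bloom_pos(const bloom *b, uint64_t key, unsigned i)
{
    uint64_t h1 = bloom_mix(key);
    uint64_t h2 = bloom_mix(key ^ 0x9e3779b97f4a7c15ULL) | 1u;
    return (size_t)((h1 + i * h2) % b->nbits);
}

/* Record key in the filter. */
static void bloom_add(bloom *b, uint64_t key)
{
    for (unsigned i = 0; i < BLOOM_PROBES; i++) {
        size_t p = bloom_pos(b, key, i);
        b->bits[p / 8] |= (uint8_t)(1u << (p % 8));
    }
}

/* Returns 0 if key was definitely never added, 1 if it may have been. */
static int bloom_maybe_contains(const bloom *b, uint64_t key)
{
    for (unsigned i = 0; i < BLOOM_PROBES; i++) {
        size_t p = bloom_pos(b, key, i);
        if (!(b->bits[p / 8] & (1u << (p % 8)))) return 0;
    }
    return 1;
}
```

A key that was added always reports 1. A key that was never added usually reports 0 but may report 1, and that false-positive rate grows as the filter fills up. If every index needs an exact yes or no, a plain packed bit array is the safer fit.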