boolean[] 与 BitSet:哪个更高效?
就内存和 CPU 使用而言,哪个更高效 - 布尔值数组还是 BitSet? 未使用特定的 BitSet 方法,仅使用 get/set/clear(对于数组分别为 ==、=、Arrays.fill)。
What is more efficient in terms of memory and CPU usage — an array of boolean
s or a BitSet? Specific BitSet methods are not used, only get/set/clear (==, =, Arrays.fill respectively for an array).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
Boolean[]
每个布尔值使用大约 4-20 个字节。boolean[]
每个布尔值大约使用 1 个字节。BitSet
每个布尔值使用大约 1 位。内存大小对您来说可能不是问题,在这种情况下 boolean[] 可能更容易编码。
Boolean[]
uses about 4-20 bytes per boolean value.boolean[]
uses about 1 byte per boolean value.BitSet
uses about 1 bit per boolean value.Memory size might not be an issue for you in which case boolean[] might be simpler to code.
从 Sun JDK 1.6 计算素数的一些基准测试来看(10 次迭代中最好的预热,给 JIT 编译器一个机会,并排除随机调度延迟,Core 2 Duo T5600 1.83GHz):
BitSet 比 boolean 具有更高的内存效率[ ] 除了非常小的尺寸。 数组中的每个布尔值都占用一个字节。 对于 BitSet 来说,runtime.freeMemory() 中的数字有点混乱,但也比较少。
boolean[] 的 CPU 效率更高,除了非常大的尺寸之外,它们大约是偶数。 例如,对于大小为 100 万的 boolean[] 大约快四倍(例如 6ms 与 27ms),对于 100 万和 1000000000 个,它们大约是偶数。
From some benchmarks with Sun JDK 1.6 computing primes with a sieve (best of 10 iterations to warm up, give the JIT compiler a chance, and exclude random scheduling delays, Core 2 Duo T5600 1.83GHz):
BitSet is more memory efficient than boolean[] except for very small sizes. Each boolean in the array takes a byte. The numbers from runtime.freeMemory() are a bit muddled for BitSet, but less.
boolean[] is more CPU efficient except for very large sizes, where they are about even. E.g., for size 1 million boolean[] is about four times faster (e.g. 6ms vs 27ms), for ten and a hundred million they are about even.
在这里,您可以看到内存/时间基准测试,比较 boolean[][] 三角矩阵与 BitSet[] 三角矩阵。
我创建、设置和读取 (size * (size-1) / 2) 值并比较内存使用情况和时间...
希望这会有所帮助...
这里代码...(只是一个快速肮脏的测试代码,抱歉;)
Here you can see a Memory/Time benchmark comparing a boolean[][] trianguar matrix versus BitSet[] triangular matrix.
I create, set and read the (size * (size-1) / 2) values and compare memory usage and time...
Hope this help...
Here the code... (just a quikly dirty test code, sorry ;)
至于内存,
BitSet< 的文档/code>
具有非常明显的含义。 尤其:
Java 库类的源代码是公开可用的,人们可以轻松自己检查一下。 特别是:
至于速度; 这取决于一个人在做什么。 一般来说,不要提前考虑速度; 使用语义上最有意义的工具并生成最清晰的代码。 仅在观察到未满足性能要求并识别瓶颈后进行优化。
来到 SO 并询问 A 是否比 B 更快是愚蠢的,原因有很多,包括但当然不限于:
我知道这是一个老问题,但最近才出现; 我相信这一点值得补充。
As for memory, the documentation for a
BitSet
has pretty clear implications. In particular:The source for Java library classes is openly available and one can easily check this for themselves. In particular:
As for speed; it depends on what one is doing. In general, don't think about speed ahead of time; use whichever tool makes the most sense semantically and leads to the clearest code. Optimize only after observing that performance requirements aren't met and identifying bottlenecks.
Coming to SO and asking if A is faster than B is silly for many reasons, including but certainly not limited to:
I know this is an old question but it came up recently; and I believe this is worth adding.
您的问题有点偏左,但如果存储是一个问题,您可能需要研究霍夫曼压缩< /a>. 例如,
00000001
可以按频率压缩为相当于{(7)0, (1)1}
的值。 更“随机”的字符串00111010
需要更复杂的表示,例如{(2)0, (3)1, (1)0, (1)1, (1)0 }
,并占用更多空间。 根据位数据的结构,您可能会从其使用中获得除BitSet
之外的一些存储优势。A bit left-field of your question, but if storage is a concern you may want to look into Huffman compression. For example,
00000001
could be squeezed down by frequency to something equivalent to{(7)0, (1)1}
. A more "randomized" string00111010
would require a more complex representation, e.g.{(2)0, (3)1, (1)0, (1)1, (1)0}
, and take up more space. Depending on the structure of your bit data, you may get some storage benefit from its use, beyondBitSet
.这取决于一如既往。
是的,BitSet 的内存效率更高,但一旦您需要多线程访问,boolean[] 可能是更好的选择。 例如,对于计算素数,您只需将布尔值设置为 true,因此实际上并不需要同步。 Hans Boehm 写了一些关于此的论文以及相同的技术可用于标记图中的节点。
It depends as always.
Yes BitSet is more memory efficent, but as soon as you require multithreaded access boolean[] might be the better choice. For example for computing primes you only set the boolean to true and therefore you don't really need synchronization. Hans Boehm has written some paper about this and the same technique can be used for marking nodes in graph.
从 Java 到 CPU 完全是 VM 特定的。 例如,布尔值过去实际上是作为 32 位值实现的(今天很可能也是如此)。
除非您知道这很重要,否则最好编写清晰的代码,对其进行分析,然后修复速度慢或消耗大量内存的部分。
您可以随时执行此操作。 例如,我曾经决定不在字符串上调用 .intern() ,因为当我在探查器中运行代码时,它会减慢速度太多(尽管使用较少的内存)。
Going from Java to CPU is totally VM specific. For instance, it used to be that a boolean was actually implemented as a 32-bit value (quite probably is true to this day).
Unless you know it is going to matter you are better off writing the code to be clear, profile it, and then fix the parts that are slow or consuming a lot of memory.
You can do this as you go. For instance I once decided to not call .intern() on Strings because when I ran the code in the profiler it slowed it down too much (despite using less memory).
我相信 BitSet 的内存和 CPU 效率更高,因为它可以在内部将位打包为 int、long 或本机数据类型,而 boolean[] 需要每个数据位一个字节。 此外,如果您要使用其他方法(and、or 等),您会发现 BitSet 更有效,因为不需要迭代数组的每个元素; 而是使用按位数学。
I believe that a BitSet is more memory- and CPU-efficient, is it can internally pack the bits into int, longs, or native data types, whereas a boolean[] requires a byte for each bit of data. Additionally, if you were to use the other methods (and, or, etc), you would find that the BitSet is more efficient, as there is no need to iterate through every element of an array; bitwise math is used instead.