boolean[] 与 BitSet:哪个更高效?

发布于 2024-07-14 14:11:45 字数 107 浏览 12 评论 0原文

就内存和 CPU 使用而言,哪个更高效 - 布尔值数组还是 BitSet? 未使用特定的 BitSet 方法,仅使用 get/set/clear(对于数组分别为 ==、=、Arrays.fill)。

What is more efficient in terms of memory and CPU usage — an array of booleans or a BitSet? Specific BitSet methods are not used, only get/set/clear (==, =, Arrays.fill respectively for an array).

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(8

北凤男飞 2024-07-21 14:11:45
  • Boolean[] 每个布尔值使用大约 4-20 个字节。
  • boolean[] 每个布尔值大约使用 1 个字节。
  • BitSet 每个布尔值使用大约 1 位。

内存大小对您来说可能不是问题,在这种情况下 boolean[] 可能更容易编码。

  • Boolean[] uses about 4-20 bytes per boolean value.
  • boolean[] uses about 1 byte per boolean value.
  • BitSet uses about 1 bit per boolean value.

Memory size might not be an issue for you in which case boolean[] might be simpler to code.

赠意 2024-07-21 14:11:45

从 Sun JDK 1.6 计算素数的一些基准测试来看(10 次迭代中最好的预热,给 JIT 编译器一个机会,并排除随机调度延迟,Core 2 Duo T5600 1.83GHz):

BitSet 比 boolean 具有更高的内存效率[ ] 除了非常小的尺寸。 数组中的每个布尔值都占用一个字节。 对于 BitSet 来说,runtime.freeMemory() 中的数字有点混乱,但也比较少。

boolean[] 的 CPU 效率更高,除了非常大的尺寸之外,它们大约是偶数。 例如,对于大小为 100 万的 boolean[] 大约快四倍(例如 6ms 与 27ms),对于 100 万和 1000000000 个,它们大约是偶数。

From some benchmarks with Sun JDK 1.6 computing primes with a sieve (best of 10 iterations to warm up, give the JIT compiler a chance, and exclude random scheduling delays, Core 2 Duo T5600 1.83GHz):

BitSet is more memory efficient than boolean[] except for very small sizes. Each boolean in the array takes a byte. The numbers from runtime.freeMemory() are a bit muddled for BitSet, but less.

boolean[] is more CPU efficient except for very large sizes, where they are about even. E.g., for size 1 million boolean[] is about four times faster (e.g. 6ms vs 27ms), for ten and a hundred million they are about even.

放低过去 2024-07-21 14:11:45

在这里,您可以看到内存/时间基准测试,比较 boolean[][] 三角矩阵与 BitSet[] 三角矩阵。

我创建、设置和读取 (size * (size-1) / 2) 值并比较内存使用情况和时间...

在此处输入图像描述

在此处输入图像描述

希望这会有所帮助...

这里代码...(只是一个快速肮脏的测试代码,抱歉;)

import java.util.BitSet;
import java.util.Date;

public class BooleanBitSetProfiler {

    Runtime runtime;
    int sum = 0;
    public void doIt() {

        runtime = Runtime.getRuntime();
        long[][] bitsetMatrix = new long[30][2];
        long[][] booleanMatrix = new long[30][2];
        int size = 1000;
        for (int i = 0; i < booleanMatrix.length; i++) {
            booleanMatrix[i] = testBooleanMatrix(size);
            bitsetMatrix[i] = testBitSet(size);
            size += 2000;
        }
        int debug = 1;
        for (int j = 0; j < booleanMatrix.length; j++){
            System.out.print(booleanMatrix[j][0] + ";");
        }
        System.out.println();
        for (int j = 0; j < booleanMatrix.length; j++){
            System.out.print(booleanMatrix[j][1] + ";");
        }
        System.out.println();
        for (int j = 0; j < bitsetMatrix.length; j++){
            System.out.print(bitsetMatrix[j][0] + ";");
        }
        System.out.println();
        for (int j = 0; j < bitsetMatrix.length; j++){
            System.out.print(bitsetMatrix[j][1] + ";");
        }
        System.out.println();
    }

    private long memory () {
        return runtime.totalMemory() - runtime.freeMemory();
    }
    private long[] testBooleanMatrix(int size) {
        runtime.gc();
        long startTime = new Date().getTime();
        long startMemory = memory();
        boolean[][] matrix = new boolean[size][];
        for (int i = 0; i < size; i++) {
            matrix[i] = new boolean[size - i - 1];
        }
        long creationMemory = memory();
        long creationTime = new Date().getTime();
        for (int i = 0; i < size; i++)  {
            for (int j = 0; j < matrix[i].length; j++) {
                matrix[i][j] = i % 2 == 0;
            }
        }
        long setMemory = memory();
        long setTime = new Date().getTime();
        for (int i = 0; i < size; i++)  {
            for (int j = 0; j < matrix[i].length; j++) {
                if (matrix[i][j]) sum++;
            }
        }
        long readTime = new Date().getTime();
        System.out.println("Boolean[][] (size " + size + ")");
        System.out.println("Creation memory " + printMem(creationMemory-startMemory) + ", set memory " + printMem(setMemory-startMemory));
        System.out.println("Creation time " + printTime(creationTime-startTime) + ", set time " + printTime(setTime - creationTime) + " read time " + printTime(readTime - setTime) + "\n");
        runtime.gc();
        return new long[]{(setMemory-startMemory)/(1024*1024), (readTime-startTime)};
    }
    private long[] testBitSet(int size) {
        runtime.gc();
        long startTime = new Date().getTime();
        long startMemory = memory();
        BitSet[] matrix = new BitSet[size];
        for (int i = 0; i < size; i++) {
            matrix[i] = new BitSet(size - i - 1);
        }
        long creationMemory = memory();
        long creationTime = new Date().getTime();
        for (int i = 0; i < size; i++)  {
            for (int j = 0; j < matrix[i].size(); j++) {
                matrix[i].set(j, (i % 2 == 0));
            }
        }
        long setMemory = memory();
        long setTime = new Date().getTime();
        for (int i = 0; i < size; i++)  {
            for (int j = 0; j < matrix[i].size(); j++) {
                if (matrix[i].get(j)) sum++;
            }
        }
        long readTime = new Date().getTime();
        System.out.println("BitSet[] (size " + size + ")");
        System.out.println("Creation memory " + printMem(creationMemory-startMemory) + ", set memory " + printMem(setMemory-startMemory));
        System.out.println("Creation time " + printTime(creationTime-startTime) + ", set time " + printTime(setTime - creationTime) + " read time " + printTime(readTime - setTime) + "\n");
        runtime.gc();
        return new long[]{(setMemory-startMemory)/(1024*1024), (readTime-startTime)};
    }

    private String printMem(long mem) {
        mem = mem / (1024*1024);
        return mem + "MB";
    }
    private String printTime(long milis) {
        int seconds = (int) (milis / 1000);
        milis = milis % 1000;
        return seconds > 0 ? seconds + "s " + milis + "ms" : milis + "ms";
    }
}

Here you can see a Memory/Time benchmark comparing a boolean[][] trianguar matrix versus BitSet[] triangular matrix.

I create, set and read the (size * (size-1) / 2) values and compare memory usage and time...

enter image description here

enter image description here

Hope this help...

Here the code... (just a quikly dirty test code, sorry ;)

import java.util.BitSet;
import java.util.Date;

public class BooleanBitSetProfiler {

    Runtime runtime;
    int sum = 0;
    public void doIt() {

        runtime = Runtime.getRuntime();
        long[][] bitsetMatrix = new long[30][2];
        long[][] booleanMatrix = new long[30][2];
        int size = 1000;
        for (int i = 0; i < booleanMatrix.length; i++) {
            booleanMatrix[i] = testBooleanMatrix(size);
            bitsetMatrix[i] = testBitSet(size);
            size += 2000;
        }
        int debug = 1;
        for (int j = 0; j < booleanMatrix.length; j++){
            System.out.print(booleanMatrix[j][0] + ";");
        }
        System.out.println();
        for (int j = 0; j < booleanMatrix.length; j++){
            System.out.print(booleanMatrix[j][1] + ";");
        }
        System.out.println();
        for (int j = 0; j < bitsetMatrix.length; j++){
            System.out.print(bitsetMatrix[j][0] + ";");
        }
        System.out.println();
        for (int j = 0; j < bitsetMatrix.length; j++){
            System.out.print(bitsetMatrix[j][1] + ";");
        }
        System.out.println();
    }

    private long memory () {
        return runtime.totalMemory() - runtime.freeMemory();
    }
    private long[] testBooleanMatrix(int size) {
        runtime.gc();
        long startTime = new Date().getTime();
        long startMemory = memory();
        boolean[][] matrix = new boolean[size][];
        for (int i = 0; i < size; i++) {
            matrix[i] = new boolean[size - i - 1];
        }
        long creationMemory = memory();
        long creationTime = new Date().getTime();
        for (int i = 0; i < size; i++)  {
            for (int j = 0; j < matrix[i].length; j++) {
                matrix[i][j] = i % 2 == 0;
            }
        }
        long setMemory = memory();
        long setTime = new Date().getTime();
        for (int i = 0; i < size; i++)  {
            for (int j = 0; j < matrix[i].length; j++) {
                if (matrix[i][j]) sum++;
            }
        }
        long readTime = new Date().getTime();
        System.out.println("Boolean[][] (size " + size + ")");
        System.out.println("Creation memory " + printMem(creationMemory-startMemory) + ", set memory " + printMem(setMemory-startMemory));
        System.out.println("Creation time " + printTime(creationTime-startTime) + ", set time " + printTime(setTime - creationTime) + " read time " + printTime(readTime - setTime) + "\n");
        runtime.gc();
        return new long[]{(setMemory-startMemory)/(1024*1024), (readTime-startTime)};
    }
    private long[] testBitSet(int size) {
        runtime.gc();
        long startTime = new Date().getTime();
        long startMemory = memory();
        BitSet[] matrix = new BitSet[size];
        for (int i = 0; i < size; i++) {
            matrix[i] = new BitSet(size - i - 1);
        }
        long creationMemory = memory();
        long creationTime = new Date().getTime();
        for (int i = 0; i < size; i++)  {
            for (int j = 0; j < matrix[i].size(); j++) {
                matrix[i].set(j, (i % 2 == 0));
            }
        }
        long setMemory = memory();
        long setTime = new Date().getTime();
        for (int i = 0; i < size; i++)  {
            for (int j = 0; j < matrix[i].size(); j++) {
                if (matrix[i].get(j)) sum++;
            }
        }
        long readTime = new Date().getTime();
        System.out.println("BitSet[] (size " + size + ")");
        System.out.println("Creation memory " + printMem(creationMemory-startMemory) + ", set memory " + printMem(setMemory-startMemory));
        System.out.println("Creation time " + printTime(creationTime-startTime) + ", set time " + printTime(setTime - creationTime) + " read time " + printTime(readTime - setTime) + "\n");
        runtime.gc();
        return new long[]{(setMemory-startMemory)/(1024*1024), (readTime-startTime)};
    }

    private String printMem(long mem) {
        mem = mem / (1024*1024);
        return mem + "MB";
    }
    private String printTime(long milis) {
        int seconds = (int) (milis / 1000);
        milis = milis % 1000;
        return seconds > 0 ? seconds + "s " + milis + "ms" : milis + "ms";
    }
}
山川志 2024-07-21 14:11:45

至于内存,BitSet< 的文档/code>具有非常明显的含义。 尤其:

每个位集都有一个当前大小,即空间的位数
当前由位组使用。 请注意,大小与
位集的实现,因此它可能会随着实现而改变。 这
位集的长度与位集的逻辑长度相关,并且是
定义独立于实现。

Java 库类的源代码是公开可用的,人们可以轻松自己检查一下。 特别是:

The internal field corresponding to the serialField "bits".
89 
90     private long[] words;

至于速度; 这取决于一个人在做什么。 一般来说,不要提前考虑速度; 使用语义上最有意义的工具并生成最清晰的代码。 仅在观察到未满足性能要求并识别瓶颈后进行优化。

来到 SO 并询问 A 是否比 B 更快是愚蠢的,原因有很多,包括但当然不限于:

  1. 这取决于应用程序,通常没有人响应该应用程序。 在使用它的上下文中分析和分析它。确保它是一个真正值得优化的瓶颈。
  2. 像这样询问速度的问题通常表明OP认为他们关心效率,但不愿意分析并且没有定义性能要求。 在表面之下,这通常是一个危险信号,表明 OP 走上了错误的道路。

我知道这是一个老问题,但最近才出现; 我相信这一点值得补充。

As for memory, the documentation for a BitSet has pretty clear implications. In particular:

Every bit set has a current size, which is the number of bits of space
currently in use by the bit set. Note that the size is related to the
implementation of a bit set, so it may change with implementation. The
length of a bit set relates to logical length of a bit set and is
defined independently of implementation.

The source for Java library classes is openly available and one can easily check this for themselves. In particular:

The internal field corresponding to the serialField "bits".
89 
90     private long[] words;

As for speed; it depends on what one is doing. In general, don't think about speed ahead of time; use whichever tool makes the most sense semantically and leads to the clearest code. Optimize only after observing that performance requirements aren't met and identifying bottlenecks.

Coming to SO and asking if A is faster than B is silly for many reasons, including but certainly not limited to:

  1. It depends on the application, which nobody responding generally has access to. Analyze and profile it in the context it is being used in. Be sure that it's a bottleneck that's actually worth optimizing.
  2. Questions like this that ask about speed generally show that the OP thinks they care about efficiency but wasn't willing to profile and didn't define performance requirements. Under the surface, that's usually a red flag that the OP is headed down the wrong path.

I know this is an old question but it came up recently; and I believe this is worth adding.

一梦等七年七年为一梦 2024-07-21 14:11:45

A bit left-field of your question, but if storage is a concern you may want to look into Huffman compression. For example, 00000001 could be squeezed down by frequency to something equivalent to {(7)0, (1)1}. A more "randomized" string 00111010 would require a more complex representation, e.g. {(2)0, (3)1, (1)0, (1)1, (1)0}, and take up more space. Depending on the structure of your bit data, you may get some storage benefit from its use, beyond BitSet.

热血少△年 2024-07-21 14:11:45

这取决于一如既往。
是的,BitSet 的内存效率更高,但一旦您需要多线程访问,boolean[] 可能是更好的选择。 例如,对于计算素数,您只需将布尔值设置为 true,因此实际上并不需要同步。 Hans Boehm 写了一些关于此的论文以及相同的技术可用于标记图中的节点。

It depends as always.
Yes BitSet is more memory efficent, but as soon as you require multithreaded access boolean[] might be the better choice. For example for computing primes you only set the boolean to true and therefore you don't really need synchronization. Hans Boehm has written some paper about this and the same technique can be used for marking nodes in graph.

以歌曲疗慰 2024-07-21 14:11:45

从 Java 到 CPU 完全是 VM 特定的。 例如,布尔值过去实际上是作为 32 位值实现的(今天很可能也是如此)。

除非您知道这很重要,否则最好编写清晰的代码,对其进行分析,然后修复速度慢或消耗大量内存的部分。

您可以随时执行此操作。 例如,我曾经决定不在字符串上调用 .intern() ,因为当我在探查器中运行代码时,它会减慢速度太多(尽管使用较少的内存)。

Going from Java to CPU is totally VM specific. For instance, it used to be that a boolean was actually implemented as a 32-bit value (quite probably is true to this day).

Unless you know it is going to matter you are better off writing the code to be clear, profile it, and then fix the parts that are slow or consuming a lot of memory.

You can do this as you go. For instance I once decided to not call .intern() on Strings because when I ran the code in the profiler it slowed it down too much (despite using less memory).

滥情空心 2024-07-21 14:11:45

我相信 BitSet 的内存和 CPU 效率更高,因为它可以在内部将位打包为 int、long 或本机数据类型,而 boolean[] 需要每个数据位一个字节。 此外,如果您要使用其他方法(and、or 等),您会发现 BitSet 更有效,因为不需要迭代数组的每个元素; 而是使用按位数学。

I believe that a BitSet is more memory- and CPU-efficient, is it can internally pack the bits into int, longs, or native data types, whereas a boolean[] requires a byte for each bit of data. Additionally, if you were to use the other methods (and, or, etc), you would find that the BitSet is more efficient, as there is no need to iterate through every element of an array; bitwise math is used instead.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文