计算 .Net BitArray 类中设置的位

发布于 2024-10-18 22:38:49 字数 307 浏览 7 评论 0原文

我正在实现一个库,其中广泛使用 .Net BitArray 类,并且需要与 Java BitSet.Cardinality() 方法等效的方法,即返回设置的位数的方法。我正在考虑将其实现为 BitArray 类的扩展方法。简单的实现是迭代和计算位集(如下所示),但我想要更快的实现,因为我将执行数千个集合操作并计算答案。有没有比下面的例子更快的方法?

count = 0;

for (int i = 0; i < mybitarray.Length; i++)
{

  if (mybitarray [i])
    count++;
}

I am implementing a library where I am extensively using the .Net BitArray class and need an equivalent to the Java BitSet.Cardinality() method, i.e. a method which returns the number of bits set. I was thinking of implementing it as an extension method for the BitArray class. The trivial implementation is to iterate and count the bits set (like below), but I wanted a faster implementation as I would be performing thousands of set operations and counting the answer. Is there a faster way than the example below?

count = 0;

for (int i = 0; i < mybitarray.Length; i++)
{

  if (mybitarray [i])
    count++;
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(11

谜泪 2024-10-25 22:38:49

这是我的解决方案,基于 http://graphics.stanford.edu/~ 中的“最佳位计数方法” seander/bithacks.html#CountBitsSetParallel

public static Int32 GetCardinality(BitArray bitArray)
{

    Int32[] ints = new Int32[(bitArray.Count >> 5) + 1];

    bitArray.CopyTo(ints, 0);

    Int32 count = 0;

    // fix for not truncated bits in last integer that may have been set to true with SetAll()
    ints[ints.Length - 1] &= ~(-1 << (bitArray.Count % 32));

    for (Int32 i = 0; i < ints.Length; i++)
    {

        Int32 c = ints[i];

        // magic (http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
        unchecked
        {
        c = c - ((c >> 1) & 0x55555555);
        c = (c & 0x33333333) + ((c >> 2) & 0x33333333);
        c = ((c + (c >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
        }

        count += c;

    }

    return count;

}

根据我的测试,这比简单的 foreach 循环快大约 60 倍,并且仍然比 Kernighan 方法快 30 倍,在 1000 位的 BitArray 中大约 50% 位设置为 true 。如果需要的话我还有一个 VB 版本。

This is my solution based on the "best bit counting method" from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel

public static Int32 GetCardinality(BitArray bitArray)
{

    Int32[] ints = new Int32[(bitArray.Count >> 5) + 1];

    bitArray.CopyTo(ints, 0);

    Int32 count = 0;

    // fix for not truncated bits in last integer that may have been set to true with SetAll()
    ints[ints.Length - 1] &= ~(-1 << (bitArray.Count % 32));

    for (Int32 i = 0; i < ints.Length; i++)
    {

        Int32 c = ints[i];

        // magic (http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
        unchecked
        {
        c = c - ((c >> 1) & 0x55555555);
        c = (c & 0x33333333) + ((c >> 2) & 0x33333333);
        c = ((c + (c >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
        }

        count += c;

    }

    return count;

}

According to my tests, this is around 60 times faster than the simple foreach loop and still 30 times faster than the Kernighan approach with around 50% bits set to true in a BitArray with 1000 bits. I also have a VB version of this if needed.

澜川若宁 2024-10-25 22:38:49

您可以使用 Linq 轻松完成此任务

BitArray ba = new BitArray(new[] { true, false, true, false, false });
var numOnes = (from bool m in ba
           where m
           select m).Count();

you can accomplish this pretty easily with Linq

BitArray ba = new BitArray(new[] { true, false, true, false, false });
var numOnes = (from bool m in ba
           where m
           select m).Count();
云巢 2024-10-25 22:38:49
BitArray myBitArray = new BitArray(...

int
    bits = myBitArray.Count,
    size = ((bits - 1) >> 3) + 1,
    counter = 0,
    x,
    c;

    byte[] buffer = new byte[size];
    myBitArray.CopyTo(buffer, 0);

    for (x = 0; x < size; x++)
        for (c = 0; buffer[x] > 0; buffer[x] >>= 1)
            counter += buffer[x] & 1;

取自“计数位设置,Brian Kernighan 的方式”并针对字节进行了调整。我将它用于 1 000 000+ 位的位数组,效果非常好。
如果您的位不是 n*8 那么您可以手动计算 mod 字节。

BitArray myBitArray = new BitArray(...

int
    bits = myBitArray.Count,
    size = ((bits - 1) >> 3) + 1,
    counter = 0,
    x,
    c;

    byte[] buffer = new byte[size];
    myBitArray.CopyTo(buffer, 0);

    for (x = 0; x < size; x++)
        for (c = 0; buffer[x] > 0; buffer[x] >>= 1)
            counter += buffer[x] & 1;

Taken from "Counting bits set, Brian Kernighan's way" and adapted for bytes. I'm using it for bit arrays of 1 000 000+ bits and it's superb.
If your bits are not n*8 then you can count the mod byte manually.

原来分手还会想你 2024-10-25 22:38:49

在找不到使用查找表的版本后,我写了我的版本:

private int[] _bitCountLookup;
private void InitLookupTable()
{
    _bitCountLookup = new int[256];

    for (var byteValue = 0; byteValue < 256; byteValue++)
    {
        var count = 0;
        for (var bitIndex = 0; bitIndex < 8; bitIndex++)
        {
            count += (byteValue >> bitIndex) & 1;
        }
        _bitCountLookup[byteValue] = count;
    }
}

private int CountSetBits(BitArray bitArray)
{
    var result = 0;
    var numberOfFullBytes = bitArray.Length / 8;
    var numberOfTailBits = bitArray.Length % 8;
    var tailByte = numberOfTailBits > 0 ? 1 : 0;
    var bitArrayInBytes = new byte[numberOfFullBytes + tailByte];
    bitArray.CopyTo(bitArrayInBytes, 0);

    for (var i = 0; i < numberOfFullBytes; i++)
    {
        result += _bitCountLookup[bitArrayInBytes[i]];
    }

    for (var i = (numberOfFullBytes * 8); i < bitArray.Length; i++)
    {
        if (bitArray[i])
        {
            result++;
        }
    }
    return result;
}

I wrote my version of after not finding one that uses a look-up table:

private int[] _bitCountLookup;
private void InitLookupTable()
{
    _bitCountLookup = new int[256];

    for (var byteValue = 0; byteValue < 256; byteValue++)
    {
        var count = 0;
        for (var bitIndex = 0; bitIndex < 8; bitIndex++)
        {
            count += (byteValue >> bitIndex) & 1;
        }
        _bitCountLookup[byteValue] = count;
    }
}

private int CountSetBits(BitArray bitArray)
{
    var result = 0;
    var numberOfFullBytes = bitArray.Length / 8;
    var numberOfTailBits = bitArray.Length % 8;
    var tailByte = numberOfTailBits > 0 ? 1 : 0;
    var bitArrayInBytes = new byte[numberOfFullBytes + tailByte];
    bitArray.CopyTo(bitArrayInBytes, 0);

    for (var i = 0; i < numberOfFullBytes; i++)
    {
        result += _bitCountLookup[bitArrayInBytes[i]];
    }

    for (var i = (numberOfFullBytes * 8); i < bitArray.Length; i++)
    {
        if (bitArray[i])
        {
            result++;
        }
    }
    return result;
}
小瓶盖 2024-10-25 22:38:49

我遇到了同样的问题,但要转换的不仅仅是一种基数方法。因此,我选择移植整个 BitSet 类。幸运的是它是独立的。

这是C# 端口的要点

如果人们能够报告发现的任何错误,我将不胜感激 - 我不是 Java 开发人员,并且对位逻辑的经验有限,因此我可能错误地翻译了其中的一些内容。

I had the same issue, but had more than just the one Cardinality method to convert. So, I opted to port the entire BitSet class. Fortunately it was self-contained.

Here is the Gist of the C# port.

I would appreciate if people would report any bugs that are found - I am not a Java developer, and have limited experience with bit logic, so I might have translated some of it incorrectly.

北座城市 2024-10-25 22:38:49

由于使用了 System.Numerics.BitOperations.PopCount

C#

Int32[] ints = new Int32[(bitArray.Count >> 5) + 1];
bitArray.CopyTo(ints, 0);
Int32 count = 0;
for (Int32 i = 0; i < ints.Length; i++) {
    count += BitOperations.PopCount(ints[i]);
}
Console.WriteLine(count);

F#,

let ints = Array.create ((bitArray.Count >>> 5) + 1) 0u
bitArray.CopyTo(ints, 0)
ints
|> Array.sumBy BitOperations.PopCount
|> printfn "%d"

比公认的答案更快、更简单的版本请参阅 BitOperations.PopCount 是在 .NET 中计算 BitArray 基数的最佳方法吗?

Faster and simpler version than the accepted answer thanks to the use of System.Numerics.BitOperations.PopCount

C#

Int32[] ints = new Int32[(bitArray.Count >> 5) + 1];
bitArray.CopyTo(ints, 0);
Int32 count = 0;
for (Int32 i = 0; i < ints.Length; i++) {
    count += BitOperations.PopCount(ints[i]);
}
Console.WriteLine(count);

F#

let ints = Array.create ((bitArray.Count >>> 5) + 1) 0u
bitArray.CopyTo(ints, 0)
ints
|> Array.sumBy BitOperations.PopCount
|> printfn "%d"

See more details in Is BitOperations.PopCount the best way to compute the BitArray cardinality in .NET?

装纯掩盖桑 2024-10-25 22:38:49

您可以使用 Linq,但它毫无用处且速度较慢:

var sum = mybitarray.OfType<bool>().Count(p => p);

You could use Linq, but it would be useless and slower:

var sum = mybitarray.OfType<bool>().Count(p => p);
泪眸﹌ 2024-10-25 22:38:49

使用 BitArray 没有更快的方法 - 归根结底是您必须对它们进行计数 - 您可以使用 LINQ 来执行此操作或执行您自己的循环,但 < 没有提供方法code>BitArray 并且底层数据结构是一个 int[] 数组(如 Reflector 所示) - 所以这将始终是 O(n),n 是数组中的位数大批。

我能想到的使其更快的唯一方法是使用反射来获取底层的 m_array 字段,然后您可以绕过 Get() 使用的边界检查每次调用时(见下文) - 但这有点脏,并且可能只在非常大的数组上值得,因为反射很昂贵。

public bool Get(int index)
{
    if ((index < 0) || (index >= this.Length))
    {
        throw new ArgumentOutOfRangeException("index", Environment.GetResourceString("ArgumentOutOfRange_Index"));
    }
    return ((this.m_array[index / 0x20] & (((int) 1) << (index % 0x20))) != 0);
}

如果这种优化对您来说真的很重要,您应该创建自己的位操作类,该类在内部可以使用 BitArray,但会跟踪设置的位数并提供适当的方法(主要委托给BitArray 但添加方法来获取当前设置的位数) - 那么当然这将是 O(1)。

There is no faster way with using BitArray - What it comes down to is you will have to count them - you could use LINQ to do that or do your own loop, but there is no method offered by BitArray and the underlying data structure is an int[] array (as seen with Reflector) - so this will always be O(n), n being the number of bits in the array.

The only way I could think of making it faster is using reflection to get a hold of the underlying m_array field, then you can get around the boundary checks that Get() uses on every call (see below) - but this is kinda dirty, and might only be worth it on very large arrays since reflection is expensive.

public bool Get(int index)
{
    if ((index < 0) || (index >= this.Length))
    {
        throw new ArgumentOutOfRangeException("index", Environment.GetResourceString("ArgumentOutOfRange_Index"));
    }
    return ((this.m_array[index / 0x20] & (((int) 1) << (index % 0x20))) != 0);
}

If this optimization is really important to you, you should create your own class for bit manipulation, that internally could use BitArray, but keeps track of the number of bits set and offers the appropriate methods (mostly delegate to BitArray but add methods to get number of bits currently set) - then of course this would be O(1).

淡淡绿茶香 2024-10-25 22:38:49

如果你真的想最大化速度,你可以预先计算一个查找表,其中给定一个字节值你就有基数,但 BitArray 不是最理想的结构,因为你需要使用反射来拉取底层存储并对整数类型进行操作 - 请参阅

另一种也许更有用的技术是使用Kernighan 技巧 ,对于基数 m 的 n 位值来说是 O(m)。

static readonly ZERO = new BitArray (0);
static readonly NOT_ONE = new BitArray (1).Not ();

public static int GetCardinality (this BitArray bits)
{
    int c = 0;
    var tmp = new BitArray (myBitArray);

    for (c; tmp != ZERO; c++)
        tmp = tmp.And (tmp.And (NOT_ONE));

    return c;
}

这也比在 C 语言中更麻烦一点,因为整数类型和 BitArray 之间没有定义任何操作(例如,tmp &= tmp - 1 来清除最少的数据)重要的设置位,已被转换为 tmp &= (tmp & ~0x1)

我不知道这最终是否比 BCL BitArray 的情况下的天真迭代更快,但是从算法上来说它应该更优越


编辑:引用了我发现 Kernighan 技巧的地方,并提供了更深入的解释。

If you really want to maximize the speed, you could pre-compute a lookup table where given a byte-value you have the cardinality, but BitArray is not the most ideal structure for this, since you'd need to use reflection to pull the underlying storage out of it and operate on the integral types - see this question for a better explanation of that technique.

Another, perhaps more useful technique, is to use something like the Kernighan trick, which is O(m) for an n-bit value of cardinality m.

static readonly ZERO = new BitArray (0);
static readonly NOT_ONE = new BitArray (1).Not ();

public static int GetCardinality (this BitArray bits)
{
    int c = 0;
    var tmp = new BitArray (myBitArray);

    for (c; tmp != ZERO; c++)
        tmp = tmp.And (tmp.And (NOT_ONE));

    return c;
}

This too is a bit more cumbersome than it would be in say C, because there are no operations defined between integer types and BitArrays, (tmp &= tmp - 1, for example, to clear the least significant set bit, has been translated to tmp &= (tmp & ~0x1).

I have no idea if this ends up being any faster than naively iterating for the case of the BCL BitArray, but algorithmically speaking it should be superior.


EDIT: cited where I discovered the Kernighan trick, with a more in-depth explanation

逆夏时光 2024-10-25 22:38:49

如果您不介意将 System.Collections.BitArray 的代码复制到您的项目中并编辑它,您可以像同伴一样编写:
(我认为这是最快的。我尝试使用 BitVector32[] 来实现我的 BitArray,但它仍然很慢。)

    public void Set(int index, bool value)
    {
        if ((index < 0) || (index >= this.m_length))
        {
            throw new ArgumentOutOfRangeException("index", "Index Out Of Range");
        }
        SetWithOutAuth(index,value);
    }
    //When in batch  setting values,we need one method that won't auth the index range
    private void SetWithOutAuth(int index, bool value) 
    {
        int v = ((int)1) << (index % 0x20);
        index = index / 0x20;
        bool NotSet = (this.m_array[index] & v) == 0;
        if (value && NotSet)
        {
            CountOfTrue++;//Count the True values
            this.m_array[index] |= v;
        }
        else if (!value && !NotSet)
        {
            CountOfTrue--;//Count the True values
            this.m_array[index] &= ~v;
        }
        else 
            return;
        this._version++;
    }

    public int CountOfTrue { get; internal set; }

    public void BatchSet(int start, int length, bool value)
    {
        if (start < 0 || start >= this.m_length || length <= 0)
            return;
        for (int i = start; i < length && i < this.m_length; i++)
        {
            SetWithOutAuth(i,value);
        }
    }

If you don't mind to copy the code of System.Collections.BitArray to your project and Edit it,you can write as fellow:
(I think it's the fastest. And I've tried use BitVector32[] to implement my BitArray, but it's still so slow.)

    public void Set(int index, bool value)
    {
        if ((index < 0) || (index >= this.m_length))
        {
            throw new ArgumentOutOfRangeException("index", "Index Out Of Range");
        }
        SetWithOutAuth(index,value);
    }
    //When in batch  setting values,we need one method that won't auth the index range
    private void SetWithOutAuth(int index, bool value) 
    {
        int v = ((int)1) << (index % 0x20);
        index = index / 0x20;
        bool NotSet = (this.m_array[index] & v) == 0;
        if (value && NotSet)
        {
            CountOfTrue++;//Count the True values
            this.m_array[index] |= v;
        }
        else if (!value && !NotSet)
        {
            CountOfTrue--;//Count the True values
            this.m_array[index] &= ~v;
        }
        else 
            return;
        this._version++;
    }

    public int CountOfTrue { get; internal set; }

    public void BatchSet(int start, int length, bool value)
    {
        if (start < 0 || start >= this.m_length || length <= 0)
            return;
        for (int i = start; i < length && i < this.m_length; i++)
        {
            SetWithOutAuth(i,value);
        }
    }
恰似旧人归 2024-10-25 22:38:49

这个问题自然是 O(n),因此你的解决方案可能是最有效的。

由于您试图对位的任意子集进行计数,因此在设置位时无法对位进行计数(如果您不经常设置位,则会提供速度提升)。

您可以检查您正在使用的处理器是否具有返回设置位数的命令。例如,具有 SSE4 的处理器可以使用 POPCNT 根据这篇文章。这可能对您不起作用,因为 .Net 不允许汇编(因为它是独立于平台的)。此外,ARM 处理器可能没有等效的处理器。

也许最好的解决方案是查找表(或者开关,如果您可以保证开关将编译为单个跳转到 currentLocation + byteValue)。这将为您提供整个字节的计数。当然,BitArray 不提供对底层数据类型的访问,因此您必须创建自己的 BitArray。您还必须保证字节中的所有位始终是交集的一部分,这听起来不太可能。

另一种选择是使用布尔数组而不是 BitArray。这样做的优点是不需要从字节中的其他位中提取该位。缺点是数组将占用 8 倍的内存空间,这意味着不仅浪费空间,而且在迭代数组执行计数时会推送更多数据。

标准数组查找和 BitArray 查找之间的区别如下:
数组:

  1. offset = index * indexSize
  2. 获取 location + offset 处的内存并保存到 value

BitArray:

  1. index = index/indexSize
  2. offset = index * indexSize
  3. 获取 location + offset 处的内存并保存到 value
  4. position = index%indexSize
  5. 移位值位置的位数
  6. value = value 和 1

除了用于数组的 #2 和 #3 之外,大多数这些命令需要 1 个处理器周期才能完成。使用 x86/x64 处理器可以将某些命令组合成 1 个命令,但可能不能使用 ARM,因为它使用精简的指令集。
两者(数组或 BitArray)中哪一个性能更好取决于您的平台(处理器速度、处理器指令、处理器缓存大小、处理器缓存速度、系统内存量 (Ram)、系统内存 (CAS) 速度、处理器和 RAM 之间的连接)以及要计数的索引的分布(交叉点是最常聚集的还是随机分布的)。

总结:您可能会找到一种方法来使其更快,但您的解决方案是使用 .NET 中的每布尔模型一位的数据集获得的最快的解决方案。

编辑:确保您正在访问要按顺序计数的索引。如果按该顺序访问索引 200、5、150、151、311、6,则会增加缓存未命中数量,从而导致等待从 RAM 检索值的时间更长。

The problem is naturally O(n), as a result your solution is probably the most efficient.

Since you are trying to count an arbitrary subset of bits you cannot count the bits when they are set (would would provide a speed boost if you are not setting the bits too often).

You could check to see if the processor you are using has a command which will return the number of set bits. For example a processor with SSE4 could use the POPCNT according to this post. This would probably not work for you since .Net does not allow assembly (because it is platform independent). Also, ARM processors probably do not have an equivalent.

Probably the best solution would be a look up table (or switch if you could guarantee the switch will compiled to a single jump to currentLocation + byteValue). This would give you the count for the whole byte. Of course BitArray does not give access to the underlying data type so you would have to make your own BitArray. You would also have to guarantee that all the bits in the byte will always be part of the intersection which does not sound likely.

Another option would be to use an array of booleans instead of a BitArray. This has the advantage not needing to extract the bit from the others in the byte. The disadvantage is the array will take up 8x as much space in memory meaning not only wasted space, but also more data push as you iterate through the array to perform your count.

The difference between a standard array look up and a BitArray look up is as follows:
Array:

  1. offset = index * indexSize
  2. Get memory at location + offset and save to value

BitArray:

  1. index = index/indexSize
  2. offset = index * indexSize
  3. Get memory at location + offset and save to value
  4. position = index%indexSize
  5. Shift value position bits
  6. value = value and 1

With the exception of #2 for Arrays and #3 most of these commands take 1 processor cycle to complete. Some of the commands can be combined into 1 command using x86/x64 processors, though probably not with ARM since it uses a reduced set of instructions.
Which of the two (array or BitArray) perform better will be specific to your platform (processor speed, processor instructions, processor cache sizes, processor cache speed, amount of system memory (Ram), speed of system memory (CAS), speed of connection between processor and RAM) as well as the spread of indexes you want to count (are the intersections most often clustered or are they randomly distributed).

To summarize: you could probably find a way to make it faster, but your solution is the fastest you will get for your data set using a bit per boolean model in .NET.

Edit: make sure you are accessing the indexes you want to count in order. If you access indexes 200, 5, 150, 151, 311, 6 in that order then you will increase the amount of cache misses resulting in more time spent waiting for values to be retrieved from RAM.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文