Encoding an integer in the 7-bit format of C# BinaryReader.ReadString

Posted 2024-08-07 14:34:55

C#'s BinaryReader has a function, ReadString, that according to MSDN reads an integer encoded as a "seven bit integer" and then reads a string whose length is that integer.

Is there clear documentation for the seven-bit integer format? (I have a rough understanding that the MSB or the LSB marks whether there are more bytes to read and that the remaining bits are the data, but I'd be glad for something more exact.)

Even better, is there a C implementation for reading and writing numbers in this format?

Comments (6)

廻憶裏菂餘溫 2024-08-14 14:34:55

Well, the documentation for BinaryReader.Read7BitEncodedInt already says that it expects the value to have been written with BinaryWriter.Write7BitEncodedInt, and that method's documentation details the format:

The integer of the value parameter is written out seven bits at a time, starting with the seven least-significant bits. The high bit of a byte indicates whether there are more bytes to be written after this one.

If value will fit in seven bits, it takes only one byte of space. If value will not fit in seven bits, the high bit is set on the first byte and written out. value is then shifted by seven bits and the next byte is written. This process is repeated until the entire integer has been written.

So the integer 1259551277, in binary 1001011000100110011101000101101 will be converted into that 7-bit format as follows:

Remaining integer                 encoded bytes
1001011000100110011101000101101
100101100010011001110100          00101101
10010110001001100                 10101101 01110100
1001011000                        10101101 11110100 01001100
100                               10101101 11110100 11001100 01011000
0                                 10101101 11110100 11001100 11011000 00000100

I'm not that confident in my C skills right now to provide a working implementation, though. But it's not very hard to do, based on that description.
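
For readers who want something concrete, here is a minimal C sketch of the write procedure quoted above. The function name write_7bit_encoded_int and the buffer-based interface are assumptions made for illustration, not part of .NET; negative values are treated as their unsigned 32-bit pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a .NET API): encodes value into out, which must
 * have room for at least 5 bytes. Returns the number of bytes written. */
size_t write_7bit_encoded_int(int32_t value, unsigned char *out)
{
    assert(out != NULL);
    uint32_t v = (uint32_t)value;   /* negatives become their 32-bit pattern */
    size_t n = 0;
    while (v >= 0x80) {
        /* Emit the low 7 bits with the high (continuation) bit set. */
        out[n++] = (unsigned char)((v & 0x7F) | 0x80);
        v >>= 7;
    }
    out[n++] = (unsigned char)v;    /* last byte: high bit clear */
    return n;
}
```

For 1259551277 this writes the five bytes AD F4 CC D8 04, which matches the last row of the table above.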

爱情眠于流年 2024-08-14 14:34:55

Basically, the idea behind a 7-bit encoded Int32 is to reduce the number of bytes required for small values. It works like this:

  1. The 7 least significant bits of the original value are written into the low 7 bits of the first byte.
  2. If the value does not fit into these 7 bits, the 8th bit of that byte is set to 1, indicating that another byte has to be read. Otherwise that bit is 0 and reading ends here.
  3. When reading, the low 7 bits of the next byte are shifted left by 7 bits and ORed into the previously read value to combine them. Again, the 8th bit of this byte indicates whether another byte must be read (its 7 bits are then shifted left by a further 7 bits, and so on).
  4. This continues until a maximum of 5 bytes has been read (because even Int32.MaxValue does not require more than 5 bytes when only 1 bit is stolen from each byte). If the highest bit of the 5th byte is still set, you've read something that isn't a 7-bit encoded Int32.
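
The reading steps above can be sketched in C roughly as follows. The name read_7bit_encoded_int, the buffer-plus-length interface, and the convention of returning 0 on error are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes a 7-bit encoded Int32 from buf (len bytes
 * available) into *out. Returns the number of bytes consumed, or 0 if the
 * input is truncated or the 5th byte still has its high bit set. */
size_t read_7bit_encoded_int(const unsigned char *buf, size_t len, int32_t *out)
{
    assert(buf != NULL && out != NULL);
    uint32_t result = 0;
    for (size_t i = 0; i < 5; i++) {
        if (i >= len)
            return 0;                       /* ran out of input */
        unsigned char b = buf[i];
        /* Low 7 bits are data; byte i holds bits 7*i .. 7*i+6. */
        result |= (uint32_t)(b & 0x7F) << (7 * i);
        if ((b & 0x80) == 0) {              /* high bit clear: last byte */
            /* Conversion of values above INT32_MAX wraps on common compilers. */
            *out = (int32_t)result;
            return i + 1;
        }
    }
    return 0;                               /* not a 7-bit encoded Int32 */
}
```

Accumulating into an unsigned variable keeps the shift of the 5th byte well defined, even though its upper bits fall outside 32 bits and are discarded.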

Note that since it is written byte-by-byte, endianness doesn't matter at all for these values. The following number of bytes are required for a given range of values:

  • 1 byte: 0 to 127
  • 2 bytes: 128 to 16,383
  • 3 bytes: 16,384 to 2,097,151
  • 4 bytes: 2,097,152 to 268,435,455
  • 5 bytes: 268,435,456 to 2,147,483,647 (Int32.MaxValue) and -2,147,483,648 (Int32.MinValue) to -1
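
The ranges above can be checked with a small C sketch that counts how many bytes a value would occupy. The function name encoded_size_7bit is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes the 7-bit encoding of value occupies. */
int encoded_size_7bit(int32_t value)
{
    uint32_t v = (uint32_t)value;   /* negatives become their 32-bit pattern */
    int n = 1;                      /* every value needs at least one byte */
    while (v >= 0x80) {             /* more than 7 bits left: one more byte */
        v >>= 7;
        n++;
    }
    assert(n >= 1 && n <= 5);       /* 32 bits never need more than 5 groups */
    return n;
}
```

Because a negative value has bit 31 set after the cast, the loop always runs four times for it, which is why every negative number lands in the 5-byte row.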

As you can see, the implementation is kinda dumb and always requires 5 bytes for negative values as the sign bit is the 32nd bit of the original value, always ending up in the 5th byte.

Thus, I do not recommend it for negative values or values bigger than ~250,000,000, where it needs 5 bytes, one more than a plain Int32. I've only seen it used internally for the length prefix of .NET strings (those you can read/write with BinaryReader.ReadString and BinaryWriter.Write(string)), where it gives the number of encoded bytes that follow and is therefore never negative.
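
To show how the prefix is used for strings, here is a rough C sketch that reads one length-prefixed string from a memory buffer. It assumes the prefix counts encoded bytes, as BinaryWriter.Write(string) produces, and it copies the bytes as-is without decoding the text encoding. The function name read_prefixed_string and its interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 7-bit encoded length followed by that many
 * bytes from buf, copying them NUL-terminated into out (capacity out_cap).
 * Returns the total number of bytes consumed, or 0 on any error. */
size_t read_prefixed_string(const unsigned char *buf, size_t len,
                            char *out, size_t out_cap)
{
    assert(buf != NULL && out != NULL);
    uint32_t n = 0;
    size_t i = 0;
    for (;;) {
        if (i >= len || i >= 5)
            return 0;                   /* truncated or over-long prefix */
        unsigned char b = buf[i];
        n |= (uint32_t)(b & 0x7F) << (7 * i);
        i++;
        if ((b & 0x80) == 0)
            break;                      /* high bit clear: prefix complete */
    }
    if (n > len - i || n >= out_cap)
        return 0;                       /* payload missing or out too small */
    memcpy(out, buf + i, n);            /* raw bytes, encoding left untouched */
    out[n] = '\0';
    return i + n;
}
```

A string of up to 127 bytes therefore costs exactly one prefix byte, which is the case the format is optimized for.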

While you can look up the original .NET source, I use different implementations in my BinaryData library.

£冰雨忧蓝° 2024-08-14 14:34:55

I had to explore this 7-bit format also. In one of my projects I pack some data into files using C#'s BinaryWriter and then unpack it again with BinaryReader, which works nicely.

Later I needed to implement a reader for this project's packed files in Java, too. Java has a class named DataInputStream (in the java.io package), which has some similar methods. Unfortunately, DataInputStream's data interpretation is very different from C#'s.

To solve my problem I ported C#'s BinaryReader to Java myself by writing a class that extends java.io.DataInputStream. Here is the method I wrote, which is meant to do the same as C#'s BinaryReader.ReadString():

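public String csReadString() throws IOException {
    // Decode the 7-bit encoded length prefix: each byte carries 7 data bits,
    // least significant group first; a set high bit means another byte follows.
    int stringLength = 0;
    int step = 0;
    boolean stringLengthParsed = false;
    while (!stringLengthParsed) {
        int part = csReadByte() & 0xFF;               // treat the byte as unsigned
        stringLengthParsed = (part & 0x80) == 0;      // high bit clear: last length byte
        stringLength |= (part & 0x7F) << (step * 7);  // put these 7 bits in position
        step++;
    }
    // The prefix counts encoded bytes, not chars. This assumes the file was
    // written with BinaryWriter's default UTF-8 encoding.
    byte[] bytes = new byte[stringLength];
    readFully(bytes);  // inherited from java.io.DataInputStream
    return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
}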

鹿港巷口少年归 2024-08-14 14:34:55

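/*
 * Parameters:  plOutput[out] - The decoded integer
 *              pbyInput[in]  - Buffer containing encoded integer
 * Returns:     Number of bytes used to encode the integer
 */
int SevenBitEncodingToInteger(int *plOutput, char *pbyInput)
{
    int lSize = 0;
    unsigned int lTemp = 0;     /* unsigned so shifting into bit 31 is well defined */
    unsigned char byCurrent;
    do
    {
        /* Read as unsigned: a plain char may be signed and never exceed 127. */
        byCurrent = (unsigned char)pbyInput[lSize];
        /* The low 7 bits are data; the first byte holds the least significant group. */
        lTemp |= (unsigned int)(byCurrent & 0x7F) << (7 * lSize);
        lSize++;
    } while((byCurrent & 0x80) != 0 && lSize < 5);  /* high bit set: another byte follows */
    *plOutput = (int)lTemp;
    return lSize;
}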

極樂鬼 2024-08-14 14:34:55

Write7BitEncodedInt method contains the description: The lowest 7 bits of each byte encode the next 7 bits of the number. The highest bit is set when there's another byte following.
