如何有效地从字节中读取位?
我正在开发一个包含 WebSockets 的项目,服务器 (Node.js) 和客户端 (Chrome) 之间的数据是使用我设置的用于数据交换的自定义(非常简单)格式发送的。
我以 3 位为单位发送数据,因为我发送的项目都有 8 种可能性。数据格式如下所示:
0 1
bit index 01234567 8901...
item aaabbbcc cddd...
目前,我正在从字节中解析项目,如下所示:
var itemA = bytes[0] >> 5;
var itemB = (bytes[0] >> 2) & 7;
var itemC = (bytes[0] & 3) << 1 | bytes[1] >> 7;
var itemD = (bytes[1] >> 4) & 7;
就个人而言,这感觉太复杂了。问题是它很复杂,因为我以字节为单位获取数据,这是 8 的倍数。为了解析 3 位的项目,我必须进行位移位,执行 AND 运算,并且因为 8 不能被 3 整除有时甚至必须组合两个字节的部分,例如 itemC
。
以 3 位组而不是 8 位组的形式读取该数据会更有效。
我想出的是使用 .toString(2)
将所有字节转换为位到字符串,然后使用 .substring
获取长度为 3 的子字符串,然后使用 parseInt(bitString, 2) 转换回数字,但我想这不是这样做的方法,因为字符串操作很慢而且我实际上没有做任何与字符串相关的事情。
是否可以读取 3 个一组的位,而不是从字节中解析它们?或者是否有更有效的方法从字节中读取位?
I'm working on a project that includes WebSockets, and data between the server (Node.js) and the client (Chrome) is sent using a custom (very simple) format for data exchange I set up.
I'm sending data in pieces of 3 bits because I'm sending items which all have 8 possibilities. The data format looks like this:
0 1
bit index 01234567 8901...
item aaabbbcc cddd...
Currently, I'm parsing the items out of the bytes like this:
var itemA = bytes[0] >> 5;
var itemB = (bytes[0] >> 2) & 7;
var itemC = (bytes[0] & 3) << 1 | bytes[1] >> 7;
var itemD = (bytes[1] >> 4) & 7;
Personally, this feels as being too sophisticated. The problem is that it's only complex because I'm getting data in bytes, which are a multiple of 8. To parse out items of 3 bits I have to bit-shift, doing AND operations, and because 8 is not divisible by 3 I sometimes even have to combine parts of two bytes like for itemC
.
It would be much more effective to read this data as groups of 3 bits instead of groups of 8 bits.
What I've come up with is converting all bytes into bits to a string using .toString(2)
, then using .substring
to get a substring with length 3, and converting back to a number with parseInt(bitString, 2)
, but I guess that's not the way to do it, since string manipulation is slow and I'm actually not doing anything string-related.
Is it possible to read bits in groups of e.g. 3 instead of parsing them from bytes? Or is there a more efficient way to read bits out of bytes?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
二进制 AND 和位移操作是执行此操作的最快方法。它们可以很好地翻译成机器代码指令。进一步加快速度的唯一方法是通过牺牲带宽来提高速度,例如,每个字节不使用超过 3 位,但从您的问题来看,您可能已经考虑并拒绝了这种权衡。
The binary AND and bit shifting operations are the fastest way of doing this. They translate well into machine code instructions. The only way to further speed this up is by sacrificing bandwidth for speed by for example simply not using more than 3 bits per byte, but judging from your question you've probably already considered and rejected that tradeoff.
如果注意字节顺序,您可以将其作为 int 或 long int 数组访问。
还有一种不使用位 3 和位 7 的可能性
if endian-ness is taken care, you can access it as an int or a long int array.
There is another possibily of not using bit 3 and bit 7
我们可以通过获取适当的 16 位整数然后对其进行位移来获得我们需要的值。
很明显,要获得第 i 个值,我们应该获得 16 位整数,其偏移量以字节为单位,适合 (bits * (i + 1) - 16)/8 <=偏移量<=(位* i)/8。
让我们采用
M=bits*i/8
,所以我们有M + bits/8 - 2<= offset <= M
。然后我们得到最小偏移量 ceil(M + bits/8 - 2) 并通过偏移量计算第 i 个值在 16 位整数中的位置。我刚刚编写了以下函数和以下示例,以从缓冲区读取 5 个字节长度的 8 个 5 位值。
当我启动 Node test.js 时,我得到了
We can get value we need by getting appropriate 16-bits integer and then bitshift it.
It is clear, that to get
i-th
value we should get 16-bits integer with offset in bytes that fits(bits * (i + 1) - 16)/8 <= offset <= (bits * i)/8
.Lets take
M=bits*i/8
, so we haveM + bits/8 - 2<= offset <= M
. Then we get minimal offset asceil(M + bits/8 - 2)
and calculate position of i-th value in the 16-bit integer by offsets. I have just wrote the following functionAnd the following example to read 8 5-bit values off the Buffer with 5 bytes length.
When I launch node test.js I got
如果你有多个固定长度(即你总是可以保证它是 2 个字节),你可以像这样读取这些位:
If you have a number of fixed length (ie you can always guarantee it'll be 2 bytes), you can read the bits like this:
我知道的最短的方法是:
假设
value
是一个“big-endian”字节(数字的左侧先于右侧写入内存)。如果不是,您需要先将其转换。此函数还应适用于任何位大小的数字(16、32、64...)。
运作原理:
因为从最右边开始,每个位代表一个等效的“十进制”值,等于 2(代表二进制系统中的 2 个符号)的位位置次方(从 1 开始,不包括第一个为 1 的位)-如果设置了该位,则将其与原始值进行“与”操作将始终返回一个非零值 - Math.pow 的结果(也是真值),如果不是,则返回 0(假值)。
The shortest way I know is this:
This is assuming
value
is a "big-endian" byte (the left side of the number is written in memory before the right side). If it isn't, you'll need to convert it first.This function should also work for any bit-size number (16, 32, 64...).
HOW IT WORKS:
Because each bit, starting from the right most, represents an equivalent "decimal" value equals to 2 (representing 2 symbols in the Binary system) to the power of the bit position (1-based, excluding the first which is just 1) - if that bit is set, ANDing it with the original value will always return a non-zero value - the result of Math.pow (which is also truthy), and 0 (falsy) if it is not.