HPACK编码整数意义
读完本文后,https://httpwg.org/specs/rfc7541.html#integer。代表 尽管我似乎了解了这个想法的总体要点,但我对很多事情感到困惑。 其一,“前缀”到底是什么/它们的目的是什么?
对于两个:
C.1.1. Example 1: Encoding 10 Using a 5-Bit Prefix
The value 10 is to be encoded with a 5-bit prefix.
10 is less than 31 (2^5 - 1) and is represented using the 5-bit prefix.
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| X | X | X | 0 | 1 | 0 | 1 | 0 | 10 stored on 5 bits
+---+---+---+---+---+---+---+---+
前导 X 是什么?开头的0是做什么用的?
>>> bin(10)
'0b1010'
>>>
在 python IDE 中输入此内容,您会看到几乎相同的输出...为什么不同? 不过,这是当数字适合前缀位数时,使得它看起来很简单。
C.1.2. Example 2: Encoding 1337 Using a 5-Bit Prefix
The value I=1337 is to be encoded with a 5-bit prefix.
1337 is greater than 31 (25 - 1).
The 5-bit prefix is filled with its max value (31).
I = 1337 - (25 - 1) = 1306.
I (1306) is greater than or equal to 128, so the while loop body executes:
I % 128 == 26
26 + 128 == 154
154 is encoded in 8 bits as: 10011010
I is set to 10 (1306 / 128 == 10)
I is no longer greater than or equal to 128, so the while loop terminates.
I, now 10, is encoded in 8 bits as: 00001010.
The process ends.
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| X | X | X | 1 | 1 | 1 | 1 | 1 | Prefix = 31, I = 1306
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1306>=128, encode(154), I=1306/128
| 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 10<128, encode(10), done
+---+---+---+---+---+---+---+---+
类似八位位组的图显示了正在生成的三个不同的数字...由于数字是在整个循环中生成的,因此如何在整数中复制这个类似八位位组的图?最终的实际结果又是怎样的呢?图表或“I”是 10,或 00001010。
def f(a, b):
if a < 2**b - 1:
print(a)
else:
c = 2**b - 1
remain = a - c
print(c)
if remain >= 128:
while 1:
e = remain % 128
g = e + 128
remain = remain / 128
if remain >= 128:
continue
else:
print(remain)
c+=int(remain)
print(c)
break
当我试图弄清楚这一点时,我编写了一个快速的 python 实现,似乎我留下了一些无用
变量,其中一个是g
在文档中是 26 + 128 == 154。
最后,128从哪里来?除了 2 的 7 次方是 128 之外,我找不到数字之间的任何关系,但为什么这么重要呢?这是因为第一位被保留作为连续标志吗?一个八位字节包含 8 位,所以 8 - 1 = 7?
After reading this, https://httpwg.org/specs/rfc7541.html#integer.representation
I am confused about quite a few things, although I seem to have the overall gist of the idea.
For one, What are the 'prefixes' exactly/what is their purpose?
For two:
C.1.1. Example 1: Encoding 10 Using a 5-Bit Prefix
The value 10 is to be encoded with a 5-bit prefix.
10 is less than 31 (2^5 - 1) and is represented using the 5-bit prefix.
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| X | X | X | 0 | 1 | 0 | 1 | 0 | 10 stored on 5 bits
+---+---+---+---+---+---+---+---+
What are the leading Xs? What is the starting 0 for?
>>> bin(10)
'0b1010'
>>>
Typing this in the python IDE, you see almost the same output... Why does it differ?
This is when the number fits within the number of prefix bits though, making it seemingly simple.
C.1.2. Example 2: Encoding 1337 Using a 5-Bit Prefix
The value I=1337 is to be encoded with a 5-bit prefix.
1337 is greater than 31 (25 - 1).
The 5-bit prefix is filled with its max value (31).
I = 1337 - (25 - 1) = 1306.
I (1306) is greater than or equal to 128, so the while loop body executes:
I % 128 == 26
26 + 128 == 154
154 is encoded in 8 bits as: 10011010
I is set to 10 (1306 / 128 == 10)
I is no longer greater than or equal to 128, so the while loop terminates.
I, now 10, is encoded in 8 bits as: 00001010.
The process ends.
0 1 2 3 4 5 6 7
+---+---+---+---+---+---+---+---+
| X | X | X | 1 | 1 | 1 | 1 | 1 | Prefix = 31, I = 1306
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1306>=128, encode(154), I=1306/128
| 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 10<128, encode(10), done
+---+---+---+---+---+---+---+---+
The octet-like diagram shows three different numbers being produced... Since the numbers are produced throughout the loop, how do you replicate this octet-like diagram within an integer? What is the actual final result? The diagram or "I" being 10, or 00001010.
def f(a, b):
if a < 2**b - 1:
print(a)
else:
c = 2**b - 1
remain = a - c
print(c)
if remain >= 128:
while 1:
e = remain % 128
g = e + 128
remain = remain / 128
if remain >= 128:
continue
else:
print(remain)
c+=int(remain)
print(c)
break
As im trying to figure this out, I wrote a quick python implementation of it, It seems that i am left with a few useless
variables, one being g
which in the documentation is the 26 + 128 == 154.
Lastly, where does 128 come from? I can't find any relation between the numbers besides the fact 2 raised to the 7th power is 128, but why is that significant? Is this because the first bit is reserved as a continuation flag? and an octet contains 8 bits so 8 - 1 = 7?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
整数在 HPACK 消息中的几个地方使用,并且通常它们具有不能用于实际整数的前导位。因此,通常会有一些前导数字无法用于整数本身。它们由 X 表示。出于此计算的目的,它并不表示这些 X 是什么:可能是 000、或 111、或 010 或...等。另外,并不总是有 3 个 X——这只是一个例子。只能有一个前导 X、或两个、或四个……等等。
例如,要查找以前的 HPACK 解码标头,我们使用 6.1。索引标头字段表示,以前导 1 开头,后跟表索引值。因此 1 就是前面示例中的 X。我们有 7 位(而不是您问题的原始示例中只有 5 位)。如果表索引值是 127 或更小,我们可以使用这 7 位来表示它。如果它 >= 127 那么我们需要做一些额外的工作(我们稍后会讨论这一点)。
如果它是我们想要添加到表中的新值(以便在将来的请求中重用),但表中已经有该标头名称(因此它只是我们想要作为新条目的该名称的新值),那么我们使用6.2.1。具有增量索引的文字标头字段。开头有 2 位(
01
- 即 X),这次我们只有 6 位来表示我们要重用的名称的索引。所以在这种情况下有两个 X。所以不用担心有 3 个 X——这只是一个例子。在上面的示例中,分别有一个 X(因为第一位必须为 1)和两个 X(因为前两位必须为 01)。 整数表示部分告诉您如何处理任何带前缀的整数,无论是否有前缀由 1、2、3...等不可用的“X”位组成。
前面讨论了前导 X。开头的 0 只是因为,在这个例子中我们有 5 位来表示整数,而只需要 4 位。因此我们用 0 填充它。如果要编码的值为 20,则它将是
10100
。如果该值为 40,我们无法将其放入 5 位中,因此需要执行其他操作。Python 使用
0b
来表示它是一个二进制数。它不会显示任何前导零。因此,0b1010
与0b01010
相同,也与0b00001010
相同。确切地。如果您需要的位数超出了您拥有的位数,则没有足够的空间。你不能只使用更多的位,因为 HPACK 不知道你是否打算使用更多的位(所以应该查看下一个字节)或者它是否只是一个直数(所以只查看这一个字节)。它需要一个信号才能知道这一点。该信号使用全 1。
因此,要将 40 编码为 5 位,我们需要使用
11111
来表示“它不够大”,溢出到下一个字节。二进制中的11111
是 31,所以我们知道它比这个大,所以我们不会浪费它,而是使用它,并从 40 中减去它,得到剩下的 9 来编码下一个字节。一个新的附加字节为我们提供了 8 个新位(我们很快就会发现,实际上只有 7 个位,因为第一个位用于表示进一步的溢出)。这已经足够了,我们可以使用00001001
来编码 9。因此我们的复数由两个字节表示:XXX11111
和00001001
。如果我们想要编码的值大于第一个前缀位中可以修复的值,并且剩余的值大于 127,可以容纳第二个字节的可用 7 位,那么我们就不能使用这种使用两个字节的溢出机制。相反,我们使用另一种使用三个字节的“溢出,溢出”机制:
对于这种“溢出,溢出”机制,我们像平常一样将第一个字节位设置为 1 来表示溢出(
XXX11111
),然后设置第二个字节的第一位为 1。这留下了 7 位可用于对值进行编码,加上我们将不得不使用的第三个字节中的接下来的 8 位(实际上只有第三个字节的 7 位,因为它再次使用第一点以指示另一个溢出)。他们可以通过多种方式使用第二个和第三个字节来解决这个问题。他们决定将其编码为两个数字:128 mod 和 128 乘数。
因此,这意味着按照前面的示例,第一个字节设置为 31,第二个字节设置为 26(即
11010
)加上前导 1 以表明我们正在使用溢出溢出方法(因此100011010
),第三个字节设置为 10(或00001010
)。因此 1337 被编码为三个字节:
XXX11111 100011010 00001010
(包括将 X 设置为这些值)。使用 128 mod 和乘法器非常高效,意味着这个大数字(实际上是 16,383 以内的任何数字)可以用三个字节表示,这也是可以用 7 + 7 = 14 位表示的最大整数,这并非偶然。 )。但这确实需要一些头脑!
如果它大于 16,383,那么我们需要以类似的方式进行另一轮溢出。
所有这些看起来极其复杂,但实际上编码起来相对简单且有效。计算机可以非常轻松且快速地完成此操作。
你没有在
if
语句中打印这个值。仅else
中剩余的值。您需要打印两者。确切地说,这是因为需要按照上面的解释设置第一位(值 128),以表明我们正在继续/溢出到需要第三个字节。
Integers are used in a few places in HPACK messages and often they have leading bits that cannot be used to for the actual integer. Therefore, there will often be a few leading digits that will be unavailable to use for the integer itself. They are represented by the X. For the purposes of this calculation it doesn't make what those Xs are: could be 000, or 111, or 010 or...etc. Also, there will not always be 3 Xs - that is just an example. There could only be one leading X, or two, or four...etc.
For example, to look up a previous HPACK decoded header, we use 6.1. Indexed Header Field Representation which starts with a leading 1, followed by the table index value. Therefore that 1 is the X in the previous example. We have 7-bits (instead of only 5-bits in the original example in your question). If the table index value is 127 or less we can represent it using those 7-bits. If it's >= 127 then we need to do some extra work (we'll come back to this).
If it's a new value we want to add to the table (to reuse in future requests), but we already have that header name in the table (so it's just a new value for that name we want as a new entry) then we use 6.2.1. Literal Header Field with Incremental Indexing. This has 2 bits at the beginning (
01
- which are the Xs), and we only have 6-bits this time to represent the index of the name we want to reuse. So in this case there are two Xs.So don't worry about there being 3 Xs - that's just an example. In the above examples there was one X (as first bit had to be 1), and two Xs (as first two bits had to be 01) respectively. The Integer Representation section is telling you how to handle any prefixed integer, whether prefixed by 1, 2, 3... etc unusable "X" bits.
The leading Xs are discussed above. The starting 0 is just because, in this example we have 5-bits to represent the integers and only need 4-bits. So we pad it with 0. If the value to encode was 20 it would be
10100
. If the value was 40, we couldn't fit it in 5-bits so need to do something else.Python uses
0b
to show it's a binary number. It doesn't bother showing any leading zeros. So0b1010
is the same as0b01010
and also the same as0b00001010
.Exactly. If you need more than the number of bits you have, you don't have space for it. You can't just use more bits as HPACK will not know whether you are intending to use more bits (so should look at next byte) or if it's just a straight number (so only look at this one byte). It needs a signal to know that. That signal is using all 1s.
So to encode 40 in 5 bits, we need to use
11111
to say "it's not big enough", overflow to next byte.11111
in binary is 31, so we know it's bigger than that, so we'll not waste that, and instead use it, and subtract it from the 40 to give 9 left to encode in the next byte. A new additional byte gives us 8 new bits to play with (well actually only 7 as we'll soon discover, as the first bit is used to signal a further overflow). This is enough so we can use00001001
to encode our 9. So our complex number is represented in two bytes:XXX11111
and00001001
.If we want to encode a value bigger than can fix in the first prefixed bit, AND the left over is bigger than 127 that would fit into the available 7 bits of the second byte, then we can't use this overflow mechanism using two bytes. Instead we use another "overflow, overflow" mechanism using three bytes:
For this "overflow, overflow" mechanism, we set the first byte bits to 1s as usual for an overflow (
XXX11111
) and then set the first bit of the second byte to 1. This leaves 7 bits available to encode the value, plus the next 8 bits in the third byte we're going to have to use (actually only 7 bits of the third byte, because again it uses the first bit to indicate another overflow).There's various ways they could go have gone about this using the second and third bytes. What they decided to do was encode this as two numbers: the 128 mod, and the 128 multiplier.
So that means the frist byte is set to 31 as per pervious example, the second byte is set to 26 (which is
11010
) plus the leading 1 to show we're using the overflow overflow method (so100011010
), and the third byte is set to 10 (or00001010
).So 1337 is encoded in three bytes:
XXX11111 100011010 00001010
(including setting X to whatever those values were).Using 128 mod and multiplier is quite efficient and means this large number (and in fact any number up to 16,383) can be represented in three bytes which is, not uncoincidentally, also the max integer that can be represented in 7 + 7 = 14 bits). But it does take a bit of getting your head around!
If it's bigger than 16,383 then we need to do another round of overflow in a similar manner.
All this seems horrendously complex but is actually relatively simply, and efficiently, coded up. Computers can do this pretty easily and quickly.
You are not print this value in the
if
statement. Only the left over value in theelse
. You need to print both.Exactly, it's because the first bit (value 128) needs to be set as per explanation above, to show we are continuing/overflowing into needing a third byte.