Efficient way to determine the minimum field size required to store variances in user input
Sorry about the clumsy title; I couldn't find a better way of expressing what I'm trying to do.
I receive multiple 32-bit integers as input from the user. For example, the user may enter the following values (shown in hex for ease of explanation):
0x00001234
0x00005678
0x0000abcd
In this particular case, the first 2 bytes of each input are constant, and the last 2 bytes are variable. For efficiency purposes, I could store 0x0000 as a single constant, and create a vector of uint16_t values to store the variable portion of the input (0x1234, 0x5678, 0xabcd).
Now let's say the user enters the following:
0x00000234
0x56780000
0x00001000
In this case I would need a vector of uint32_t values to store the variable portion of the input, as each value affects different bytes.
My current thought is to do the following:
uint32_t myVal = 0;
myVal |= input1;
myVal |= input2;
// ...
And then at the end find the distance between the first and last "toggled" (i.e. 1) bit in myVal. That distance will give me the required field size for the variable portion of all of the inputs.
However, this doesn't sound like it would scale well for a large number of user inputs. Any recommendations about an elegant and efficient way of determining this?
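As a minimal sketch of the idea described above, assuming the inputs have already been OR-ed into a single 32-bit value, the distance between the lowest and highest set bit could be measured like this. The function name set_bit_span is hypothetical, not something from the original post.

```cpp
#include <cstdint>

// Width in bits of the span from the lowest set bit to the highest set bit
// of mask, inclusive. Returns 0 when no bit is set.
// (set_bit_span is a hypothetical helper name used only for illustration.)
unsigned set_bit_span(std::uint32_t mask) {
    if (mask == 0) return 0;
    unsigned low = 0;
    while (((mask >> low) & 1u) == 0) ++low;    // find the lowest set bit
    unsigned high = 31;
    while (((mask >> high) & 1u) == 0) --high;  // find the highest set bit
    return high - low + 1;
}
```

For the first example (0x1234, 0x5678, 0xabcd) the OR-ed value is 0xFFFD and the span is 16 bits; for the second example the OR-ed value is 0x56781234 and the span is 29 bits, so a 16-bit field is not enough. Note that OR-ing alone treats bits that are constantly 1 as if they varied. With C++20, std::countr_zero and std::countl_zero from <bit> could replace the two loops.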
Update:
I simplified the problem in my above explanation.
Just to be clear, I am not doing this to save memory (I have better things to do than to try and conserve a few bytes and this isn't for optimization purposes).
In summary, component A provides component B in my system with values. Sometimes these values are 128-bit, but component B only supports 32-bit values.
If the variable portion of the 128-bit value can be expressed with a 32-bit value, I can accept it. Otherwise I will need to reject it with an error.
I'm not in a position to modify component B to allow 128-bit values, or modify component A to prevent its use of 128-bit values (there are hardware limitations here too).
Comments (5)
Use two accumulators: OR every input into ORVal and AND every input into ANDVal, then XOR the two at the end.
A bit position in ORVal will be 1 if at least one input had 1 in that position, and 0 if ALL inputs had 0 in that position. A bit position in ANDVal will be 0 if at least one input had 0 in that position, and 1 if ALL inputs had 1 in that position.
If a bit position in the inputs was always 1, then ORVal and ANDVal will both be set to 1. If a bit position in the inputs was always 0, then ORVal and ANDVal will both be set to 0. If there was a mix of 0 and 1 in a bit position, then ORVal will be set to 1 and ANDVal to 0, hence the XOR at the end gives the mask of the bit positions that changed.
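The ORVal/ANDVal/XOR approach can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's original code: the function name changed_bits and the use of std::vector are hypothetical, and the AND accumulator is assumed to start with all bits set so that the first input passes through unchanged.

```cpp
#include <cstdint>
#include <vector>

// Returns a mask with a 1 in every bit position whose value differs between
// at least two inputs. Returns 0 for an empty input list.
// (changed_bits is a hypothetical name used only for illustration.)
std::uint32_t changed_bits(const std::vector<std::uint32_t>& inputs) {
    if (inputs.empty()) return 0;
    std::uint32_t or_val = 0;             // 0 is the identity for OR
    std::uint32_t and_val = 0xFFFFFFFFu;  // all ones is the identity for AND
    for (std::uint32_t v : inputs) {
        or_val |= v;   // 1 where at least one input has a 1
        and_val &= v;  // 1 only where every input has a 1
    }
    return or_val ^ and_val;  // 1 where the inputs disagree
}
```

Unlike a plain OR, this also handles a constant part that is not zero: for the inputs 0xFFFF0001 and 0xFFFF0002 the mask is 0x00000003, so only the low bits count as variable.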
It looks like you have to come up with a cumulative bitmask, which you can then inspect to see whether you have trailing or leading constant bits. An algorithm that operates on each input will be required (making it an O(n) algorithm, where n is the number of values to inspect).
The algorithm would be similar to what you've already done: accumulate every input into a running mask.
You can then check how many leading or trailing bits/bytes can be made constant, and whether you're going to use the full 32 bits. If you have access to SSE instructions, you can vectorize this using OpenMP.
There's also a possible optimization: short-circuit by checking whether the distance between the first 1 bit and the last 1 bit is already greater than 32, in which case you can stop.
For this algorithm to scale better, you're going to have to do it in parallel. Your friend would be vector processing (maybe using CUDA for Nvidia GPUs, or OpenCL if you're on the Mac or on platforms that already support OpenCL, or just OpenMP annotations).
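A rough sketch of the cumulative mask with the short-circuit check might look like the following. Several things here are assumptions made for illustration: 64-bit values stand in for the wider inputs, the names bit_span and varying_bits_fit are hypothetical, and the running mask is built by XOR-ing each value against the first input (which marks the same varying bits as an OR/AND pair would) rather than by a plain OR.

```cpp
#include <cstdint>
#include <vector>

// Number of bits from the lowest to the highest set bit of mask, inclusive.
unsigned bit_span(std::uint64_t mask) {
    if (mask == 0) return 0;
    unsigned low = 0;
    while (((mask >> low) & 1u) == 0) ++low;
    unsigned high = 63;
    while (((mask >> high) & 1u) == 0) --high;
    return high - low + 1;
}

// True if all bits that vary across the inputs fit in a contiguous window of
// at most max_bits bits. Stops at the first input that breaks the limit.
bool varying_bits_fit(const std::vector<std::uint64_t>& inputs,
                      unsigned max_bits) {
    if (inputs.empty()) return true;
    std::uint64_t changed = 0;  // cumulative mask of varying bit positions
    for (std::uint64_t v : inputs) {
        changed |= v ^ inputs.front();  // bits where v differs from the first input
        if (bit_span(changed) > max_bits) return false;  // short-circuit
    }
    return true;
}
```

The loop is still O(n) in the worst case; the early return only helps when an unacceptable value shows up before the end of the input.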
The most efficient way to store a series of unsigned integers in the range [0, (2^32)-1] is to just use uint32_t. Jumping through hoops to save 2 bytes from user input is not worth your time: the user cannot possibly, in his or her lifetime, enter enough integers that your code would have to start compressing them. He or she would die of old age long before memory constraints became apparent on any modern system.

Store the first full 128-bit number you encounter, push its low-order 32 bits onto a vector, and set bool reject_all = false. For each remaining number, if its high-order (128-32=96) bits differ from the first number's, set reject_all = true; otherwise push its low-order bits onto the vector. At the end of the loop, use reject_all to decide whether to use the vector of values.
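The reject_all approach (compare the high-order 96 bits of every 128-bit value against the first one and keep only the low-order 32 bits) could be sketched as below. The U128 struct and the function name collect_low_words are hypothetical stand-ins, since the thread does not say how the 128-bit values are represented. This sketch also assumes the variable part always sits in the lowest 32 bits.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical representation of a 128-bit value as two 64-bit halves.
struct U128 {
    std::uint64_t hi;  // bits 127..64
    std::uint64_t lo;  // bits 63..0
};

// Fills out with the low-order 32 bits of every input and returns true if
// the high-order 96 bits are identical across all inputs. Returns false
// (the reject_all case) as soon as one input has a different prefix.
bool collect_low_words(const std::vector<U128>& inputs,
                       std::vector<std::uint32_t>& out) {
    out.clear();
    if (inputs.empty()) return true;
    const U128& first = inputs.front();
    for (const U128& v : inputs) {
        const bool same_prefix =
            v.hi == first.hi && (v.lo >> 32) == (first.lo >> 32);
        if (!same_prefix) return false;  // reject the whole batch
        out.push_back(static_cast<std::uint32_t>(v.lo));
    }
    return true;
}
```

When the function returns false, the contents of out are incomplete and should be ignored.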
Though I can't see a reason for all that... Why not just compare each input with std::numeric_limits<uint16_t>::max()? If an input is larger than that, you need to use uint32_t.
Answering your edit:
I suppose that for better performance you should use hardware-specific low-level instructions. You could iterate over the 32-bit parts of the input 128-bit value, add each one to some variable, and check the difference between the next value and the current sum. If the difference isn't equal to the sum, you should skip this 128-bit value; otherwise you'll get the necessary result in the end. The sample follows:
In this C++ example it may look silly, but I believe that in assembly code this should process the input stream efficiently.