Is it possible to implement bitwise operators using integer arithmetic?
I am facing a rather peculiar problem. I am working on a compiler for an architecture that doesn't support bitwise operations. However, it handles signed 16-bit integer arithmetic, and I was wondering whether it would be possible to implement bitwise operations using only:
- Addition (c = a + b)
- Subtraction (c = a - b)
- Division (c = a / b)
- Multiplication (c = a * b)
- Modulus (c = a % b)
- Minimum (c = min(a, b))
- Maximum (c = max(a, b))
- Comparisons (c = (a < b), c = (a == b), c = (a <= b), etc.)
- Jumps (goto, for, etc.)
The bitwise operations I want to be able to support are:
- Or (c = a | b)
- And (c = a & b)
- Xor (c = a ^ b)
- Left Shift (c = a << b)
- Right Shift (c = a >> b)
- (All integers are signed so this is a problem)
- Signed Shift (c = a >>> b)
- One's Complement (a = ~b)
- (Already found a solution, see below)
Normally the problem is the other way around: how to achieve arithmetic optimizations using bitwise hacks. Not in this case, however.
Writable memory is very scarce on this architecture, hence the need for bitwise operations. The bitwise functions themselves should not use a lot of temporary variables. However, constant read-only data and instruction memory are abundant. A side note: jumps and branches are not expensive, and all data is readily cached. Jumps cost half as many cycles as arithmetic (including load/store) instructions do. In other words, each of the supported operations above costs twice the cycles of a single jump.
Some thoughts that might help:
I figured out that you can do one's complement (negate bits) with the following code:
// Bitwise one's complement
b = ~a;
// Arithmetic one's complement
b = -1 - a;
I also remember the old shift hack for multiplying or dividing by a power of two, so the bitwise shifts can be expressed as:
// Bitwise left shift
b = a << 4;
// Arithmetic left shift
b = a * 16; // 2^4 = 16
// Signed right shift
b = a >>> 4;
// Arithmetic right shift
b = a / 16;
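One caveat with the division form: for negative operands `a / 16` rounds toward zero, while an arithmetic right shift rounds toward negative infinity, so the two differ whenever the division is inexact. A minimal sketch of a correction, assuming the target's `/` truncates toward zero and `%` takes the sign of the dividend (as in C99); the helper name `asr_div` is made up for illustration:

```c
/* Arithmetic right shift expressed with division only.
   Assumption: '/' truncates toward zero and '%' has the sign of the
   dividend (C99 semantics). pow2 must be a positive power of two. */
int asr_div(int a, int pow2)
{
    int q = a / pow2;
    if (a % pow2 < 0) {
        /* a was negative and not an exact multiple, so the truncated
           quotient is one too high compared with a real shift */
        q = q - 1;
    }
    return q;
}
```

For example, `asr_div(-17, 16)` gives -2, which matches what a sign-extending shift of -17 by 4 would produce, whereas plain `-17 / 16` gives -1.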
For the rest of the bitwise operations I am slightly clueless. I wish the architects of this architecture had supplied bit operations.
I would also like to know if there is a fast/easy way of computing the power of two (for shift operations) without using a memory data table. A naive solution would be to jump into a field of multiplications:
b = 1;
switch (a)
{
case 15: b = b * 2;
case 14: b = b * 2;
// ... exploiting fallthrough (instruction memory is magnitudes larger)
case 2: b = b * 2;
case 1: b = b * 2;
}
Or a Set & Jump approach:
switch (a)
{
case 15: b = 32768; break;
case 14: b = 16384; break;
// ... exploiting the fact that a jump is faster than one additional mul
// at the cost of doubling the instruction memory footprint.
case 2: b = 4; break;
case 1: b = 2; break;
}
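A third option, given that jumps are cheap and additions are available, is a plain counted loop of doublings. This is only a sketch: it is probably slower than the set-and-jump switch for large exponents, but it needs almost no instruction memory and only one temporary. The name `pow2_loop` is hypothetical, and the exponent is assumed to be in the range 0 to 14 so that the result fits in a signed 16-bit value:

```c
/* Compute 2^n with additions and jumps only.
   Assumption: 0 <= n <= 14, so the result fits in signed 16 bits. */
int pow2_loop(int n)
{
    int b = 1;
    while (n > 0) {
        b = b + b;   /* doubling by addition, no multiply needed */
        n = n - 1;
    }
    return b;
}
```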
First, solutions for shifting (shift is the shift distance and must not be negative; a is the operand to be shifted and also holds the result when done). The power table is used by all three shift operations.
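A minimal sketch of how a table-driven logical (zero-filling) right shift might look under these constraints. It is an illustration only, not a reconstruction of any particular listing; the names `POW2` and `lsr16` are made up, and values are assumed to stay in the signed 16-bit range:

```c
/* Powers of two that fit in a signed 16-bit value (read-only data). */
static const int POW2[15] = {
    1, 2, 4, 8, 16, 32, 64, 128, 256, 512,
    1024, 2048, 4096, 8192, 16384
};

/* Logical right shift of a 16-bit two's-complement value,
   using only +, /, comparisons and a table lookup. */
int lsr16(int a, int shift)
{
    if (shift == 0) return a;
    if (shift > 15) return 0;
    if (shift == 15) return a < 0;   /* only the old sign bit survives */
    if (a < 0) {
        /* clear bit 15 by adding 32768 in two steps, so that no
           constant or intermediate leaves the 16-bit range */
        a = a + 32767 + 1;
        /* shift the low 15 bits, then put the old bit 15 back
           at its new position (15 - shift) */
        return a / POW2[shift] + POW2[15 - shift];
    }
    return a / POW2[shift];
}
```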
For AND, OR and XOR I could not come up with a simple solution, so I'll do it by looping over each single bit. There might be a better trick to do this. The pseudocode assumes a and b are the input operands, c is the result value, and x is the loop counter (each loop must run exactly 16 times):
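A sketch of such a bit loop, written as C that can be tried on a host machine. The names `bitop16`, `wrap16` and the `OP_*` constants are hypothetical. `wrap16` only stands in for the 16-bit wraparound that the target is assumed to perform natively on `x + x`, and it needs a host `int` wider than 16 bits. The `&&` and `||` here are logical operators, which compile to comparisons and jumps:

```c
enum { OP_AND, OP_OR, OP_XOR };

/* Host-side stand-in for the target's assumed 16-bit wraparound. */
static int wrap16(int v)
{
    if (v > 32767)  return v - 65536;
    if (v < -32768) return v + 65536;
    return v;
}

/* Process the operands from bit 15 down to bit 0. */
int bitop16(int a, int b, int op)
{
    int c = 0;
    int x;
    for (x = 0; x < 16; x++) {
        int n = (a < 0) + (b < 0);   /* how many operands have bit 15 set */
        c = wrap16(c + c);           /* shift the result left by one */
        if ((op == OP_AND && n == 2) ||
            (op == OP_OR  && n >= 1) ||
            (op == OP_XOR && n == 1)) {
            c = c + 1;               /* c is even here, so no overflow */
        }
        a = wrap16(a + a);           /* move the next bit into bit 15 */
        b = wrap16(b + b);
    }
    return c;
}
```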
That's assuming that all variables are 16 bits and all operations behave as signed (so a<0 is actually true when bit 15 is set).
EDIT: I actually tested all possible operand values (-32768 to 32767) for shifts ranging from 0 to 31 for correctness, and it works correctly (assuming integer divides). For the AND/OR/XOR code an exhaustive test takes too long on my machine, but since the code for these is pretty simple there should be no edge cases anyway.
In this environment it might be best to set things up so that you actually use arithmetic operators to peel out components of integers.
E.G.
The transforms for these operators are obvious enough if you restrict RHS to a constant power of 2.
Peeling off two or four bits is also easy to do.
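To illustrate the kind of transform meant here, a small sketch in C. The helper names `low_bits` and `nibble` are made up, and C99 division semantics are assumed (`/` truncates toward zero, `%` takes the sign of the dividend):

```c
/* Equivalent of x & (pow2 - 1), where pow2 is a positive power of two
   (a constant in practice). Works for negative x in two's complement. */
int low_bits(int x, int pow2)
{
    int r = x % pow2;
    if (r < 0) {
        r = r + pow2;   /* C99 '%' yields a negative remainder here */
    }
    return r;
}

/* Nibble i (0 = least significant) of a non-negative 16-bit x,
   i.e. the equivalent of (x >> (4 * i)) & 15.
   Assumption: x >= 0 and 0 <= i <= 3. */
int nibble(int x, int i)
{
    static const int POW16[4] = { 1, 16, 256, 4096 };
    return (x / POW16[i]) % 16;
}
```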
An incomplete answer to an old question, concentrating here on AND, OR and XOR. Once a solution is found for one of these bitwise operations, the other two can be derived. There are several ways; one is shown in the following test program (compiled with gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)).
In December 2018 I discovered an error in the solution. The XOR commented below only works because intermediate results in a+b-2*AND(a,b) are promoted to int, which is larger than 16 bits for all modern compilers.
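The derivations rest on the identities a | b = a + b - (a & b) and a ^ b = a + b - 2*(a & b). A sketch of one possible regrouping that should keep every intermediate inside the signed 16-bit range, because `a - (a & b)` only clears bits of `a`. This regrouping is a suggestion and not necessarily how the answer's program was fixed; the function names are hypothetical, and the AND result is passed in from whichever AND implementation is used:

```c
/* OR derived from AND: a | b == a + (b with a's bits removed).
   and_ab must be the bitwise AND of a and b. */
int or_from_and(int a, int b, int and_ab)
{
    return a + (b - and_ab);
}

/* XOR derived from AND: the bits only in a plus the bits only in b.
   Each parenthesised term clears bits, so it cannot overflow, and the
   two terms have no bits in common, so their sum cannot overflow. */
int xor_from_and(int a, int b, int and_ab)
{
    return (a - and_ab) + (b - and_ab);
}
```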
You can operate bit by bit (as Mark Byers suggested) by extracting every bit, which will be slow.
Or you could speed up the process and use 2D lookup tables that store results for, say, two 4-bit operands and operate on those. You'll need fewer extractions than if you were operating on single bits.
You can also do everything using addition, subtraction and >= operation.
Every bitwise operation can be unrolled into something like this using macros:
You'll need 3 variables to implement this.
Every bitwise operation will revolve around macros similar to AND_MACRO: you compare the remaining values of a and b to the "mask" (which is the "c" parameter), then add the mask to the result in the if branch that is suitable for your operation. And you subtract the mask from the values if the bit is set.
Depending on your platform, it may be faster than extracting every bit using % and /, and then putting it back using multiplication.
See for yourself whichever is better for you.
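A sketch of what the unrolled compare-and-subtract approach could look like for AND. The macro name `AND_STEP` and the function `and16_unrolled` are hypothetical and only loosely modelled on the description above; the sign bit is handled separately with a `< 0` test, which is an assumption about how to adapt the idea to signed 16-bit values:

```c
/* One unrolled step for a single bit. mask is a positive power of two;
   a and b must already have all higher bits removed. */
#define AND_STEP(a, b, res, mask)            \
    do {                                     \
        if ((a) >= (mask)) {                 \
            (a) -= (mask);                   \
            if ((b) >= (mask)) {             \
                (b) -= (mask);               \
                (res) += (mask);             \
            }                                \
        } else if ((b) >= (mask)) {          \
            (b) -= (mask);                   \
        }                                    \
    } while (0)

int and16_unrolled(int a, int b)
{
    int res = 0;
    /* Bit 15 is the sign bit: test it with < 0 and clear it by adding
       32768 in two steps so everything stays in signed 16-bit range. */
    if (a < 0) {
        a = a + 32767 + 1;
        if (b < 0) {
            b = b + 32767 + 1;
            res = -32767 - 1;    /* bit 15 set in the result */
        }
    } else if (b < 0) {
        b = b + 32767 + 1;
    }
    AND_STEP(a, b, res, 16384);
    AND_STEP(a, b, res, 8192);
    AND_STEP(a, b, res, 4096);
    AND_STEP(a, b, res, 2048);
    AND_STEP(a, b, res, 1024);
    AND_STEP(a, b, res, 512);
    AND_STEP(a, b, res, 256);
    AND_STEP(a, b, res, 128);
    AND_STEP(a, b, res, 64);
    AND_STEP(a, b, res, 32);
    AND_STEP(a, b, res, 16);
    AND_STEP(a, b, res, 8);
    AND_STEP(a, b, res, 4);
    AND_STEP(a, b, res, 2);
    AND_STEP(a, b, res, 1);
    return res;
}
```

An OR or XOR variant would differ only in which branches add the mask to the result. The three variables are a, b and res, both operands being consumed in the process.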
As long as you're willing for it to be very expensive, yes.
Basically, you'll explicitly put a number into a base-2 representation. You do this just as you would put a number into base-10 (e.g., to print it out), that is, by repeated division.
This turns your number into an array of bools (or ints in the range 0,1), then we add functions to operate on those arrays.
Again, note that this is tremendously more expensive than bitwise operations, and that almost any architecture will supply bitwise operators.
In C (of course, in C you have bitwise operators, but...) an implementation might be:
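A sketch of the idea, using XOR as the example. The function names are made up, and the sign bit is handled with a `< 0` test on the assumption that values are 16-bit two's complement:

```c
#define NBITS 16

/* Split a signed 16-bit value into bits by repeated division;
   bits[0] is the least significant bit. */
void to_bits(int x, int bits[NBITS])
{
    int i;
    bits[15] = (x < 0);
    if (x < 0) {
        x = x + 32767 + 1;   /* clear bit 15 without leaving the range */
    }
    for (i = 0; i < 15; i++) {
        bits[i] = x % 2;
        x = x / 2;
    }
}

/* Reassemble a signed 16-bit value from its bits. */
int from_bits(const int bits[NBITS])
{
    int i;
    int x = 0;
    for (i = 14; i >= 0; i--) {
        x = x * 2 + bits[i];
    }
    if (bits[15]) {
        x = x - 32767 - 1;   /* apply the weight of bit 15 */
    }
    return x;
}

/* XOR of two bits is their sum modulo 2. */
int xor_via_bits(int a, int b)
{
    int abits[NBITS], bbits[NBITS], cbits[NBITS];
    int i;
    to_bits(a, abits);
    to_bits(b, bbits);
    for (i = 0; i < NBITS; i++) {
        cbits[i] = (abits[i] + bbits[i]) % 2;
    }
    return from_bits(cbits);
}
```

With the operations listed in the question, AND on single bits could be written as `min(x, y)` and OR as `max(x, y)`. The three temporary arrays are the reason this approach fits poorly on a machine with very little writable memory.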
Just some other approaches
For example a 16-bit AND:
A double (floating-point) solution for a 2-bit AND without loops or table lookups:
32-bit integer solution 2-bit AND:
16-bit integer solution 2-bit AND:
16-bit integer solution 3-bit AND:
Here's a method I came up with to process a bitwise XOR on 16 bits in parallel using integer additions in a 64-bit double:
The bit strings look like these (I took out the 3e15 guard digit here for clarity):
One 52-bit unsigned integer add and barely a handful of string substitution calls, and the output is already in a state that can be passed downstream.
The absolute highest value this add will climb to is 8,222,222,222,222,222, just shy of the 53-bit hard limit.
For a bit-wise AND, convert all the 1's, leading 6 or 7, down to 0s : only 2's and the leading 8 are true bits that should then be converted to 1s.
For bit-wise OR, it's the reverse - anything not a 0 or 6 is a "1" in the output string.
For bit-wise complement, it's even easier: start with 1,111,111,111,111,111 and subtract the concatenated bit strings of the 2 bytes to obtain it.