我正在尝试找到一种快速旋转的方法,并反射一个5x5板以将其存储在转置表中。由于非常快,板被表示为位。
比特板是这样的:
20 21 22 23 24
15 16 17 18 19
10 11 12 13 14
05 06 07 08 09
00 01 02 03 04
我已经找到了一些针对8x8位板的解决方案但是我找不到适用于5x5比特板的解决方案,我也尝试过循环遍历所有位并切换它们,但这是一个非常缓慢的解决方案。 (C ++)
I am trying to find a fast way to rotate and reflect a 5x5 board to store it in a transposition table. The board is represented as a bitboard as they are very fast.
The bitboard is represented like this:
20 21 22 23 24
15 16 17 18 19
10 11 12 13 14
05 06 07 08 09
00 01 02 03 04
I have have found some solutions for 8x8 bitboards https://www.chessprogramming.org/Flipping_Mirroring_and_Rotating but I can't find a solution that works for a 5x5 bitboard, I also have tried looping through all of the bits and switching them but that was a very slow solution. (C++)
发布评论
评论(1)
可以使用4个三角洲掉期进行转置,它不能像8x8转座那样很好地分解,因为5不是两个的力量。 4个三角洲掉期将花费大约24次操作和大约20个周期(由于指令级并行性,少于24个周期,但不小于24个周期,因为由于依赖关系,该代码主要是串行的)。
看起来就是这样:
where
bit_permute_step_step_step_step
是熟悉的熟悉三角洲交换。
另一种方法是使用乘法提取列。例如
x& 0b000010000100100100100100001
选择一个列,然后我们可以选择一个常数以乘以乘以列的位,以便在32位单词的前5位中以连续的序列中的列列。其余的单词将包含这些位的其他流浪副本,但它们将被转移出去。例如(在C#中显示,应易于移植)在23个操作下,这似乎并没有保存任何东西,但是由于教学级别并行的机会增加,它的延迟可能会较低。
pext同时两列,但是我们必须首先与板上串联,因为pext无法重新排序:(也未进行测试)
水平或垂直翻转很容易,而位小组移动则很容易。
Transpose could be done with 4 delta swaps, it does not decompose as nicely as an 8x8 transpose because 5 is not a power of two. 4 delta swaps would cost about 24 operations and about 20 cycles (less than 24 thanks to instruction level parallelism, but not a lot less than 24 because the code would be mostly serial due to dependencies).
That would look like this:
Where
bit_permute_step
is the familiar delta swap.An alternative way is to use multiplication to extract columns. For example
x & 0b0000100001000010000100001
selects one column, then we can choose a constant to multiply by such that the bits from the column line up in a contiguous sequence in the top 5 bits of a 32-bit word. The rest of the word would contain other stray copies of those bits, but they will be shifted out. For example (shown in C#, should be easy to port)At 23 operations, this doesn't seem to have saved anything, but it could have a lower latency thanks to the increased opportunity for instruction level parallelism.
PEXT makes that nicer/simpler: (not tested)
64-bit pext would enable extracting two columns at once, but we have to concatenate the board with itself first because pext cannot reorder: (also not tested)
Flipping horizontally or vertically would be easy with bit-group moving.