Fastest 8x16 bit matrix transpose on MIPS?

Posted 2024-12-12 09:17:37

I have an 8x16 matrix of bits as a UINT8 matrix[16].

I want to transpose the matrix and store it as a UINT16 matrix2[8].

This is in a time critical piece of my code, and so I need to do this as fast as possible. Is there a clever way to achieve this on a MIPS processor?

Comments (2)

萝莉病 2024-12-19 09:17:37

I don't think there are any special instructions in the MIPS instruction set that would help with this, so you might just as well code it in C. If you have access to the processor RTL, you could create a user-defined instruction.
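
If the routine is written in C, one option is the shift-and-mask 8x8 bit transpose described in Henry Warren's Hacker's Delight, applied to each half of the input. The sketch below is only an illustration under assumptions: the function names `transpose8x8` and `transpose_8x16_swar` are made up here, and the bit ordering (bit b of `matrix[i]` lands in bit 15-i of `matrix2[b]`) is one possible convention, since the question does not specify one. Whether it is actually faster on a given MIPS core would need to be measured.

```c
#include <stdint.h>

/* Transpose an 8x8 bit block stored as 8 bytes (a[0] = top row, MSB = left
   column). Result rows 0..3 are returned packed in *px (row 0 in the top
   byte), rows 4..7 in *py. Hypothetical helper, not from the original post. */
static void transpose8x8(const uint8_t a[8], uint32_t *px, uint32_t *py)
{
    uint32_t x = ((uint32_t)a[0] << 24) | ((uint32_t)a[1] << 16) |
                 ((uint32_t)a[2] << 8)  |  (uint32_t)a[3];
    uint32_t y = ((uint32_t)a[4] << 24) | ((uint32_t)a[5] << 16) |
                 ((uint32_t)a[6] << 8)  |  (uint32_t)a[7];
    uint32_t t;

    /* Swap the off-diagonal bits of every 2x2 block. */
    t = (x ^ (x >> 7)) & 0x00AA00AAu;   x = x ^ t ^ (t << 7);
    t = (y ^ (y >> 7)) & 0x00AA00AAu;   y = y ^ t ^ (t << 7);

    /* Swap the off-diagonal 2x2 blocks of every 4x4 block. */
    t = (x ^ (x >> 14)) & 0x0000CCCCu;  x = x ^ t ^ (t << 14);
    t = (y ^ (y >> 14)) & 0x0000CCCCu;  y = y ^ t ^ (t << 14);

    /* Swap the off-diagonal 4x4 blocks between the two words. */
    t = (x & 0xF0F0F0F0u) | ((y >> 4) & 0x0F0F0F0Fu);
    y = ((x << 4) & 0xF0F0F0F0u) | (y & 0x0F0F0F0Fu);
    x = t;

    *px = x;
    *py = y;
}

/* Assumed convention: bit b of matrix[i] ends up in bit (15 - i) of matrix2[b]. */
void transpose_8x16_swar(const uint8_t matrix[16], uint16_t matrix2[8])
{
    uint32_t x0, y0, x1, y1;

    transpose8x8(matrix, &x0, &y0);      /* inputs 0..7  -> high output byte */
    transpose8x8(matrix + 8, &x1, &y1);  /* inputs 8..15 -> low output byte  */

    for (int i = 0; i < 4; i++) {
        int sh = 24 - 8 * i;             /* select byte i, counted from the top */
        matrix2[7 - i] = (uint16_t)((((x0 >> sh) & 0xFFu) << 8) | ((x1 >> sh) & 0xFFu));
        matrix2[3 - i] = (uint16_t)((((y0 >> sh) & 0xFFu) << 8) | ((y1 >> sh) & 0xFFu));
    }
}
```

The appeal of this shape is that each 8x8 half costs a fixed number of shifts, XORs and masks on 32-bit registers, with no data-dependent branches.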

诗酒趁年少 2024-12-19 09:17:37

Maybe something like this:

  lbu $10, matrix
  lbu $11, matrix+1
  lbu $12, matrix+2
  lbu $13, matrix+3
  lbu $14, matrix+4
  lbu $15, matrix+5
  lbu $16, matrix+6
  lbu $17, matrix+7
  lbu $18, matrix+8
  lbu $19, matrix+9
  lbu $20, matrix+10
  lbu $21, matrix+11
  lbu $22, matrix+12
  lbu $23, matrix+13
  lbu $24, matrix+14
  lbu $25, matrix+15

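  # Note: $26/$27 are k0/k1, conventionally reserved for exception handlers;
  # use other temporaries if interrupts can occur while this runs.
  addiu $2, $0, 8        # $2 = bit index, decremented to 7 on loop entry
  addiu $9, $0, 256      # $9 = bit mask, shifted to 0x80 on loop entry
loop:
  addiu $2, $2, -1       # next bit position (7 down to 0)
  srl $9, $9, 1          # matching single-bit mask
  addu $27, $0, $0       # $27 accumulates one transposed row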

  and $26, $10, $9
  srlv $26, $26, $2
  or $27, $27, $26

  and $26, $11, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $12, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $13, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $14, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $15, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $16, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $17, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $18, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $19, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $20, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $21, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $22, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $23, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $24, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

  and $26, $25, $9
  srlv $26, $26, $2
  sll $27, $27, 1
  or $27, $27, $26

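  sll $3, $2, 1          # byte offset = 2 * row index
  sh $27, transposed($3) # store the row; bit 15 holds the bit from matrix[0]
  bgtz  $2, loop         # loop until row 0 has been stored
  nop                    # branch delay slot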


.data 0x2000
matrix:  
.byte 0x80
.byte 0x80
.byte 0x40
.byte 0x40
.byte 0x20
.byte 0x20
.byte 0x10
.byte 0x10
.byte 0x08
.byte 0x08
.byte 0x04
.byte 0x04
.byte 0x02
.byte 0x02
.byte 0x01
.byte 0x01

.data 0x3000
transposed:
.half 0
.half 0
.half 0
.half 0
.half 0
.half 0
.half 0
.half 0

It reads the input matrix into registers and then runs the loop 8 times (once for each row of the transposed matrix).
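
For checking results, the loop can be modelled in C. This is a rough sketch rather than a translation the answer provides: the function name `transpose_8x16_loop` is hypothetical, and it assumes the loop body executes for bit positions 7 down to 0 only, so that bit b of `matrix[i]` ends up in bit 15-i of `transposed[b]`.

```c
#include <stdint.h>

/* C model of the assembly loop (hypothetical name). */
void transpose_8x16_loop(const uint8_t matrix[16], uint16_t transposed[8])
{
    for (int bit = 7; bit >= 0; bit--) {    /* plays the role of $2 */
        uint16_t row = 0;                   /* plays the role of $27 */
        for (int i = 0; i < 16; i++) {
            /* shift in bit `bit` of matrix[i]; matrix[0] ends up in bit 15 */
            row = (uint16_t)((row << 1) | ((matrix[i] >> bit) & 1u));
        }
        transposed[bit] = row;              /* sh $27, transposed($3) */
    }
}
```

With the sample data from the listing (0x80, 0x80, 0x40, 0x40, ..., 0x01, 0x01), this model yields transposed[7] = 0xC000, transposed[6] = 0x3000, and so on down to transposed[0] = 0x0003.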
