GCC ARM assembly preprocessor macro

Posted on 2024-11-09 16:19:37

I am trying to use an ARM assembly macro for fixed-point multiplication:

    #define MULT(a,b) __asm__ __volatile__ ( \
        "SMULL r2, r3, %0, %1\n\t" \
        "ADD r2, r2, #0x8000\n\t" \
        "ADC r3, r3, #0\n\t" \
        "MOV %0, r2, ASR#16\n\t" \
        "ORR %0, %0, r3, ASL#16" \
        : "=r" (a) : "0"(a), "1"(b) : "r2", "r3" );

But when I try to compile it, I get the error: expected expression before 'asm'

(You can ignore everything below this if you value your time, but it would be nice if you took a look at it. The main question here is how to make the above work.)

I tried this:

    static inline GLfixed MULT(GLfixed a, GLfixed b){
       asm volatile(
        "SMULL r2, r3, %[a], %[b]\n"
        "ADD r2, r2, #0x8000\n"
        "ADC r3, r3, #0\n"
        "MOV %[a], r2, ASR#16\n"
        "ORR %[a], %[a], r3, ASL#16\n"
        : "=r" (a)
        : [a] "r" (a), [b] "r" (b)
        : "r2", "r3");
     return a; }

This compiles, but there seems to be a problem: when I use constants, e.g. MULT(65536,65536), it works, but when I use variables it breaks:

    GLfixed m[16];
    m[0]=costab[player_ry];//1(65536 integer representation)
    m[5]=costab[player_rx];//1(65536 integer representation)
    m[6]=-sintab[player_rx];//0
    m[8]=-sintab[player_ry];//0
    LOG("%i,%i,%i",m[6],m[8],MULT(m[6],m[8]));
    m[1]=MULT(m[6],m[8]);
    m[2]=MULT(m[5],-m[8]);
    m[9]=MULT(-m[6],m[0]);
    m[10]=MULT(m[5],m[0]);
    m[12]=MULT(m[0],0)+MULT(m[8],0);
    m[13]=MULT(m[1],0)+MULT(m[5],0)+MULT(m[9],0);
    m[14]=MULT(m[2],0)+MULT(m[6],0)+MULT(m[10],0);
    m[15]=0x00010000;//1(65536 integer representation)

    int i=0;
    while(i<16)
    {
        LOG("%i,%i,%i,%i",m[i],m[i+1],m[i+2],m[i+3]);
        i+=4;
    }

The above code prints (LOG is like printf here):

    0,0,-1411346156
    65536,65536,65536,440
    -2134820096,65536,0,-1345274311
    0,65536,22,220
    65536,196608,131072,65536

The correct result would be (there is obviously a lot of junk in the above):

    0,0,0
    65536,0,0,0
    0,65536,0,0
    0,0,65536,0
    0,0,0,65536

最初的梦 2024-11-16 16:19:37

The first part is easy enough: the problem is that an __asm__ block is a statement, not an expression.

You can use GCC's statement expressions extension to achieve what you want - something like this:

    #define MULT(a,b) \
      ({ \
        __asm__ __volatile__ ( \
          /* ... asm stuff here ... */ \
        ); \
        a; \
      })
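
To illustrate the statement-expression pattern on its own, here is a minimal sketch that builds on any GCC or Clang target. It is a stand-in under stated assumptions, not the asm macro itself: the asm body is replaced by plain C arithmetic, and `fixed_t` and `FIXED_MUL` are hypothetical names that do not appear in the question.

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* hypothetical Q16.16 type standing in for GLfixed */

/* GNU statement expression: the value of the last expression inside
 * ({ ... }) becomes the value of the whole macro, so the macro can be
 * used wherever an expression is expected (GCC/Clang extension). */
#define FIXED_MUL(x, y)                                        \
  ({                                                           \
    int64_t wide_ = (int64_t)(x) * (int64_t)(y) + 0x8000;      \
    (fixed_t)(wide_ >> 16);                                    \
  })
```

Because the block is an expression, it can sit on the right-hand side of an assignment, for example `m[10] = FIXED_MUL(m[5], m[0]);`. Unlike a macro whose asm output operand is `a`, this sketch never writes to its arguments, so non-lvalue arguments such as `-m[6]` are accepted.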

The second part is due to problems in the input and output operand specifications. You have two different versions here, and both are wrong. In the macro version, you've said:

    : "=r" (a) : "0"(a), "1"(b) : "r2", "r3"

which constrains

the output a to a register (this is operand 0);
the input a to be the same as operand 0, i.e. the same register (this is operand 1);
the input b to be the same as operand 1, i.e. the same again (this is operand 2).

You need "r"(b) here, and can refer to it as %2.

In the inline version, you've said:

    : "=r" (a) : [a] "r" (a), [b] "r" (b) : "r2", "r3"

which constrains the output a and the input a and b to registers, but

it does not declare any relationship between them;
the asm never explicitly refers to the output operand (you haven't given the output operand a name, and the asm code doesn't refer to %0).

You should be able to fix the original version with:

    : "=r" (a) : "0" (a), "r" (b) : "r2", "r3"

and refer to a as either %0 or %1, and to b as %2.

The inline version can be fixed like this:

    : [a] "=r" (a) : "[a]" (a), [b] "r" (b) : "r2", "r3"

and refer to the operands as %[a] and %[b].

If you want to use names in the macro version, you'll need something along the lines of

    : [arg_a] "=r" (a) : "[arg_a]" (a), [arg_b] "r" (b) : "r2", "r3"

(and refer to %[arg_a] and %[arg_b]) because otherwise the preprocessor will expand the a and b inside [a] and [b].
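
A small sketch of that expansion problem, using stringification so the result can be inspected at run time. `STRINGIFY`, `OPERAND_NAME_BAD`, `OPERAND_NAME_GOOD` and the two accessor functions are hypothetical helpers introduced only for this illustration:

```c
#define STRINGIFY(x) #x

/* Both macros have a parameter called a, like MULT(a,b) in the question. */
#define OPERAND_NAME_BAD(a)  STRINGIFY([a])      /* a is replaced by the argument */
#define OPERAND_NAME_GOOD(a) STRINGIFY([arg_a])  /* arg_a is not a parameter, so it is left alone */

/* Return the text the preprocessor actually produced for each spelling. */
static const char *bad_name(void)  { return OPERAND_NAME_BAD(m[6]); }
static const char *good_name(void) { return OPERAND_NAME_GOOD(m[6]); }
```

Here `good_name()` yields the literal `[arg_a]`, while `bad_name()` yields text containing the caller's argument `m[6]` instead of `[a]`, which is not a valid operand name inside an asm statement.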

Note the subtlety in the named argument cases: when a name is being given to an argument (as in the output a) you write [a] - no quotes - but when you are referring to the name of another already-named operand (as in the input a) you need to put it inside quotes: "[a]".
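
Putting the pieces together, a corrected inline function might look like the sketch below. It has not been run on real hardware and rests on several assumptions: `fixed_t` is a hypothetical stand-in for GLfixed, the asm path is only enabled for 32-bit ARM state, and other targets use a plain C fallback. It also makes two changes that go beyond the constraint fix discussed here: `adds` is used so that the following `adc` actually sees the carry, and the low word is shifted with `lsr` so that its sign bit is not copied into the upper half of the result.

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* hypothetical stand-in for GLfixed (Q16.16) */

static inline fixed_t fixed_mul(fixed_t a, fixed_t b)
{
#if defined(__arm__) && !defined(__thumb__)
    /* The output operand is named [a] (no quotes); the input a is tied to
     * the same register with the quoted matching constraint "[a]". */
    __asm__ (
        "smull r2, r3, %[a], %[b]\n\t"    /* r3:r2 = a * b (64-bit product) */
        "adds  r2, r2, #0x8000\n\t"       /* add rounding term, set carry   */
        "adc   r3, r3, #0\n\t"            /* propagate carry into high word */
        "mov   %[a], r2, lsr #16\n\t"     /* lower 16 bits of the result    */
        "orr   %[a], %[a], r3, lsl #16"   /* upper 16 bits of the result    */
        : [a] "=r" (a)
        : "[a]" (a), [b] "r" (b)
        : "r2", "r3", "cc");              /* adds modifies the flags        */
    return a;
#else
    /* Fallback for non-ARM targets: the same arithmetic in plain C. */
    int64_t wide = (int64_t)a * (int64_t)b + 0x8000;
    return (fixed_t)(wide >> 16);
#endif
}
```

Since this is a function rather than a macro, the named operands cannot collide with macro parameters, and callers can pass arbitrary expressions such as `fixed_mul(-m[6], m[0])`.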

狼亦尘 2024-11-16 16:19:37

Have you tried plain C code instead of assembly? On my system, GCC 4.5.3 generates code that is at least as good as your hand-written assembly:

    int mul (int a, int b)
    {
      long long x = ((long long)a * b + 0x8000);
      return x>>16;
    }

This compiles to the following assembly:

    # input: r0, r1
    mov    r3, #32768
    mov    r4, #0
    smlal  r3, r4, r0, r1
    mov    r0, r3, lsr #16
    orr    r0, r0, r4, asl #16
    # result in r0

(Function prologue and epilogue removed.)

The code becomes even better if you have multiple multiplications in a single function because the compiler will remove the redundant mov r3, #32768 instructions.
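
As a usage sketch, the C version can be dropped into the matrix code from the question. The names `fixed_t`, `FIXED_ONE` and `build_matrix` are hypothetical, and the sine and cosine values are passed in directly rather than read from `costab`/`sintab`. The array is zeroed first, since the question's snippet never assigns m[3], m[4], m[7] and m[11]:

```c
#include <stdint.h>
#include <string.h>

typedef int32_t fixed_t;                 /* hypothetical stand-in for GLfixed */
#define FIXED_ONE ((fixed_t)0x00010000)  /* 1.0 in Q16.16 */

/* Q16.16 multiply with rounding, same arithmetic as mul() above. */
static fixed_t mul(fixed_t a, fixed_t b)
{
    int64_t x = (int64_t)a * b + 0x8000;
    return (fixed_t)(x >> 16);
}

/* Fills m in the same order as the question's code. */
static void build_matrix(fixed_t m[16],
                         fixed_t cos_rx, fixed_t sin_rx,
                         fixed_t cos_ry, fixed_t sin_ry)
{
    memset(m, 0, 16 * sizeof m[0]);   /* covers m[3], m[4], m[7], m[11] */
    m[0]  = cos_ry;
    m[5]  = cos_rx;
    m[6]  = -sin_rx;
    m[8]  = -sin_ry;
    m[1]  = mul(m[6], m[8]);
    m[2]  = mul(m[5], -m[8]);
    m[9]  = mul(-m[6], m[0]);
    m[10] = mul(m[5], m[0]);
    /* m[12..14] are sums of products with 0 in the question, so they stay 0. */
    m[15] = FIXED_ONE;
}
```

With both cosines equal to `FIXED_ONE` and both sines equal to 0, as in the question, this fills `m` with the identity matrix in Q16.16.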
