这个 MIPS strlen 是否正确地从相应的 C 循环转换而来?

发布于 2024-11-03 08:49:26 字数 634 浏览 3 评论 0原文

我有一个关于计算机科学课程的简单问题,我的任务是将函数转换为 MIPS 汇编语言。我相信我有一个正确的答案,但我想验证它。

这是C函数

int strlen(char *s) {
     int len;

     len=0;
     while(*s != '\0') {
          len++;
          s++;
     }
     return len;
}

谢谢!

strlen: 
    add $v0, $zero, $zero     # len = 0
loop:                          # do{
    lbu $t0, 0($a0)               # tmp0 = load *s
    addi $a0, $a0, 1              # s++
    addi $v0, $v0, 1              # len++
    bne $t0, $zero, loop         # }while(tmp0 != 0)
s_end:
    addi $v0, $v0, -1        # undo counting of the terminating 0
    j $ra 

I have a simple question for a Comp Sci class I'm taking where my task is to convert a function into MIPS assembly language. I believe I have a correct answer but I want to verify it.

This is the C function

int strlen(char *s) {
     int len;

     len=0;
     while(*s != '\0') {
          len++;
          s++;
     }
     return len;
}

Thanks!

strlen: 
    add $v0, $zero, $zero     # len = 0
loop:                          # do{
    lbu $t0, 0($a0)               # tmp0 = load *s
    addi $a0, $a0, 1              # s++
    addi $v0, $v0, 1              # len++
    bne $t0, $zero, loop         # }while(tmp0 != 0)
s_end:
    addi $v0, $v0, -1        # undo counting of the terminating 0
    j $ra 

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

旧街凉风 2024-11-10 08:49:26

是的,你有一个正确的 asm 版本,我喜欢这样一个事实:在测试 t0 的值之前你做了尽可能多的工作,以便为从内存加载提供尽可能多的时间。

Yeah, you have a correct asm version, and I like the fact that you do as much work as possible before testing the value of t0 to give as much time as possible for loading from memory.

剩余の解释 2024-11-10 08:49:26

(编者注:循环后添加 -1 将 off 修正 1,同时仍允许高效的 do{}while 循环结构。此答案提出了从 C 到 的更直译if() 在无条件循环内中断。)

我认为 while 循环在 *s == 0 的情况下是不正确的。

它应该是这样的:

    ...
    lbu    $t0, 0($a0)

loop:
    beq    $t0, $zero, s_end    # *
    ...
    b   loop

s_end:
    ...

*您可以使用宏指令(beqz $t0,s_end)而不是beq指令。

(Editor's note: the add of -1 after the loop corrects for off by 1 while still allowing an efficient do{}while loop structure. This answer proposes a more literal translation from C into an if() break inside an unconditional loop.)

I think the while loop isn't right in the case of *s == 0.

It should be something like this:

    ...
    lbu    $t0, 0($a0)

loop:
    beq    $t0, $zero, s_end    # *
    ...
    b   loop

s_end:
    ...

*You could use a macro instruction (beqz $t0, s_end) instead of beq instruction.

骄傲 2024-11-10 08:49:26

是的,在我看来是正确的,而且相当有效。使用类似于 do{}while() 结构的 asm 实现 while 循环是在 asm 中循环的标准且最佳方法。 为什么循环总是编译成“do.. .while”风格(尾部跳转)?

C 的更直接音译会在递增 len 之前检查 *s
例如,通过剥离第一次迭代并将其转换为可以跳过空字符串的整个循环的加载/分支。 (并重新排序循环体,这可能会使负载靠近分支,由于负载延迟


而导致性能更差。)您可以在循环:以 len=-1 开头,而不是 0 使用 li $v0, -1 仍然可以通过以下方式实现一条指令:
addiu $v0, $zero, -1


进一步的优化是仅在循环内进行指针增量,并使用 len = end - start 求出末尾的长度代码>.

我们可以通过在将传入指针复制到另一个寄存器时偏移传入指针来纠正偏移一(不计算终止符)。

# char *s input in $a0,  size_t length returned in $v0
strlen:
    addiu  $v0, $a0, 1               # char *start_1 = start + 1
loop:                                # do{
    lbu    $t0, ($a0)                  # char tmp0 = load *s
    addiu  $a0, $a0, 1                 # s++
    bne    $t0, $zero, loop          # }while(tmp0 != '\0')
s_end:
    subu   $v0, $a0, $v0             # size_t len = s - start
    jr    $ra

我使用 addiu / subu 因为我不希望它在指针有符号溢出时出错。您的版本可能也应该使用 addiu ,因此它适用于高达 4GB 的字符串,而不仅仅是 2。

未经测试,但我们可以考虑正确性:

  • 对于空字符串输入(s code> 指向 0):当我们到达最后的减法时,我们有 v0=s+1 (来自循环之前)和 a0=s+1 (来自循环之前)由于加载 $t0 = 0 而失败的第一个/唯一迭代)。减去这些得到 len=0 = strlen("")
  • 对于 length=1 的字符串: v0=s+1,但循环体运行两次,所以我们有 a0= s+2。 len = (s+2) - (s+1) = 1
  • 通过归纳,更大的长度也能起作用。

对于具有分支延迟槽的 MIPS,可以将 addiu 和 subu 分别重新排序在 bne 和 jr 之后,填充这些分支延迟槽。 (但是 bne 就在加载之后,因此传统的 MIPS 必须停止,甚至在没有加载互锁的 MIPS I 上用 nop 填充加载延迟槽)。


当然,如果您确实关心中小型字符串(不仅仅是很小)(例如超过 8 或 16 字节)的实际 strlen 性能,请使用一次检查整个单词的 bithack,以便可能有一个 0 字节。
为什么glibc的strlen需要这么复杂才能快速运行吗?

Yes, looks correct to me, and fairly efficient. Implementing a while loop with asm structured like a do{}while() is the standard and best way to loop in asm. Why are loops always compiled into "do...while" style (tail jump)?

A more direct transliteration of the C would check *s before incrementing len.
e.g. by peeling the first iteration and turning it into a load/branch that can skip the whole loop for an empty string. (And reordering the loop body, which would probably put the load close to the branch, worse for performance because of load latency.)


You could optimize away the len-- overshoot-correction after the loop: start with len=-1 instead of 0. Use li $v0, -1 which can still be implemented with a single instruction:
addiu $v0, $zero, -1


A further step of optimization is to only do the pointer increment inside the loop, and find the length at the end with len = end - start.

We can correct for the off-by-one (to not count the terminator) by offsetting the incoming pointer while we're copying it to another reg.

# char *s input in $a0,  size_t length returned in $v0
strlen:
    addiu  $v0, $a0, 1               # char *start_1 = start + 1
loop:                                # do{
    lbu    $t0, ($a0)                  # char tmp0 = load *s
    addiu  $a0, $a0, 1                 # s++
    bne    $t0, $zero, loop          # }while(tmp0 != '\0')
s_end:
    subu   $v0, $a0, $v0             # size_t len = s - start
    jr    $ra

I used addiu / subu because I don't want it to fault on signed-overflow of a pointer. Your version should probably use addiu as well so it works for strings up to 4GB, not just 2.

Untested, but we can think through the correctness:

  • For an empty string input (s points at a 0): when we reach the final subtract, we have v0=s+1 (from before the loop) and a0=s+1 (from the first/only iteration which falls through because it loads $t0 = 0). Subtracting these gives len=0 = strlen("")
  • For a length=1 string: v0=s+1, but the loop body runs twice so we have a0=s+2. len = (s+2) - (s+1) = 1.
  • By induction, larger lengths work too.

For MIPS with a branch-delay slot, the addiu and subu can be reordered after bne and jr respectively, filling those branch-delay slots. (But then bne is right after the load so classic MIPS would have to stall, or even fill the load-delay slot with a nop on a MIPS I without interlocks for loads).


Of course if you actually care about real-world strlen performance for small to medium strings (not just tiny), like more than 8 or 16 bytes, use a bithack that checks whole words at once for maybe having a 0 byte.
Why does glibc's strlen need to be so complicated to run quickly?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文