这个 MIPS strlen 是否正确地从相应的 C 循环转换而来?
我有一个关于计算机科学课程的简单问题,我的任务是将函数转换为 MIPS 汇编语言。我相信我有一个正确的答案,但我想验证它。
这是C函数
int strlen(char *s) {
int len;
len=0;
while(*s != '\0') {
len++;
s++;
}
return len;
}
谢谢!
strlen:
add $v0, $zero, $zero # len = 0
loop: # do{
lbu $t0, 0($a0) # tmp0 = load *s
addi $a0, $a0, 1 # s++
addi $v0, $v0, 1 # len++
bne $t0, $zero, loop # }while(tmp0 != 0)
s_end:
addi $v0, $v0, -1 # undo counting of the terminating 0
j $ra
I have a simple question for a Comp Sci class I'm taking where my task is to convert a function into MIPS assembly language. I believe I have a correct answer but I want to verify it.
This is the C function
int strlen(char *s) {
int len;
len=0;
while(*s != '\0') {
len++;
s++;
}
return len;
}
Thanks!
strlen:
add $v0, $zero, $zero # len = 0
loop: # do{
lbu $t0, 0($a0) # tmp0 = load *s
addi $a0, $a0, 1 # s++
addi $v0, $v0, 1 # len++
bne $t0, $zero, loop # }while(tmp0 != 0)
s_end:
addi $v0, $v0, -1 # undo counting of the terminating 0
j $ra
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
是的,你有一个正确的 asm 版本,我喜欢这样一个事实:在测试 t0 的值之前你做了尽可能多的工作,以便为从内存加载提供尽可能多的时间。
Yeah, you have a correct asm version, and I like the fact that you do as much work as possible before testing the value of t0 to give as much time as possible for loading from memory.
(编者注:循环后添加
-1
将 off 修正 1,同时仍允许高效的 do{}while 循环结构。此答案提出了从 C 到的更直译if() 在无条件循环内中断
。)我认为 while 循环在
*s == 0
的情况下是不正确的。它应该是这样的:
*您可以使用宏指令(beqz $t0,s_end)而不是beq指令。
(Editor's note: the add of
-1
after the loop corrects for off by 1 while still allowing an efficient do{}while loop structure. This answer proposes a more literal translation from C into anif() break
inside an unconditional loop.)I think the while loop isn't right in the case of
*s == 0
.It should be something like this:
*You could use a macro instruction (beqz $t0, s_end) instead of beq instruction.
是的,在我看来是正确的,而且相当有效。使用类似于
do{}while()
结构的 asm 实现while
循环是在 asm 中循环的标准且最佳方法。 为什么循环总是编译成“do.. .while”风格(尾部跳转)?C 的更直接音译会在递增
len
之前检查*s
。例如,通过剥离第一次迭代并将其转换为可以跳过空字符串的整个循环的加载/分支。 (并重新排序循环体,这可能会使负载靠近分支,由于负载延迟
而导致性能更差。)您可以在循环:以
len=-1
开头,而不是0
。 使用li $v0, -1
仍然可以通过以下方式实现一条指令:addiu $v0, $zero, -1
进一步的优化是仅在循环内进行指针增量,并使用
len = end - start
求出末尾的长度代码>.我们可以通过在将传入指针复制到另一个寄存器时偏移传入指针来纠正偏移一(不计算终止符)。
我使用
addiu
/subu
因为我不希望它在指针有符号溢出时出错。您的版本可能也应该使用addiu
,因此它适用于高达 4GB 的字符串,而不仅仅是 2。未经测试,但我们可以考虑正确性:
s
code> 指向 0):当我们到达最后的减法时,我们有v0=s+1
(来自循环之前)和a0=s+1
(来自循环之前)由于加载$t0 = 0
而失败的第一个/唯一迭代)。减去这些得到len=0
=strlen("")
len = (s+2) - (s+1) = 1
。对于具有分支延迟槽的 MIPS,可以将 addiu 和 subu 分别重新排序在 bne 和 jr 之后,填充这些分支延迟槽。 (但是 bne 就在加载之后,因此传统的 MIPS 必须停止,甚至在没有加载互锁的 MIPS I 上用 nop 填充加载延迟槽)。
当然,如果您确实关心中小型字符串(不仅仅是很小)(例如超过 8 或 16 字节)的实际
strlen
性能,请使用一次检查整个单词的 bithack,以便可能有一个0
字节。为什么glibc的strlen需要这么复杂才能快速运行吗?
Yes, looks correct to me, and fairly efficient. Implementing a
while
loop with asm structured like ado{}while()
is the standard and best way to loop in asm. Why are loops always compiled into "do...while" style (tail jump)?A more direct transliteration of the C would check
*s
before incrementinglen
.e.g. by peeling the first iteration and turning it into a load/branch that can skip the whole loop for an empty string. (And reordering the loop body, which would probably put the load close to the branch, worse for performance because of load latency.)
You could optimize away the
len--
overshoot-correction after the loop: start withlen=-1
instead of0
. Useli $v0, -1
which can still be implemented with a single instruction:addiu $v0, $zero, -1
A further step of optimization is to only do the pointer increment inside the loop, and find the length at the end with
len = end - start
.We can correct for the off-by-one (to not count the terminator) by offsetting the incoming pointer while we're copying it to another reg.
I used
addiu
/subu
because I don't want it to fault on signed-overflow of a pointer. Your version should probably useaddiu
as well so it works for strings up to 4GB, not just 2.Untested, but we can think through the correctness:
s
points at a 0): when we reach the final subtract, we havev0=s+1
(from before the loop) anda0=s+1
(from the first/only iteration which falls through because it loads$t0 = 0
). Subtracting these giveslen=0
=strlen("")
len = (s+2) - (s+1) = 1
.For MIPS with a branch-delay slot, the addiu and subu can be reordered after bne and jr respectively, filling those branch-delay slots. (But then
bne
is right after the load so classic MIPS would have to stall, or even fill the load-delay slot with a nop on a MIPS I without interlocks for loads).Of course if you actually care about real-world
strlen
performance for small to medium strings (not just tiny), like more than 8 or 16 bytes, use a bithack that checks whole words at once for maybe having a0
byte.Why does glibc's strlen need to be so complicated to run quickly?