MIPS load address la doesn't always use register $1?

Posted 2024-12-13 19:54:52


Please refer to the edit sections for my explanation.

This is a bit long and difficult to illustrate, but I appreciate you taking the time to read it. Please bear with me.

Suppose I have this:

.data
    str1: .asciiz "A"
    str2: .asciiz "1"
    myInt:
          .word 42      # allocate an integer word: 42
    myChar:
          .word 'Q'     # allocate a char word

    .text    
    .align 2
    .globl main

main:
    lw      $t0, myInt          # load myInt into register $t0

    lw      $t3, str1           # load str1 into register $t3

    lw      $t4, str2           #load str2 into register $t4

    la      $a0, str1           # load address str1

    la      $a1, str2           # load address str2

Then in SPIM, the user text segment is

User Text Segment [00400000]..[00440000]
[00400000] 8fa40000  lw $4, 0($29)            ; 183: lw $a0 0($sp) # argc 
[00400004] 27a50004  addiu $5, $29, 4         ; 184: addiu $a1 $sp 4 # argv 
[00400008] 24a60004  addiu $6, $5, 4          ; 185: addiu $a2 $a1 4 # envp 
[0040000c] 00041080  sll $2, $4, 2            ; 186: sll $v0 $a0 2 
[00400010] 00c23021  addu $6, $6, $2          ; 187: addu $a2 $a2 $v0 
[00400014] 0c100009  jal 0x00400024 [main]    ; 188: jal main 
[00400018] 00000000  nop                      ; 189: nop 
[0040001c] 3402000a  ori $2, $0, 10           ; 191: li $v0 10 
[00400020] 0000000c  syscall                  ; 192: syscall # syscall 10 (exit) 
[00400024] 3c011001  lui $1, 4097             ; 23: lw $t0, myInt # load myInt into register $t0 
[00400028] 8c280004  lw $8, 4($1)             
[0040002c] 3c011001  lui $1, 4097             ; 25: lw $t3, str1 # load str1 into register $t3 
[00400030] 8c2b0000  lw $11, 0($1)            
[00400034] 3c011001  lui $1, 4097             ; 27: lw $t4, str2 #load str2 into register $t4 
[00400038] 8c2c0002  lw $12, 2($1)            
[0040003c] 3c041001  lui $4, 4097 [str1]      ; 29: la $a0, str1 # load address str1 
[00400040] 3c011001  lui $1, 4097 [str2]      ; 31: la $a1, str2 # load address str2 
[00400044] 34250002  ori $5, $1, 2 [str2]   

I understand that lw is a pseudo-instruction here, so it needs to be broken down into two instructions. I understand this part. We use the start address of the data segment as a "base pointer" and access the other data (including the first item) relative to it.
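To make the "base pointer plus offset" idea concrete, here is a small Python sketch that lays out the .data section from the question. It assumes the data segment starts at 0x10010000 (the value lui $1, 4097 produces) and that .word pads to a 4-byte boundary while .asciiz does not; the function name layout is hypothetical. It reproduces the offsets 0, 2 and 4 seen in the lw expansions above. It also shows that str2 lands at an address that is not a multiple of 4, so lw $t4, str2 assembles but would raise an address-alignment exception on MIPS if it were executed.

```python
# Hypothetical model of how the question's .data section is laid out.
# Assumptions: data segment base 0x10010000; .word aligns to 4 bytes; .asciiz does not.

DATA_BASE = 0x10010000

def layout(items):
    """Return {label: address} for a list of (label, directive, value) tuples."""
    addresses = {}
    offset = 0
    for label, directive, value in items:
        if directive == ".word":
            offset = (offset + 3) & ~3             # pad up to the next multiple of 4
            size = 4
        elif directive == ".asciiz":
            size = len(value.encode("ascii")) + 1  # the characters plus the null byte
        else:
            raise ValueError(f"unsupported directive {directive}")
        addresses[label] = DATA_BASE + offset
        offset += size
    return addresses

data = [
    ("str1", ".asciiz", "A"),
    ("str2", ".asciiz", "1"),
    ("myInt", ".word", 42),
    ("myChar", ".word", ord("Q")),
]

addrs = layout(data)
for label, addr in addrs.items():
    aligned = "word-aligned" if addr % 4 == 0 else "NOT word-aligned"
    print(f"{label:7s} 0x{addr:08x}  offset {addr - DATA_BASE}  {aligned}")
```

Prepending one more .word label (assuming myCharB from the second edit is a .word, like myChar) shifts str1 to offset 4 and str2 to offset 6, which are the offsets that appear in the second edit's listing.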

I also observed that loading address of str1 and str2 used two different registers: $4 and $1. $4 is $a0.
Why is that?

If I swap the last two instructions, on SPIM I see

...        
[0040003c] 3c011001  lui $1, 4097 [str2]      ; 31: la $a1, str2 # load address str2 
[00400040] 34250002  ori $5, $1, 2 [str2]     
[00400044] 3c041001  lui $4, 4097 [str1]      ; 32: la $a0, str1 # load address str1

So why does load address behave so strangely? Why did str2 use $1?
How can I go about explaining how lui $1, 4097 [str2] and lui $4, 4097 [str1] are different?

PS: Can someone also explain to me why we need the bracket [str2] ?

lui, $1, 4097, [str2] only pushes the entry address of data segment to register $1. That is, 0x10010000 .

Thank you very much!


EDIT

I rewrote the entire script to simplify the situation.

Script: http://pastebin.com/BHh89iqt
Text Segment: http://pastebin.com/t2eDEs1f

Let me remind you that we write in pseudo-instructions rather than true MIPS machine code. That is, "lw", "jal", "addi", etc., are all pseudo-instructions.

For example, lw (load word) is broken down into two machine instructions (look at the text segment):

lui $1, 4097             ; 23: lw $t0, myInt # load myInt into register $t0 
lw $8, 4($1)

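MIPS instructions are 32 bits wide, and a load like lw has only a 16-bit field for its offset, so a full 32-bit address cannot fit into one instruction (an opcode, two register fields and a 32-bit address would need 48 bits). This is why we break it down into two parts: lui supplies the upper 16 bits of the address and the second instruction supplies the lower 16 bits.
A label is a memory address pointing at the thing we assigned to it.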
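One way to check the field-width arithmetic is to pack the fields by hand. This Python sketch encodes MIPS I-type instructions (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate) using the standard opcodes for lui, lw and ori; encode_itype is a hypothetical helper, not part of SPIM or any other tool. Its results match the machine words SPIM printed in the text segment listings.

```python
# Sketch: pack MIPS I-type fields into one 32-bit word.
# Layout: opcode (6 bits) | rs (5) | rt (5) | immediate (16) = 32 bits in total.

OPCODES = {"lui": 0x0F, "lw": 0x23, "ori": 0x0D}

def encode_itype(op, rs, rt, imm):
    # Only 16 bits are left for the immediate, so a 32-bit address cannot fit here.
    # (A negative lw offset would have to be masked with & 0xFFFF before calling this.)
    if not 0 <= imm <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    return (OPCODES[op] << 26) | (rs << 21) | (rt << 16) | imm

print(f"{encode_itype('lui', 0, 1, 4097):08x}")   # lui $1, 4097
print(f"{encode_itype('lw', 1, 8, 4):08x}")       # lw  $8, 4($1)
print(f"{encode_itype('ori', 1, 5, 2):08x}")      # ori $5, $1, 2
```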

At the machine level, MIPS loads only take the form lw $rt, offset($rs): a base register plus a 16-bit offset. So most load pseudo-instructions that involve labels follow this approach, and the assembler uses $at ($1) as the base register to convert them into real MIPS machine instructions.
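Here is a sketch of that lw expansion in Python. It assumes $at ($1) as the temporary register, as in the listing, and splits the address so that the lui value plus the sign-extended 16-bit offset gives back the original address; when the low half is 0x8000 or larger, the upper half has to be bumped by one to compensate. split_hi_lo and expand_lw are hypothetical helpers that model the technique, not SPIM's own code.

```python
AT = 1  # $at ($1), the register reserved for the assembler

def split_hi_lo(addr):
    """Split a 32-bit address so that (hi << 16) + lo == addr (mod 2**32),
    with lo a signed 16-bit offset, as lw sign-extends its offset."""
    lo = ((addr & 0xFFFF) ^ 0x8000) - 0x8000   # sign-extend the low half
    hi = ((addr - lo) >> 16) & 0xFFFF          # add 1 to the upper half when lo is negative
    return hi, lo

def expand_lw(rt, addr):
    """Expand "lw $rt, label" (label at addr) into lui + lw through $at."""
    hi, lo = split_hi_lo(addr)
    return [f"lui ${AT}, {hi}", f"lw ${rt}, {lo}(${AT})"]

print(expand_lw(8, 0x10010004))   # lw $t0, myInt  -> ['lui $1, 4097', 'lw $8, 4($1)']
print(expand_lw(8, 0x1001FFFC))   # hypothetical label -> ['lui $1, 4098', 'lw $8, -4($1)']
```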

For lw, it's pretty easy. For la (load address), it's a bit trickier.
Pay attention to the last four load address instructions. SPIM translates them into this:

[0040003c] 3c041001  lui $4, 4097 [str1]      ; 27: la $a0, str1 # load address str2 
[00400040] 3c011001  lui $1, 4097 [str2]      ; 28: la $a0, str2 # load address str1 
[00400044] 34240002  ori $4, $1, 2 [str2]     
[00400048] 3c011001  lui $1, 4097 [str2]      ; 30: la $a0, str2 # load address str2 
[0040004c] 34240002  ori $4, $1, 2 [str2]     
[00400050] 3c041001  lui $4, 4097 [str1]      ; 31: la $a0, str1 # load address str1 

$4 refers to $a0. If you look at the instructions, the last two la instructions are the first two with their order swapped.
I did this on purpose to illustrate the strange behavior: for str1, lui writes the address straight into $4, but to load the address of str2, SPIM first uses $at and then applies the offset with ori.

I couldn't figure out why last night, but I have just realized that this happens because the assembler knows str1 is the first item in the data segment. Its address is exactly 0x10010000, whose lower 16 bits are zero, so a single lui already produces the full address and there is nothing else to convert.
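The observation above can be written down as a small Python sketch. The rule is inferred from the SPIM listings in this question, not taken from SPIM's source: if the lower 16 bits of the label's address are zero, one lui into the destination register is enough; otherwise lui goes into $at and ori fills in the low half. expand_la is a hypothetical name. Because ori zero-extends its immediate, no carry adjustment is needed here, unlike the offset of a lw.

```python
AT = 1  # $at ($1)

def expand_la(rt, addr):
    """Expand "la $rt, label" following the pattern seen in the SPIM listings."""
    hi, lo = (addr >> 16) & 0xFFFF, addr & 0xFFFF
    if lo == 0:
        # Low half is zero (e.g. the first item in .data): lui alone builds the address.
        return [f"lui ${rt}, {hi}"]
    # Otherwise build the upper half in $at and OR in the (zero-extended) low half.
    return [f"lui ${AT}, {hi}", f"ori ${rt}, ${AT}, {lo}"]

print(expand_la(4, 0x10010000))   # la $a0, str1 -> ['lui $4, 4097']
print(expand_la(5, 0x10010002))   # la $a1, str2 -> ['lui $1, 4097', 'ori $5, $1, 2']
print(expand_la(4, 0x10010004))   # str1 with myCharB in front -> ['lui $1, 4097', 'ori $4, $1, 4']
```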

Yet this is also strange because how does the compiler know at what byte to stop printing the string? (if we want to print a string...)

My guess: a null character terminates the print.

Anyhow. I guess this is just a convention that the MIPS uses.


Second Edit

In fact, if you just add new data above str1, you will see that my explanation is correct.

New script: http://pastebin.com/8DuzFrk0

New Text Segment: http://pastebin.com/YXbvzc4z

I only added myCharB to the top of the data segment.

[0040003c] 3c011001  lui $1, 4097 [str1]      ; 29: la $a0, str1 # load address str2
[00400040] 34240004  ori $4, $1, 4 [str1]
[00400044] 3c011001  lui $1, 4097 [str2]      ; 30: la $a0, str2 # load address str1
[00400048] 34240006  ori $4, $1, 6 [str2] 


Comments (1)

兮颜 2024-12-20 19:54:52


I also observed that loading address of str1 and str2 used two different registers: $4 and $1. $4 is $a0. Why is that?

Well, who cares? xD It's an internal SPIM implementation detail, and SPIM is free to use any register as long as it doesn't break the MIPS ABI; $at ($1) is the register the ABI reserves for the assembler. I'd just suggest not relying too much on pseudo-instructions when you need to be sure which registers have changed and what values they hold. Also, LW is usually not a pseudo-instruction, but the way you're using it (with a label as the operand), it is.

Can someone also explain to me why we need the bracket [str2] ?

You don't need any brackets. That's just information SPIM shows the programmer to indicate that this instruction is loading the str2 address. It's not part of the assembly.

lui, $1, 4097, [str2] only pushes the entry address of data segment to register $1. That is, 0x10010000

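Well, actually it only loads the upper half-word of $1: LUI places the immediate 4097 (0x1001) in the upper 16 bits and sets the lower half-word to zero, which is why $1 holds exactly 0x10010000. Keep in mind that LUI alone can therefore only produce values whose lower half-word is zero; for any other address, a second instruction has to supply the low 16 bits (ORI for la, or the offset of the following LW).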
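For reference, here is a small Python model of the two instructions on 32-bit MIPS, following the architecture's definitions: lui overwrites the whole register with the immediate shifted left by 16 (so the low half becomes zero whatever the register held before), and ori ORs in a zero-extended 16-bit immediate. The regs dictionary and the garbage value it starts with are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def lui(imm):
    """lui: the result depends only on the immediate; the old register value is discarded."""
    return (imm << 16) & MASK32

def ori(rs_value, imm):
    """ori: OR a zero-extended 16-bit immediate into the source register's value."""
    return (rs_value | (imm & 0xFFFF)) & MASK32

regs = {1: 0xDEADBEEF}          # pretend $1 held garbage before
regs[1] = lui(4097)             # lui $1, 4097
print(hex(regs[1]))             # 0x10010000 (low half cleared)
regs[5] = ori(regs[1], 2)       # ori $5, $1, 2
print(hex(regs[5]))             # 0x10010002 (the address of str2)
```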

Yet this is also strange because how does the compiler know at what byte to stop printing the string? (if we want to print a string...)

Null termination, as you guessed correctly.

I guess this is just a convention that the MIPS uses.

This is much older than MIPS. MIPS doesn't define anything about it, and instruction sets generally leave it to software. This is data handling, and it's defined at a higher layer such as the OS. In this case, it is SPIM's convention for its own syscalls: print_string (syscall 4) prints characters starting at the address in $a0 until it reaches a null byte. Anyway, null-terminated strings are very common; the C programming language uses them for strings.
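As an illustration of the convention, this Python sketch reads a string the way a null-terminated print routine does: start at the given address and stop at the first zero byte. Memory is modelled as a bytes object holding the first four bytes of the question's data segment (str1 and str2, assuming the segment starts at 0x10010000), and read_asciiz is a hypothetical helper.

```python
DATA_BASE = 0x10010000
# "A\0" for str1 followed by "1\0" for str2, as .asciiz lays them out
memory = b"A\x00" + b"1\x00"

def read_asciiz(addr):
    """Return the characters from addr up to (not including) the first 0 byte."""
    i = addr - DATA_BASE
    out = bytearray()
    while memory[i] != 0:
        out.append(memory[i])
        i += 1
    return out.decode("ascii")

print(read_asciiz(0x10010000))   # str1 -> A
print(read_asciiz(0x10010002))   # str2 -> 1
```

The terminating zero byte is what lets the routine stop after "A" instead of running on into "1"; it is also why str2 starts at offset 2 rather than 1.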
