Assembly language programming hints and tricks
I'm having a go at writing my own "toy" OS and for the moment I'm doing it mostly in assembly (NASM) - partly because I'm hoping it will help me understand x86 disassembly and also because I'm finding it fairly fun too!
This is my first experience programming in assembly. I'm picking things up quicker than I expected; however, as with learning any significantly different language, I'm finding that my code is structured fairly chaotically as I try to figure out what patterns and conventions I should be using.
At the moment in particular I'm struggling with:
Keeping track of registers
At the moment everything is in 16-bit mode, so I only have 6 general purpose registers to play with, and even fewer of those are usable for accessing memory. I keep on trampling over my own registers, which in turn means I'm frequently swapping registers around to avoid this. Consequently I'm having a hard time keeping track of which registers contain which values, even with liberal commenting. Is this normal? Is there anything I can do to make things easier to keep track of?
For example I've started commenting all of my functions with a list of the registers that are clobbered:
; ================
; c_lba_chs
; Converts logical block addressing to Cylinder / Head / Sector
; ax (input, clobbered) - LBA
; ch (output) - Track number (cylinder)
; cl (output) - Sector number
; dh (output) - Head number
; ================
Keeping track of the stack
In a couple of cases I've started using the stack when I run out of registers, but this is making things so much worse - anything more complex than a simple push, call, pop sequence to preserve registers causes me to lose track completely, making it tricky to even tell if I've got the right number of items on the stack (particularly when error handling is involved - see below), let alone what order they are in. I know there must be a better way to use the stack; I just can't see what it is.
Handling errors
I've been using the carry flag and zero flag (depending on the function) to indicate an error to the caller, for example:
myfn:
    ; Do things
    jz .error
    ; Do more things
    ret
.error:
    stc
    ret
Is this a normal way of indicating errors?
Also are there any other hints or tricks that I can use to better structure my assembly?
Finally, are there any good resources / examples of well-written assembly? I've come across The Art of Assembly Language Programming; however, it seems to focus very much on the nitty-gritty of the language, with less emphasis on how code should be structured. (Also, some of the code samples use segments, which I think I should be avoiding.)
I'm doing all of this using zero segments (a flat memory model) to keep things simple and to make things easier if / when I start using C.
Comments (1)
Don't worry, you are pretty much on the right track. This being assembly, you can do what you want, so you have the freedom to decide how you want to manage your registers and data. I would recommend developing some standard for yourself, and using a C-like standard may not be a bad idea. I would also recommend using a different assembly language for a first project like this (for example ARM running on qemu); x86 is somewhat horrible as instruction sets go. But that is a separate topic...
Assemblers generally let you declare variables, if you will: memory with names.
Then, from the assembler (using ARM asm here), you refer to that memory by name.
The registers are used temporarily; the real data is kept in memory. A model like this helps you keep track of things, because the real data lives in memory under variable names you created, just as in a high-level language. x86 makes this even easier, since you can perform operations directly on memory and do not have to go through registers for everything. Likewise, you can manage this with a stack frame for local variables: subtract some number from the stack pointer to reserve the frame for that function, and within that function know/remember that variable joe is at stack pointer +4, ted is at stack pointer +8, and so on. You will probably want comments in your code to remember where these things are. Remember to restore the stack pointer/frame to its entry value before returning. This method is a little harder, as you are using numerical offsets rather than variable names, but it gives you local variables, recursion, and/or some savings in global memory.
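To make the stack-frame idea a bit more concrete, here is a minimal 16-bit NASM sketch. The routine add_scaled is made up for illustration, and joe and ted are just the names borrowed from the paragraph above. It assumes bp is used as the frame pointer: in 16-bit code sp cannot appear as a base register in an effective address, so the locals sit at negative offsets from bp rather than at sp+4 and sp+8. Giving the offsets names with %define means you do not have to remember the raw numbers.

```nasm
bits 16

%define joe word [bp-2]     ; first local  (hypothetical name)
%define ted word [bp-4]     ; second local (hypothetical name)

; ================
; add_scaled (hypothetical example routine)
; ax (input)  - first value
; bx (input)  - second value
; ax (output) - first value + 2 * second value
; Clobbers: flags
; ================
add_scaled:
    push bp             ; save the caller's frame pointer
    mov  bp, sp         ; bp now marks the base of this frame
    sub  sp, 4          ; reserve two word-sized locals

    mov  joe, ax        ; park the inputs in named locals
    mov  ted, bx

    mov  ax, ted        ; registers are only used briefly
    shl  ax, 1          ; ax = 2 * ted
    add  ax, joe        ; ax = joe + 2 * ted

    mov  sp, bp         ; drop the locals and anything still pushed
    pop  bp             ; restore the caller's frame pointer
    ret
```

One side effect of the mov sp, bp / pop bp exit is that every path out of the function, including an error path, can jump to the same two instructions and the stack comes out balanced no matter how many pushes happened in between.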
Doing this work as a human, with your eyes and hands (keyboard and mouse), you probably want to keep data in a register for no longer than what fits on one screen of your text editor, so that at a glance you can see the variable go into the register and then return to the variable in memory. A program/compiler can certainly keep track of as much as the system's memory allows, far more than a human can, which is why compilers on average generate better assembly than humans (in specific cases a human can always tweak or fix a problem).
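As a rough sketch of that "variable to register and back within one screen" habit, something like the following NASM fragment might do. The variable retry_count and both routines are hypothetical names invented for this example, and it assumes the flat, zero-segment setup described in the question.

```nasm
bits 16

; ================
; consume_retry (hypothetical example routine)
; Decrements retry_count, but never below zero
; Clobbers: ax, flags
; ================
consume_retry:
    mov  ax, [retry_count]  ; variable -> register
    test ax, ax
    jz   .done              ; already zero, leave it alone
    dec  ax
    mov  [retry_count], ax  ; register -> variable, a few lines later
.done:
    ret

; ================
; bump_retry (hypothetical example routine)
; x86 can operate on memory directly, so no register is needed here
; Clobbers: flags
; ================
bump_retry:
    inc  word [retry_count]
    ret

; Data lives after the code so execution never runs into it
retry_count: dw 3           ; hypothetical named variable, initial value 3
```

Nothing outside these few lines needs to know that ax was ever involved, which is the point: the name retry_count carries the meaning, not the register.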
On error handling: you need to be careful with using flags; it doesn't feel right to me for some reason. It may very well be just fine: interrupts preserve the flags, and your code will all have to preserve or set the flags, and so on. The problem with flags is that you have to check/use that return value immediately after the function returns, before you execute any instruction that modifies the flags. If you use a register instead, you can choose not to modify that return register for many more instructions before you need to sample or use the return value.
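The two approaches can also be combined. The sketch below is only an illustration under assumed names (check_nonzero, ERR_ZERO and caller are all hypothetical): the carry flag says whether something went wrong, and ax carries an error code that survives after the flags have been overwritten. Note that the success path clears the carry flag explicitly, since otherwise it would hold whatever the last arithmetic instruction left there.

```nasm
bits 16

ERR_ZERO equ 1              ; hypothetical error code

; ================
; check_nonzero (hypothetical example routine)
; ax (input, clobbered) - value to validate
; CF (output) - clear on success, set on error
; ax (output) - 0 on success, error code on failure
; ================
check_nonzero:
    test ax, ax
    jz   .error
    xor  ax, ax             ; ax = 0 means "no error"
    clc                     ; explicitly signal success
    ret
.error:
    mov  ax, ERR_ZERO       ; mov leaves the flags untouched
    stc                     ; signal failure
    ret

caller:
    mov  ax, 5
    call check_nonzero
    jc   .failed            ; test CF before any flag-changing instruction
    ; ... success path ...
    ret
.failed:
    ; CF may be overwritten from here on, but ax still holds the error code
    ret
```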
I think the bottom line here is: look at the C calling convention rules that compilers use for that instruction set, and perhaps for other instruction sets. You will see strong similarities, and for good reason: they are manageable. With so few registers, you can see why the calling conventions sometimes go straight to the stack for all of the arguments, and sometimes for the return values as well. I am told that the Amiga BIOS had a custom calling convention for each BIOS function, which made for a tight, fast-executing system, but when trying to re-create the BIOS in C using compilers, and/or attach to its functions with an assembler wrapper, it is difficult at best. I am sure that without good documentation on each function it would be unmanageable. Down the road you may decide you want this to be portable, and may wish you had chosen a commonly used calling convention. You will still want to comment your code to say parameter 1 is this, parameter 2 is that, and so on. On the other hand, if you have programmed x86 assembler calling DOS and BIOS calls, now or in the past, you will be quite comfortable with looking up each function in a reference and placing the data in the proper registers for each function. Because there were good reference materials, it was manageable to have a custom convention for each function.
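For reference, a C-style (cdecl-like) call in 16-bit NASM could look roughly like this. It is a sketch under stated assumptions rather than any particular compiler's exact output: add2 and caller are hypothetical names, calls are near calls (a 2-byte return address), arguments are words pushed right to left, the result comes back in ax, and the caller removes the arguments.

```nasm
bits 16

; ================
; add2 (hypothetical), roughly: int add2(int a, int b)
; [bp+4] (input) - a, the first argument (pushed last)
; [bp+6] (input) - b, the second argument (pushed first)
; ax (output)    - a + b
; Clobbers: flags
; ================
add2:
    push bp
    mov  bp, sp         ; [bp] = saved bp, [bp+2] = return address
    mov  ax, [bp+4]     ; a
    add  ax, [bp+6]     ; a + b
    pop  bp
    ret                 ; the caller removes the arguments

caller:
    mov  ax, 7
    push ax             ; b, pushed first (right to left)
    mov  ax, 35
    push ax             ; a
    call add2           ; ax = 42
    add  sp, 4          ; caller cleanup: two word-sized arguments
    ret
```

Because every function finds its arguments at the same offsets and hands its result back in the same register, the per-function comment only has to say what each parameter means, not where it lives.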