Can anyone tell me the purely assembly code for displaying the value in a register in decimal format? Please don't suggest using the printf hack and then compiling with GCC.
Description:
Well, I did some research and some experimentation with NASM and figured out that I could use the printf function from the C library to print an integer. I did so by compiling the object file with GCC, and everything works well enough.
However, what I want to achieve is to print the value stored in any register in decimal form.
I did some more research and found that DOS interrupt 21h can display characters and strings when AH holds 2 or 9, respectively, with the data in DX.
Conclusion:
None of the examples I found showed how to display the value of a register in decimal form without using the C library's printf. Does anyone know how to do this in assembly?
You need to write a binary to decimal conversion routine, and then use the decimal digits to produce "digit characters" to print.
You have to assume that something, somewhere, will print a character on your output device of choice. Call this subroutine "print_character"; assume it takes a character code in EAX and preserves all the registers. (If you don't have such a subroutine, you have an additional problem that should be the basis of a different question.)
If you have the binary code for a digit (e.g., a value from 0-9) in a register (say, EAX), you can convert that value to a character for the digit by adding the ASCII code for the "zero" character to the register. This is as simple as:
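A minimal sketch of that step, assuming NASM syntax on x86-64 Linux. The `print_character` routine here is a hypothetical implementation of the subroutine assumed above (character code in EAX, all registers preserved), built on the Linux `write` system call; the digit value 7 is only an illustration.

```nasm
; Sketch only: assumes NASM syntax, x86-64 Linux.
global _start

section .text
_start:
    mov     eax, 7              ; binary value of one digit, 0..9 (illustrative)
    add     eax, '0'            ; add ASCII "0" (0x30): 7 becomes '7' (0x37)
    call    print_character

    mov     eax, 10             ; newline
    call    print_character

    mov     eax, 60             ; sys_exit
    xor     edi, edi            ; exit status 0
    syscall

; Hypothetical print_character: character code in EAX, all registers preserved.
print_character:
    push    rax                 ; save everything the system call clobbers
    push    rcx
    push    rdx
    push    rsi
    push    rdi
    push    r11
    push    rax                 ; the character itself, in memory for write()
    mov     eax, 1              ; sys_write
    mov     edi, 1              ; fd 1 = stdout
    mov     rsi, rsp            ; address of the character
    mov     edx, 1              ; one byte
    syscall
    add     rsp, 8              ; drop the temporary copy
    pop     r11
    pop     rdi
    pop     rsi
    pop     rdx
    pop     rcx
    pop     rax
    ret
```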
You can then call print_character to print the digit character code.
To output an arbitrary value, you need to pick off digits and print them.
Picking off digits fundamentally requires working with powers of ten. It is easiest to work with one power of ten, e.g., 10 itself. Imagine we have a divide-by-10 routine that takes a value in EAX and produces a quotient in EDX and a remainder in EAX. I leave it as an exercise for you to figure out how to implement such a routine.
Then a simple routine with the right idea is to produce one digit for every digit position the value might have. A 32-bit register stores values up to about 4 billion, so you might get 10 digits printed. So:
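A minimal sketch of that loop, assuming NASM syntax on x86-64 Linux. Both helpers are hypothetical: `divide_by_ten` is one possible solution to the exercise (it simply uses `div`), and `print_character` is built on the Linux `write` system call. The value 1234 is illustrative.

```nasm
; Sketch only: assumes NASM syntax, x86-64 Linux.
global _start

section .text
_start:
    mov     eax, 1234           ; value to print (illustrative)
    mov     ecx, 10             ; a 32-bit value has at most 10 decimal digits
.next_digit:
    call    divide_by_ten       ; EDX = quotient, EAX = remainder (0..9)
    add     eax, '0'            ; remainder -> digit character
    call    print_character
    mov     eax, edx            ; continue with the quotient
    dec     ecx
    jnz     .next_digit

    mov     eax, 10             ; newline
    call    print_character
    mov     eax, 60             ; sys_exit
    xor     edi, edi            ; exit status 0
    syscall

; Hypothetical divide_by_ten: EAX in; EDX = quotient, EAX = remainder out.
divide_by_ten:
    push    rcx
    xor     edx, edx            ; DIV divides EDX:EAX, so zero the high half
    mov     ecx, 10
    div     ecx                 ; EAX = quotient, EDX = remainder
    xchg    eax, edx            ; swap to match the convention in the text
    pop     rcx
    ret

; Hypothetical print_character: character code in EAX, all registers preserved.
print_character:
    push    rax
    push    rcx
    push    rdx
    push    rsi
    push    rdi
    push    r11
    push    rax                 ; the character itself, in memory for write()
    mov     eax, 1              ; sys_write
    mov     edi, 1              ; fd 1 = stdout
    mov     rsi, rsp            ; address of the character
    mov     edx, 1              ; one byte
    syscall
    add     rsp, 8              ; drop the temporary copy
    pop     r11
    pop     rdi
    pop     rsi
    pop     rdx
    pop     rcx
    pop     rax
    ret
```

With the value 1234 this sketch prints 4321000000: every digit is right, but the least significant one comes out first.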
This works... but prints the digits in reverse order. Oops! Well, we can take advantage of the pushdown stack to store digits produced, and then pop them off in reverse order:
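A sketch of the stack-based variant under the same assumptions (NASM syntax, x86-64 Linux, hypothetical `divide_by_ten` and `print_character` helpers, illustrative value 1234). In 64-bit mode each `push` stores 8 bytes, so the whole of RAX is pushed for each digit.

```nasm
; Sketch only: assumes NASM syntax, x86-64 Linux.
global _start

section .text
_start:
    mov     eax, 1234           ; value to print (illustrative)
    mov     ecx, 10             ; digit count to produce
.produce:
    call    divide_by_ten       ; EDX = quotient, EAX = remainder (0..9)
    add     eax, '0'            ; remainder -> digit character
    push    rax                 ; save the digit; last pushed = most significant
    mov     eax, edx            ; continue with the quotient
    dec     ecx
    jnz     .produce

    mov     ecx, 10             ; digit count to print
.print:
    pop     rax                 ; digits come back most significant first
    call    print_character
    dec     ecx
    jnz     .print

    mov     eax, 10             ; newline
    call    print_character
    mov     eax, 60             ; sys_exit
    xor     edi, edi            ; exit status 0
    syscall

; Hypothetical divide_by_ten: EAX in; EDX = quotient, EAX = remainder out.
divide_by_ten:
    push    rcx
    xor     edx, edx            ; DIV divides EDX:EAX, so zero the high half
    mov     ecx, 10
    div     ecx                 ; EAX = quotient, EDX = remainder
    xchg    eax, edx            ; swap to match the convention in the text
    pop     rcx
    ret

; Hypothetical print_character: character code in EAX, all registers preserved.
print_character:
    push    rax
    push    rcx
    push    rdx
    push    rsi
    push    rdi
    push    r11
    push    rax                 ; the character itself, in memory for write()
    mov     eax, 1              ; sys_write
    mov     edi, 1              ; fd 1 = stdout
    mov     rsi, rsp            ; address of the character
    mov     edx, 1              ; one byte
    syscall
    add     rsp, 8              ; drop the temporary copy
    pop     r11
    pop     rdi
    pop     rsi
    pop     rdx
    pop     rcx
    pop     rax
    ret
```

With the value 1234 this prints 0000001234: the order is now correct, and the leading zeros are still there.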
Left as an exercise to the reader: suppress leading zeros. Also, since we are writing digit characters to memory, instead of writing them to the stack we could write them to a buffer, and then print the buffer content. Also left as an exercise to the reader.
You need to turn a binary integer into a string/array of ASCII decimal digits manually. ASCII digits are represented by 1-byte integers in the range `'0'` (0x30) to `'9'` (0x39). http://www.asciitable.com/
For power-of-2 bases like hex, see "How to convert a binary integer number to a hex string?" Converting between binary and a power-of-2 base allows many more optimizations and simplifications because each group of bits maps separately to a hex / octal digit.
Most operating systems / environments don't have a system call that accepts integers and converts them to decimal for you. You have to do that yourself before sending the bytes to the OS, or copying them to video memory yourself, or drawing the corresponding font glyphs in video memory...
By far the most efficient way is to make a single system call that does the whole string at once, because a system call that writes 8 bytes is basically the same cost as writing 1 byte.
This means we need a buffer, but that doesn't add to our complexity much at all. 2^32-1 is only 4294967295, which is only 10 decimal digits. Our buffer doesn't need to be large, so we can just use the stack.
The usual algorithm produces digits LSD-first (Least Significant Digit first). Since printing order is MSD-first, we can just start at the end of the buffer and work backwards. For printing or copying elsewhere, just keep track of where it starts, and don't bother about getting it to the start of a fixed buffer. No need to mess with push/pop to reverse anything, just produce it backwards in the first place.
gcc/clang do an excellent job, using a magic constant multiplier instead of `div` to divide by 10 efficiently (see the Godbolt compiler explorer for asm output). A Code Review Q&A has a nice efficient NASM version of that which accumulates the string into an 8-byte register instead of into memory, ready to store where you want the string to start without extra copying.
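To illustrate the magic-constant idea, here is a sketch in NASM (x86-64 Linux assumed) of unsigned 32-bit division by 10 using a multiply by 0xCCCCCCCD, which is ceil(2^35 / 10), followed by a right shift by 35. The function name `div10_u32`, its register interface, and the test value are illustrative choices, not taken from any of the linked code.

```nasm
; Sketch only: assumes NASM syntax, x86-64 Linux.
global _start

section .text

; div10_u32 (hypothetical name): EDI = unsigned 32-bit value.
; Returns EAX = value / 10, EDX = value % 10. Clobbers RCX.
div10_u32:
    mov     eax, edi            ; writing EAX zero-extends into RAX
    mov     ecx, 0xCCCCCCCD     ; ceil(2^35 / 10), zero-extended into RCX
    imul    rax, rcx            ; 64-bit product cannot overflow: both factors < 2^32
    shr     rax, 35             ; RAX = value / 10
    lea     ecx, [rax + rax*4]  ; quotient * 5
    add     ecx, ecx            ; quotient * 10
    mov     edx, edi
    sub     edx, ecx            ; remainder = value - quotient * 10
    ret

_start:
    mov     edi, 1234           ; test value (illustrative)
    call    div10_u32           ; EAX = 123, EDX = 4
    mov     edi, edx            ; use the remainder as the exit status
    mov     eax, 60             ; sys_exit
    syscall
```

Running the program and then `echo $?` in the shell shows the remainder, 4. In a conversion loop, this routine would take the place of the `xor edx,edx` / `div` pair.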
To handle signed integers:
Use this algorithm on the unsigned absolute value (`val = val<0 ? 0U-val : val;`, i.e. xor-zero / `sub` / `cmovs`, which keeps the original value around; Godbolt). If the original input was negative, stick a `'-'` in front at the end, when you're done. So for example, `-10` runs this with `10`, producing 2 ASCII bytes. Then you store a `'-'` in front, as a third byte of the string.
Here's a simple commented NASM version of that, using `div` (slow but shorter code) for 32-bit unsigned integers and a Linux `write` system call. It should be easy to port this to 32-bit-mode code just by changing the registers to `ecx` instead of `rcx`. But `add rsp,24` will become `add esp, 20` because `push ecx` is only 4 bytes, not 8. (You should also save/restore `esi` for the usual 32-bit calling conventions, unless you're making this into a macro or internal-use-only function.)
The system-call part is specific to 64-bit Linux. Replace that with whatever is appropriate for your system, e.g. call the VDSO page for efficient system calls on 32-bit Linux, or use `int 0x80` directly for inefficient system calls. See calling conventions for 32 and 64-bit system calls on Unix/Linux. Or see rkhb's answer on another question for a 32-bit `int 0x80` version that works the same way.
If you just need the string without printing it, `rsi` points to the first digit after leaving the loop. You can copy it from the tmp buffer to the start of wherever you actually need it. Or if you generated it into the final destination directly (e.g. pass a pointer arg), you can pad with leading zeros until you reach the front of the space you left for it. There's no simple way to find out how many digits it's going to be before you start unless you always pad with zeros up to a fixed width.
Public domain. Feel free to copy/paste this into whatever you're working on. If it breaks, you get to keep both pieces. (If performance matters, see the links below; you'll want a multiplicative inverse instead of `div`.)
And here's code to call it in a loop counting down to 0 (including 0). Putting it in the same file is convenient.
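A sketch consistent with the description above, assuming NASM syntax and x86-64 Linux with the System V calling convention. `print_uint32` converts EDI to decimal with `div`, builds the string backwards on the stack, and prints it plus a newline with a single `write` system call; `_start` then calls it in a loop counting down to 0. The label names, the starting value 100, and the file name in the build comment are illustrative.

```nasm
; Sketch only: assumes NASM syntax, x86-64 Linux, System V calling convention.
; Build, for example (file name is hypothetical):
;   nasm -felf64 print-integer.asm && ld -o print-integer print-integer.o
global _start

section .text

; void print_uint32(uint32_t edi)
; Prints EDI in decimal followed by a newline.
; Clobbers RAX, RCX, RDX, RSI, RDI, R11.
print_uint32:
    mov     eax, edi            ; value to convert
    mov     ecx, 10             ; divisor, and also the ASCII code of '\n'
    push    rcx                 ; the newline becomes the last byte of the string
    mov     rsi, rsp            ; RSI -> newline; digits go just below it
    sub     rsp, 16             ; reserve room for up to 10 digits

.toascii_digit:                 ; do {
    xor     edx, edx            ;   DIV divides EDX:EAX, so zero the high half
    div     ecx                 ;   EAX = quotient, EDX = remainder (0..9)
    add     edx, '0'            ;   remainder -> ASCII digit
    dec     rsi                 ;   store backwards so the string reads MSD-first
    mov     [rsi], dl
    test    eax, eax
    jnz     .toascii_digit      ; } while (quotient != 0)
                                ; RSI now points to the first digit

    mov     eax, 1              ; sys_write
    mov     edi, 1              ; fd 1 = stdout
    lea     edx, [rsp + 16 + 1] ; one past the newline
    sub     edx, esi            ; length = end - start, including the newline
    syscall                     ; write(1, RSI, RDX)

    add     rsp, 24             ; undo the 16-byte reservation and the 8-byte push
    ret

_start:
    mov     ebx, 100            ; starting value (illustrative); RBX survives the call
.repeat:
    mov     edi, ebx
    call    print_uint32
    dec     ebx
    jns     .repeat             ; loop until EBX goes below 0, so 0 is printed too

    mov     eax, 60             ; sys_exit
    xor     edi, edi            ; exit status 0
    syscall
```

Because the loop is a do-while, an input of 0 still produces the single digit 0, and no leading zeros are ever generated.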
Assemble and link it, then use `strace` to see that the only system calls this program makes are `write()` and `exit()`. (See also the gdb / debugging tips at the bottom of the x86 tag wiki, and the other links there.)
Related:
Another version uses `int 0x80` for the `write` system call at the end. Pretty much the same loop.
`printf`: "How to print a number in assembly NASM?" has x86-64 and i386 answers.
A benchmark of `div` vs. compiler-generated code using `mul`.
High-performance versions
Some optimized decimal atoi versions from Daniel Lemire's blog: without AVX-512, and much faster with AVX-512 IFMA
With NEON SIMD on Apple M1
and some older articles: the "How to print integers really fast" blog post compares some strategies in C, such as using `x % 100` to create more ILP (Instruction Level Parallelism), and either a lookup table or a simpler multiplicative inverse (that only has to work for a limited range, like in this answer) to break up the 0..99 remainder into 2 decimal digits, e.g. with `(x * 103) >> 10` using one `imul r,r,imm8` / `shr r,10` as shown in another answer. Possibly somehow folding that in to the remainder calculation itself.
https://tia.mat.br/posts/2014/06/23/integer_to_string_conversion.html is a similar article.
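A small sketch of the `(x * 103) >> 10` step only, assuming NASM syntax on x86-64 Linux. It splits a value in the range 0..99 into its two decimal digits without `div` and prints them; the surrounding divide-by-100 loop is not shown, and the input value 42 is illustrative. The approximation 103/1024 is only exact enough for inputs up to 99.

```nasm
; Sketch only: assumes NASM syntax, x86-64 Linux.
global _start

section .text
_start:
    mov     edi, 42             ; value in 0..99 (illustrative)
    imul    eax, edi, 103       ; imul r32, r32, imm8
    shr     eax, 10             ; EAX = x / 10, valid only for x in 0..99
    lea     edx, [rax + rax*4]  ; (x / 10) * 5
    add     edx, edx            ; (x / 10) * 10
    sub     edi, edx            ; EDI = x % 10
    add     eax, '0'            ; tens digit character
    add     edi, '0'            ; ones digit character

    sub     rsp, 8              ; small scratch buffer on the stack
    mov     [rsp], al           ; tens digit first
    mov     [rsp + 1], dil      ; then the ones digit
    mov     byte [rsp + 2], 10  ; newline

    mov     eax, 1              ; sys_write
    mov     edi, 1              ; fd 1 = stdout
    mov     rsi, rsp            ; buffer
    mov     edx, 3              ; two digits plus newline
    syscall

    mov     eax, 60             ; sys_exit
    xor     edi, edi            ; exit status 0
    syscall
```

With the input 42 the intermediate values are 42 * 103 = 4326 and 4326 >> 10 = 4, so the program prints 42.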
I can't comment, so I'm posting a reply this way.
@Ira Baxter, perfect answer. I just want to add that you don't need to divide 10 times, as in the code you posted that sets register cx to 10. Just divide the number in ax until ax == 0.
You also have to store how many digits there were in the original number.
Anyway, Ira Baxter, you helped me; there are just a few ways to optimize the code :)
This is not only about optimization but also about formatting. When you want to print the number 54, you want to print 54, not 0000000054 :)
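A sketch of that suggestion, assuming NASM syntax on x86-64 Linux and using 32-bit registers (EAX/ECX) rather than AX/CX. The loop divides until the quotient is zero and counts the digits it pushes, so only that many are popped and printed. The `divide_by_ten` and `print_character` helpers are hypothetical, and the value 54 is illustrative.

```nasm
; Sketch only: assumes NASM syntax, x86-64 Linux.
global _start

section .text
_start:
    mov     eax, 54             ; value to print (illustrative)
    xor     ecx, ecx            ; number of digits pushed so far
.produce:
    call    divide_by_ten       ; EDX = quotient, EAX = remainder (0..9)
    add     eax, '0'            ; remainder -> digit character
    push    rax                 ; save the digit
    inc     ecx                 ; remember how many digits there are
    mov     eax, edx            ; continue with the quotient
    test    eax, eax
    jnz     .produce            ; stop as soon as nothing is left

.print:
    pop     rax                 ; most significant digit first
    call    print_character
    dec     ecx
    jnz     .print

    mov     eax, 10             ; newline
    call    print_character
    mov     eax, 60             ; sys_exit
    xor     edi, edi            ; exit status 0
    syscall

; Hypothetical divide_by_ten: EAX in; EDX = quotient, EAX = remainder out.
divide_by_ten:
    push    rcx
    xor     edx, edx            ; DIV divides EDX:EAX, so zero the high half
    mov     ecx, 10
    div     ecx                 ; EAX = quotient, EDX = remainder
    xchg    eax, edx            ; swap to match Ira Baxter's convention
    pop     rcx
    ret

; Hypothetical print_character: character code in EAX, all registers preserved.
print_character:
    push    rax
    push    rcx
    push    rdx
    push    rsi
    push    rdi
    push    r11
    push    rax                 ; the character itself, in memory for write()
    mov     eax, 1              ; sys_write
    mov     edi, 1              ; fd 1 = stdout
    mov     rsi, rsp            ; address of the character
    mov     edx, 1              ; one byte
    syscall
    add     rsp, 8              ; drop the temporary copy
    pop     r11
    pop     rdi
    pop     rsi
    pop     rdx
    pop     rcx
    pop     rax
    ret
```

Because the test comes after the first division, an input of 0 still prints a single 0, and 54 prints as 54 with no padding.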
The digits 1-9 are just 1-9. After that, there must be some conversion that I don't know either. Say you have 41H in AX (EAX) and you want to print 65, not 'A', without doing some service call. I think you need to print a character representation of a 6 and a 5, whatever that might be. There must be a constant number that can be added to get there. You need a modulus operator (however you do that in assembly) and a loop over all the digits.
Not sure, but that's my guess.