What's wrong with this assembly accessing a string constant?

Posted 2024-11-05 16:28:33

I thought I was starting to understand what's going on, but I've spent ages trying to understand why the following doesn't work:

org 0x7C00

mov ax,0x0000
mov ds,ax

mov si, HelloWorld

HelloWorld db 'Hello World',13,10,0

What I'm expecting is that the mov si, HelloWorld instruction will place the value 0x7C08 in si (which is 0x7c00 + the offset of HelloWorld), ready for things like lodsb.

When I build this (using NASM) and run it (using Bochs), I find that the final instruction actually looks like this:

mov si, 0x8400

Why is this, and where has the value 0x8400 come from?

Update: I've discovered that placing HelloWorld in the data segment produces the expected output:

section .data
HelloWorld db 'Hello World',13,10,0

Why is this?

FYI the command used to build this is nasm -f bin input.asm -o output.bin

Update 2: I've realised that 0x8400 is 0x7c00 + 0x0800, where 8 is the offset of HelloWorld from the beginning of the output. I noticed this when I saw that with org 0 the address used is 0x0800.
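
The arithmetic in Update 2 can be checked with a few lines of Python. This is only a sketch of one possible reading: the constants come from the question, while `swap16` is a hypothetical helper and the idea that the offset was byte-swapped before being added is an assumption, not something the thread confirms.

```python
# Check the arithmetic from "Update 2" (all values taken from the question).
ORG = 0x7C00           # load address given to the org directive
LABEL_OFFSET = 0x0008  # offset of HelloWorld from the start of the output
OBSERVED = 0x8400      # operand actually emitted by the faulty build


def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value (hypothetical helper)."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


expected = ORG + LABEL_OFFSET
print(hex(expected))                    # 0x7c08
print(hex(OBSERVED - ORG))              # 0x800
# Assumption: the label offset was byte-swapped before org was added.
print(hex(ORG + swap16(LABEL_OFFSET)))  # 0x8400
```

The same reading also matches the org 0 observation, since swapping the bytes of 0x0008 gives 0x0800.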

I still don't understand what's going on, though. Spotting this has just made me more confused!

As requested, disassembly using ndisasm:

00000000  B80000            mov ax,0x0
00000003  8ED8              mov ds,ax
00000005  BE0084            mov si,0x8400
00000008  48                dec ax
00000009  656C              gs insb
0000000B  6C                insb
0000000C  6F                outsw
0000000D  20576F            and [bx+0x6f],dl
00000010  726C              jc 0x7e
00000012  640D0A00          fs or ax,0xa

Comments (4)

捶死心动 2024-11-12 16:28:33

Upgrade your copy of nasm.

Using nasm 2.09rc1 I get the following (unexpected) disassembly:

00000000  B80000            mov ax,0x0
00000003  8ED8              mov ds,ax
00000005  BE0084            mov si,0x8400
00000008  48                dec ax
00000009  656C              gs insb
0000000B  6C                insb
0000000C  6F                outsw
0000000D  20576F            and [bx+0x6f],dl
00000010  726C              jc 0x7e
00000012  640D0A00          fs or ax,0xa

Using nasm 2.09.08 I get the following (expected) disassembly:

00000000  B80000            mov ax,0x0
00000003  8ED8              mov ds,ax
00000005  BE087C            mov si,0x7c08
00000008  48                dec ax
00000009  656C              gs insb
0000000B  6C                insb
0000000C  6F                outsw
0000000D  20576F            and [bx+0x6f],dl
00000010  726C              jc 0x7e
00000012  640D0A00          fs or ax,0xa

I guess it was a release candidate for a reason... :)
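
To compare the two listings by hand, it helps to decode the immediate operand from the raw instruction bytes. The sketch below assumes only what the listings show (opcode 0xBE followed by a 16-bit little-endian immediate); `mov_si_operand` is a made-up helper name.

```python
import struct


def mov_si_operand(encoding_hex: str) -> int:
    """Return the 16-bit immediate of a `mov si, imm16` encoding (opcode 0xBE)."""
    raw = bytes.fromhex(encoding_hex)
    if len(raw) != 3 or raw[0] != 0xBE:
        raise ValueError("expected a 3-byte BE xx xx encoding")
    (imm,) = struct.unpack("<H", raw[1:])  # x86 immediates are little-endian
    return imm


print(hex(mov_si_operand("BE0084")))  # 0x8400 (nasm 2.09rc1 output)
print(hex(mov_si_operand("BE087C")))  # 0x7c08 (nasm 2.09.08 output)
```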

绝影如岚 2024-11-12 16:28:33

Unless you use bin format, nasm is allowed to move your data into a segment .data. This makes a lot of sense when compiling to a PE format such as .EXE.

In other words, are you certain that 0x8400 is not the proper address once the output binary has been laid out and linked? I understand you are trying to emit data in the segment .text -- to do that, I think you need the bin directive.

Edit:

Given that you are using the bin format, and considering your additional information that building the HelloWorld string in segment .data does work, I suspect what you need to do is:

lea si, [cs:HelloWorld]

I may be off on the syntax -- it's been years since I coded in 16-bit x86 -- but the point is that you're getting an offset based on an assumption about the value of ds, which you are explicitly clearing and which the assembler might assume has the value of segment .code or similar. (Thanks to Aaron for correcting my mov to an lea.)
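
The point about ds can be made concrete with the real-mode address calculation. A minimal sketch, assuming 8086-style addressing (segment times 16 plus offset, wrapped to 20 bits); `linear_address` is a hypothetical helper name, not part of any tool mentioned here.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# With ds = 0 and org 0x7C00, the expected offset lands on the string.
print(hex(linear_address(0x0000, 0x7C08)))  # 0x7c08
# The same byte reached through a different segment:offset pair.
print(hex(linear_address(0x07C0, 0x0008)))  # 0x7c08
# The operand that was actually emitted points somewhere else entirely.
print(hex(linear_address(0x0000, 0x8400)))  # 0x8400
```

With ds cleared to zero, as in the question, the offset in si is the linear address, so the value 0x8400 cannot be explained by the segment setup alone.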

From the NASM manual:

The first object file containing code should start its code segment with a line like RESB 100h. This is to ensure that the code begins at offset 100h relative to the beginning of the code segment, so that the linker or converter program does not have to adjust address references within the file when generating the .COM file. Other assemblers use an ORG directive for this purpose, but ORG in NASM is a format-specific directive to the bin output format, and does not mean the same thing as it does in MASM-compatible assemblers.

So you have a code segment (CS) and a data segment (DS), and they are not equal; label addresses therefore differ depending on the section.
Under x86, section alignment is usually 4096 bytes, which matches the size of a memory page.

季末如歌 2024-11-12 16:28:33

Hmm... 'H' is 0x48. Maybe you're pulling the first byte of 'Hello World' instead of the address of it.
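
One way to test this idea is to compare the bytes in the question's ndisasm listing with the string itself. The sketch below copies the bytes shown at offsets 0x08 to 0x15 of that listing; it suggests the 0x48 there is the string data being decoded as instructions, not the operand of mov si, whose bytes sit at offsets 0x06 and 0x07.

```python
# Bytes at offsets 0x08..0x15 of the ndisasm listing in the question.
dump = bytes.fromhex("48" "656C" "6C" "6F" "20576F" "726C" "640D0A00")
message = b"Hello World\r\n\x00"  # 'Hello World',13,10,0

print(dump == message)  # True
print(hex(message[0]))  # 0x48, which ndisasm shows as `dec ax`
```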
