Most efficient way to set a register to 1 or (-1) on the original 8086

Posted on 2024-09-01 01:25:22

I am taking an assembly course now, and the guy who checks our home assignments is a very pedantic old-school optimization freak. For example he deducts 10% if he sees:

mov ax, 0

instead of:

xor ax,ax

even if it's only used once.

I am not a complete beginner in assembly programming, but I'm not an optimization expert, so I need your help with something (it might be a very stupid question, but I'll ask anyway):
if I need to set a register to 1 or (-1), is it better to use:

mov ax, 1

or do something like:

xor ax,ax
inc ax

I really need a good grade, so I'm trying to get it as optimized as possible. (I need to optimize for both time and code size.)

Comments (4)

樱花落人离去 2024-09-08 01:25:22

A quick Google search for "8086 instruction timings size" turned up a listing of instruction timings that seems to cover all the timings and sizes for the 8086/8088 through the Pentium.

Note, though, that this probably doesn't include code-fetch memory bottlenecks, which can be very significant, especially on an 8088. That usually makes optimizing for code size the better choice. See here for some details on this.

No doubt you could find official Intel documentation on the web with similar information, such as the "8086/8088 User's Manual: Programmer's and Hardware Reference".

For your specific question, the table below gives a comparison that indicates mov ax, 1 is better (fewer cycles, and the same space):

Instructions    Clock cycles    Bytes
xor ax, ax      3               2
inc ax          3               1
                ---             ---
                6               3
mov ax, 1       4               3
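
The byte counts in that comparison can be cross-checked against the machine encodings. The listing below is only a sketch: it assumes NASM syntax and a DOS .COM target (neither is specified in the question), and the file name in the build comment is made up. It also shows the equivalent pair for -1, using dec instead of inc.

```nasm
; Sketch only: NASM syntax, 16-bit DOS .COM program (assumed toolchain).
; Build (hypothetical file name): nasm -f bin set1.asm -o set1.com
        bits 16
        org 0x100

start:
        ; Variant A: two instructions, 3 bytes in total, modifies the flags
        xor ax, ax          ; 2 bytes (opcode + ModRM), AX = 0
        inc ax              ; 1 byte  (40h),            AX = 1

        ; Variant B: one instruction, 3 bytes, leaves the flags untouched
        mov ax, 1           ; B8 01 00,                 AX = 1

        ; The same two options for -1 (0FFFFh)
        xor ax, ax          ; 2 bytes,       AX = 0
        dec ax              ; 1 byte  (48h), AX = 0FFFFh
        mov ax, -1          ; B8 FF FF,      AX = 0FFFFh

        mov ax, 0x4C00      ; DOS "terminate program" call, exit code 0
        int 0x21
```

Both variants come to 3 bytes, so on size alone there is nothing to choose between them; the difference lies in the cycle counts and in what happens to the flags.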

But you might want to talk to your educational institute about this guy. A 10% penalty for a simple thing like that seems quite harsh. You should ask what should be done in the case where you have two possibilities, one faster and one shorter.

Then, once they've admitted that there are different ways to optimise code depending on what you're trying to achieve, tell them that what you're trying to do is optimise for readability and maintainability, and seriously couldn't give a damn about a wasted cycle or byte here or there(1).

Optimisation is something you generally do if and when you have a performance problem, after a piece of code is in a near-complete state - it's almost always wasted effort when the code is still subject to a not-insignificant likelihood of change.

For what it's worth, sub ax,ax appears to be on par with xor ax,ax in terms of clock cycles and size, so maybe you could throw that into the mix next time to cause him some more work.


(1) No, don't really do that, but it's fun to vent occasionally :-)

多情癖 2024-09-08 01:25:22

You're better off with

mov AX,1

on the 8086. If you're tracking register contents, you can possibly do better if you know that, for example, BX already has a 1 in it:

mov AX,BX

or if you know that AH is 0:

mov AL,1

etc.
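
As a rough illustration of the register-tracking idea, the sketch below (NASM syntax and a DOS .COM target are assumptions) sets up the two "already known" states explicitly so that it can run on its own. In real code those setup instructions would not exist, because the values would be left over from earlier work.

```nasm
; Sketch only: NASM syntax, 16-bit DOS .COM program (assumed toolchain).
        bits 16
        org 0x100

start:
        mov bx, 1           ; stand-in for "BX is already known to hold 1"
        mov ax, bx          ; 2 bytes: register-to-register copy, AX = 1

        xor ax, ax          ; stand-in for "AH is already known to be 0"
        mov al, 1           ; 2 bytes (B0 01): AX = 0001h because AH = 0

        mov ax, 0x4C00      ; DOS "terminate program" call, exit code 0
        int 0x21
```

Each of the two shortcuts is one byte shorter than the 3-byte mov ax, 1, but only as long as the assumption about BX or AH really holds at that point in the program.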

蓝海 2024-09-08 01:25:22

Depending upon your circumstances, you may be able to get away with ...

 sbb ax, ax

The result will either be 0 if the carry flag is not set or -1 if the carry flag is set.

However, if the above example is not applicable to your situation, I would recommend the

xor  ax, ax
inc  ax

method. It should satisfy your professor for size. However, if your processor employs any pipelining, I would expect there to be some coupling-like delay between the two instructions (I could very well be wrong on that). If such a coupling exists, the speed could be improved slightly by reordering your instructions to place another instruction between them (one that does not use ax).

Hope this helps.
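
To make the sbb ax, ax behaviour concrete, here is a small sketch (NASM syntax and a DOS .COM target are assumptions). The clc and stc instructions are there only to force the carry flag into a known state; in practice the carry would come from an earlier instruction, as in the cmp example at the end.

```nasm
; Sketch only: NASM syntax, 16-bit DOS .COM program (assumed toolchain).
        bits 16
        org 0x100

start:
        clc                 ; CF = 0
        sbb ax, ax          ; AX = AX - AX - 0 = 0
        mov bx, ax          ; keep the first result: BX = 0

        stc                 ; CF = 1
        sbb ax, ax          ; AX = AX - AX - 1 = 0FFFFh (-1)

        ; A typical use: turn an unsigned comparison into a 0 / -1 mask
        mov cx, 3
        cmp cx, 5           ; 3 < 5 (unsigned), so CF = 1
        sbb dx, dx          ; DX = 0FFFFh; it would be 0 if CX >= 5

        mov ax, 0x4C00      ; DOS "terminate program" call, exit code 0
        int 0x21
```

The result depends only on the carry flag, not on the previous contents of the register, which is why this works without clearing AX first.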

那一片橙海, 2024-09-08 01:25:22

I would use mov [e]ax, 1 under any circumstances. Its encoding is no longer than the hackier xor sequence, and I'm pretty sure it's faster just about anywhere. The 8086 is just weird enough to be the exception, and as that thing is so slow, a micro-optimization like this would make the most difference there. But anywhere else: executing 2 "easy" instructions will always be slower than executing 1, especially if you consider data hazards and long pipelines. You're trying to read a register in the very next instruction after you modify it, so unless your CPU can bypass the result from stage N of the pipeline (where the xor is executing) to stage N-1 (where the inc is trying to load the register, never mind adding 1 to its value), you're going to have stalls.

Other things to consider: instruction fetch bandwidth (moot for 16-bit code, both are 3 bytes); mov avoids changing flags (more likely to be useful than forcing them all to zero); depending on what values other registers might hold, you could perhaps do lea ax,[bx+1] (also 3 bytes, even in 32-bit code, no effect on flags); as others have said, sbb ax,ax could work too in some circumstances - it's also shorter at 2 bytes.
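
The point about flags is easiest to see when a conditional jump follows the load. The sketch below (NASM syntax and a DOS .COM target are assumptions, and the label name is made up) relies on mov leaving the flags from an earlier cmp intact. It ends with the lea form, which also leaves the flags alone.

```nasm
; Sketch only: NASM syntax, 16-bit DOS .COM program (assumed toolchain).
        bits 16
        org 0x100

start:
        mov cx, 2
        cmp cx, 7           ; CX < 7 (unsigned), so CF = 1
        mov ax, 1           ; AX = 1; mov does not modify any flag
        jb  was_below       ; taken here: CF still reflects the cmp
        dec ax              ; only reached when CX >= 7: AX = 0
was_below:                  ; AX = 1 if CX < 7, otherwise 0
        ; With "xor ax, ax / inc ax" in place of the mov, the xor would
        ; clear CF and the jb above would never be taken.

        xor bx, bx          ; stand-in for "BX is known to hold 0"
        lea ax, [bx+1]      ; 3 bytes: AX = BX + 1 = 1, flags unchanged

        mov ax, 0x4C00      ; DOS "terminate program" call, exit code 0
        int 0x21
```

With the xor/inc pair, the comparison would have to be moved after the load or repeated, which can cost more than the sequence was meant to save.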

When faced with these sorts of micro-optimizations you really should measure the alternatives instead of blindly relying even on processor manuals.

P.S. New homework: is xor bx,bx any faster than xor bx,cx (on any processor)?
