Are instruction set and assembly language the same thing?
I was wondering if instruction set and assembly language are the same thing?
If not, how do they differ and what are their relations?
Thanks and regards!
An assembly language is more than just a superset of the instruction set: it is a way of generating object files, symbols, debug info, and linkage, and it even allows some minimal structured programming at this level. (Somewhat building on other answers/comments here.)
Most C compilers generate assembly, which is then passed to the assembler to create object files. If you look at the output of gcc when run with the '-S' flag, you'll see most of the above being used. If you have debugging turned on ('-g') and any dynamic linkage (the default these days), you'll see a huge amount of assembly that is not instructions at all.
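To make the point about non-instruction lines concrete, here is a minimal Python sketch that sorts assembler source lines into directives, labels, and instructions. The sample text is hand-written in the style of gcc -S output for a trivial main(); it is an illustrative assumption, not real compiler output, and the classification rules are deliberately simplistic.

```python
# Illustrative sample only: written by hand in the style of `gcc -S`
# output for a trivial main(). Real compiler output will differ.
SAMPLE = "\n".join([
    '\t.file\t"hello.c"',
    "\t.text",
    "\t.globl\tmain",
    "\t.type\tmain, @function",
    "main:",
    "\tpushq\t%rbp",
    "\tmovq\t%rsp, %rbp",
    "\tmovl\t$0, %eax",
    "\tpopq\t%rbp",
    "\tret",
    "\t.size\tmain, .-main",
    '\t.section\t.note.GNU-stack,"",@progbits',
])


def classify(line: str) -> str:
    """Return 'blank', 'label', 'directive', or 'instruction' for one line."""
    text = line.strip()
    if not text:
        return "blank"
    if text.endswith(":"):       # checked first so ".L2:" counts as a label
        return "label"
    if text.startswith("."):     # assembler directive, not a CPU instruction
        return "directive"
    return "instruction"


def count_kinds(source: str) -> dict:
    """Count how many lines of each kind appear in the source text."""
    counts = {}
    for line in source.splitlines():
        kind = classify(line)
        counts[kind] = counts.get(kind, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_kinds(SAMPLE))
```

Even in this tiny sample the directives outnumber the instructions, and adding '-g' only widens the gap.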
Everything is built as a layered architecture with "strict (most of the time) and well-defined interfaces".
Start from the hardware
There are many layers below the processor.
By layers I mean the stack physics -> devices (electronics) -> analog (amplifiers) -> gates -> digital circuits -> micro-architecture -> architecture (ISA, processor).
The software-facing part is called the ISA (Instruction Set Architecture)
It contains all the instructions that a given processor can support. This means an ISA is tied to one processor family (hardware such as x86).
The important question is why this ISA is required.
As I said earlier, it is a strict and well-defined interface. The processor cannot run any instruction outside the ISA. [Strict]
But anyone who wants to use the processor can use the instructions in the ISA to get work done. [Well-defined interface]
Now come to assembly, C, assembler, compiler...
A layered architecture is what we use in hardware to implement a processor for you.
You can read more about why layered architectures are used: they make it easier to deal with a big problem step by step.
The same applies here. What do we want? What is our goal?
We want the user to be able to use this processor easily.
Here the user is the programmer.
Now look at the difficulty for the programmer.
Can a programmer remember all the instructions of a processor in binary format? And the processor may change for the next application, say from Intel to IBM (leaving specific versions aside).
The assembler is also a layer, and it has two interfaces.
The same goes for the compiler.
Example: you write some code in C. The processor cannot understand this code. It understands only what is written in binary format and defined by the instructions in the ISA. But it is difficult to write (or maintain, or modify) a program directly in ISA instructions.
1) So the user writes the code in C.
A C compiler understands this code, because the user is restricted to the syntax defined by C. That means the C compiler gives the user a standard, well-defined interface at one end. At the other end, it can either emit ISA instructions directly or use another interface called the "assembler".
2) If you are using an assembler, the compiler translates all the C code into the syntax accepted by the assembler. The syntax that the assembler offers to the compiler is called assembly language. It is also a well-defined interface, and anyone can use it to program in assembly language. At the other end, the assembler converts all of its syntax (mnemonics and directives, which do not exist in the ISA itself) into binary-coded ISA instructions.
Here are some examples of this translation.
In this file, one line reads "Machine: Advanced Micro Devices X86-64". It gives information about the processor, and therefore about which ISA and assembler we are using. The C programmer does not need to be aware of this and is free to code in C. That is the benefit of a "well-defined interface".
To compare, just see
hello.c (C program)
hello.asm2bin (Object File Table: direct mapping between mnemonics and binary instructions)
hello.asm2bin_exe (Binary File Table: more mapping after linking)
You will see a line reading "Disassembly of section .." in these files.
Since the assembler's job is to assemble ISA instructions (bit patterns) from assembly language, these listings show the ISA instruction first and then its disassembly into mnemonics.
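The mnemonic-to-bits mapping seen in such a disassembly can be sketched as a two-way lookup table. The Python below is a minimal illustration that covers only a handful of one-byte x86-64 instructions; the function names are made up for this sketch, and a real assembler also has to deal with operands, prefixes, and variable-length encodings.

```python
# One-byte x86-64 instructions only (AT&T-style spelling for the operands).
ENCODE = {
    "nop": 0x90,
    "push %rbp": 0x55,
    "pop %rbp": 0x5D,
    "leave": 0xC9,
    "ret": 0xC3,
}

# Reverse table: what a disassembler does for these same instructions.
DECODE = {byte: text for text, byte in ENCODE.items()}


def assemble(lines):
    """Turn a list of mnemonic lines into machine-code bytes."""
    return bytes(ENCODE[line.strip()] for line in lines)


def disassemble(code):
    """Turn machine-code bytes back into mnemonic lines."""
    return [DECODE[byte] for byte in code]


if __name__ == "__main__":
    machine_code = assemble(["push %rbp", "pop %rbp", "ret"])
    # Print each byte next to its mnemonic, like a disassembly listing.
    for byte, text in zip(machine_code, disassemble(machine_code)):
        print(f"{byte:02x}    {text}")
```

The bytes are the ISA side of the interface, and the text is the assembly-language side; both columns describe the same instructions.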
All files are at this link [Download and Open]
https://www.dropbox.com/sh/v2moak4ztvs5vb7/AABRTxl7KQlqU2EkkMkKssqYa?dl=0
When you look into the Wikipedia article on Assembly language you linked to in your question, there is an example below showing assembly language instructions and corresponding object code. Both are different representations of the same thing: instructions from a processor's instruction set. But only the column with the title "Instruction (AT&T syntax)" contains assembly language.
Hope this makes it clearer.
I think everyone is giving you the same answer. The instruction set is the set (as in math) of all instructions the processor can execute or understand. Assembly language is a programming language.
Let me try some examples based on some of the questions you are asking. And I am going to be jumping around from processor to processor with whatever code I have handy.
An instruction (or opcode, or binary, or machine language, whatever term you want to use) is the bits/bytes that are loaded into the processor to be decoded and executed.
The assembly language for one such instruction would be "add r12,r11" for this particular processor. In this case that means r11 = r11 + r12. So I put that text, the add r12,r11, in a text file and use an assembler (a program that compiles/assembles assembly language) to assemble it into some form of binary. As with any programming language, sometimes you create object files and then link them together, and sometimes you can go straight to a binary. There are many forms of binaries, some ASCII and some binary, which is a whole other discussion.
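As a rough picture of what "assemble it into some form of binary" means for that one line, here is a minimal Python sketch. The 16-bit field layout and the opcode numbers are assumptions made for illustration; the answer does not name its instruction set or give its encoding rules.

```python
import re

# Assumed two-operand layout (illustrative only):
#   bits 15-12 opcode | bits 11-8 source register |
#   bits 7-4 addressing-mode bits (0 for register mode) | bits 3-0 destination
OPCODES = {"mov": 0x4, "add": 0x5}  # assumed opcode numbers


def encode(line: str) -> int:
    """Encode 'op rS,rD' (register-to-register only) as a 16-bit word."""
    match = re.fullmatch(r"\s*(\w+)\s+r(\d+)\s*,\s*r(\d+)\s*", line)
    if match is None:
        raise ValueError(f"unsupported syntax: {line!r}")
    op = match.group(1)
    src, dst = int(match.group(2)), int(match.group(3))
    if op not in OPCODES:
        raise ValueError(f"unknown mnemonic: {op!r}")
    if src > 15 or dst > 15:
        raise ValueError("register number out of range")
    return (OPCODES[op] << 12) | (src << 8) | dst


if __name__ == "__main__":
    print(f"add r12,r11 -> 0x{encode('add r12,r11'):04X}")
```

The text on the left is assembly language; the number on the right is what the processor actually decodes.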
Now what can you do in assembler that is not part of the instruction set? How do they differ? Well for starters you can have macros:
Macros are like inline functions, they are not functions that are called but generate code in line. No different than a C macro for example. So you might use them to save some typing or you might use them to abstract something that you want to do over and over again and want the ability to change in one place and not have to touch every instance. The above example essentially generates this:
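As a rough illustration of that expansion step, here is a minimal Python sketch of textual macro expansion. The macro name add3, its parameter names, and its body are hypothetical; real assemblers each have their own macro syntax.

```python
def define_macro(params, body):
    """Build an expander that replaces backslash-prefixed parameter names."""
    def expand(*args):
        if len(args) != len(params):
            raise ValueError("wrong number of macro arguments")
        lines = []
        for template in body:
            line = template
            # Plain text substitution: no call, no return, just inline code.
            for name, value in zip(params, args):
                line = line.replace("\\" + name, value)
            lines.append(line)
        return lines
    return expand


# Hypothetical macro: add two registers into a third one.
add3 = define_macro(
    ["x", "y", "dst"],
    ["add \\x,\\dst", "add \\y,\\dst"],
)

if __name__ == "__main__":
    for line in add3("r10", "r11", "r12"):
        print(line)
```

Each use of the macro pastes a fresh copy of the body into the program, which is why changing the definition in one place updates every instance.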
Another difference between the instruction set and assembly language is pseudo instructions. For this particular instruction set, for example, there is no pop instruction for popping things off the stack, at least not by that name, and I will explain why. But you are allowed to save some typing and use a pop in your code:
The reason why there is no pop is because the addressing modes are flexible enough to have a read from the address in the source register put the value in the destination register and increment the source register by a word. Which in assembler for this instruction set is
both the pop and the mov result in the opcode 0x413C.
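Here is a minimal Python sketch of how an assembler might handle such a pseudo instruction: rewrite pop into the equivalent mov, then encode the mov. The "@rN+" spelling, the choice of r1 as stack pointer, and the field layout are assumptions, picked so that the sketch reproduces the 0x413C quoted above.

```python
import re

MOV_OPCODE = 0x4          # assumed opcode number for mov
AUTOINCREMENT_MODE = 0x3  # assumed mode bits for "@rN+" (read, then increment)
STACK_POINTER = 1         # assumed: r1 is the stack pointer


def expand_pseudo(line: str) -> str:
    """Rewrite the pseudo instruction 'pop rD' as a real mov; pass others through."""
    match = re.fullmatch(r"\s*pop\s+r(\d+)\s*", line)
    if match:
        return f"mov @r{STACK_POINTER}+,r{match.group(1)}"
    return line.strip()


def encode_mov_autoinc(line: str) -> int:
    """Encode 'mov @rS+,rD' as opcode | source | mode bits | destination."""
    match = re.fullmatch(r"mov\s+@r(\d+)\+\s*,\s*r(\d+)", line)
    if match is None:
        raise ValueError(f"unsupported syntax: {line!r}")
    src, dst = int(match.group(1)), int(match.group(2))
    return (MOV_OPCODE << 12) | (src << 8) | (AUTOINCREMENT_MODE << 4) | dst


def assemble(line: str) -> int:
    """Expand pseudo instructions first, then encode the real instruction."""
    return encode_mov_autoinc(expand_pseudo(line))


if __name__ == "__main__":
    for source in ("pop r12", "mov @r1+,r12"):
        print(f"{source:14} -> 0x{assemble(source):04X}")
```

The processor never sees a pop; only the assembler knows that name.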
Another example of differences between the instruction set and assembler, switching instruction sets, is something like this:
Which to this assembly language means load the address of bob into register 0, there is no instruction for that, what the assembler does with it is generate something that would look like this if you were to write it in assembler by hand:
Essentially, in a place reachable from that instruction but not in the execution path, a word is created which the linker will fill in with the address of bob. The ldr itself will likewise get encoded by the assembler or linker as a pc-relative load.
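A minimal Python sketch of that rewrite follows. The "ldr rN,=symbol" spelling, the generated label name lit_<symbol>, and the "bx lr" return in the usage example are assumptions for illustration; the sketch works on source text and does not compute real offsets.

```python
import re


def expand_load_address(lines):
    """Rewrite 'ldr rN,=symbol' into a load from a literal word plus that word."""
    code, pool = [], []
    for line in lines:
        match = re.fullmatch(r"\s*ldr\s+(r\d+)\s*,\s*=(\w+)\s*", line)
        if match:
            reg, symbol = match.groups()
            label = f"lit_{symbol}"          # hypothetical label name
            code.append(f"ldr {reg},{label}")
            entry = f"{label}: .word {symbol}"
            if entry not in pool:            # one literal word per symbol
                pool.append(entry)
        else:
            code.append(line.strip())
    # The literal words go after the code here. A real assembler must place
    # them within reach of the load and outside the execution path; the
    # linker later fills each word with the symbol's address.
    return code + pool


if __name__ == "__main__":
    for line in expand_load_address(["ldr r0,=bob", "bx lr"]):
        print(line)
```

One line of assembly language has become one instruction plus one data word, neither of which the programmer typed.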
That leads to a whole category of differences between the instruction set and the assembly language
Machine code has no way of knowing what fun is or where to find it. For this instruction set, with its many addressing modes (note I am specifically and intentionally avoiding naming the instruction sets I am using, as that is not relevant to the discussion), the assembler or the linker, as the case may be, picks the encoding, depending on where the fun function ends up relative to this instruction.
The assembler may choose to encode that instruction as pc relative. If the fun function is 40 bytes ahead of the call instruction, it may encode it with the equivalent of call pc+36 (take four off because the pc is one instruction ahead at execution time and this is a 4 byte instruction).
Or the assembler may not know where or what fun is and leave it up to the linker, in which case the linker may put in the absolute address of the function, something similar to call #0xD00D.
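The two outcomes can be sketched in a few lines of Python. The function name, the tuple return values, and the symbol-table dictionary are hypothetical; only the arithmetic (target minus call address, minus one 4-byte instruction) comes from the description above.

```python
INSTRUCTION_SIZE = 4  # bytes; the pc is one instruction ahead at execution time


def encode_call(call_address, symbol, symbol_table):
    """Decide how a 'call symbol' at call_address gets resolved.

    Returns ("pc_relative", offset) when the assembler already knows the
    target, or ("needs_linker", symbol) when it must leave a relocation
    for the linker to fill in later.
    """
    if symbol in symbol_table:
        # Offset is measured from the pc value at execution time.
        offset = symbol_table[symbol] - (call_address + INSTRUCTION_SIZE)
        return ("pc_relative", offset)
    return ("needs_linker", symbol)


if __name__ == "__main__":
    # fun is 40 bytes ahead of the call, so the encoded offset is 36.
    print(encode_call(0x100, "fun", {"fun": 0x100 + 40}))
    # fun is unknown at assembly time, so the linker has to resolve it.
    print(encode_call(0x100, "fun", {}))
```

In both cases the source line is the same call fun; the choice of encoding is made by the tools, not by the instruction set.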
Same goes for loads and stores, some instruction sets have near and far pc relative, some have absolute address, etc. And you may not care to choose, you may just say
and the assembler or linker or a combination of the two takes care of the rest.
Note that for some instruction sets the assembler and linker may happen at once in one program. These days we are used to the model of compiling to objects and then linking objects, but not all assemblers follow that model.
Some more cases where the assembly language can take some shortcuts:
The hang: b hang makes sense: branch to the label called hang, essentially a branch to self. As the name implies, this is an infinite loop. But in this assembly language b . also means branch to self, an infinite loop, and I didn't have to invent a label, type it, and branch to it. Another shortcut is using numbers: b 1b means branch to 1 back, so the assembler looks for the label number 1 behind or above the instruction. The b 1f, which is not a branch to self, means branch to 1 forward, and this is perfectly valid code for this assembler: it will look forward, below the line of code, for a label number 1. You can re-use the number 1 like crazy in your assembly language program for this assembler, which saves having to invent label names for simple short branches. The second b 1b branches to the second 1: and is a branch to self.
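The lookup rule for these numbered labels can be sketched in Python. The program below is a made-up example rather than the answer's own listing, and the resolver works on line indexes instead of real addresses.

```python
import re


def resolve_local_branches(lines):
    """Map the index of each 'b Nb', 'b Nf', or 'b .' line to its target index."""
    targets = {}
    for index, line in enumerate(lines):
        text = line.strip()
        if text == "b .":                      # branch to self
            targets[index] = index
            continue
        match = re.fullmatch(r"b\s+(\d+)([bf])", text)
        if match is None:
            continue                           # not a local-label branch
        label = match.group(1) + ":"
        if match.group(2) == "b":              # search backward (above)
            candidates = range(index - 1, -1, -1)
        else:                                  # search forward (below)
            candidates = range(index + 1, len(lines))
        for other in candidates:
            if lines[other].strip() == label:
                targets[index] = other         # nearest match wins
                break
        else:
            raise ValueError(f"no label {label} for line {index}")
    return targets


# Hypothetical program; a label marks the address of the next instruction,
# so a 'b 1b' placed right after '1:' is effectively a branch to self.
PROGRAM = ["1:", "b 1b", "b 1f", "1:", "b 1b", "b ."]

if __name__ == "__main__":
    print(resolve_local_branches(PROGRAM))
```

Because the nearest label in the requested direction wins, the same number can be reused as often as you like.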
It is important to understand that the company that created the processor defines the instruction set, and the machine code or opcodes or whatever term they or you use for the bits and bytes the processor decodes and executes. Very often that company will produce a document with an assembly language for those instructions, a syntax. Often that company will also produce an assembler program to compile/assemble that assembly language, using that syntax. But that doesn't mean that any other person on the planet who chooses to write an assembler for that instruction set has to use that syntax. This is very evident with the x86 instruction set.
Likewise, any pseudo instructions like the pop above, macro syntax, or other shortcuts like the b 1b would have to be honored from one assembler to another to be portable, and very often they are not. You see this with ARM: for example, the universal comment symbol ; does not work with the gnu assembler, where you have to use @ instead. ARM's assembler does use ; (note that I write my ARM assembly with ;@ to make it portable). It gets even worse with the gnu tools: for example, you can put C language things like #define and /* comment */ in your assembly source, use the C compiler instead of the assembler, and it will work. I prefer to stay as pure as I can for maximum portability, but naturally you may choose to use whatever features the tool offers.
The instruction set is composed of all the instructions a processor can execute, while assembly is the programming language that uses these instructions to make programs.
In other words, the instruction set is just a group of bytes a CPU can understand, but on their own you can't do anything useful with them (think of the instructions as the letters of the alphabet), while assembly is a language that lets you combine these instructions (or letters) to make a program (something like a speech).
An assembly language will include mnemonics for the instructions but normally adds quite a bit more, such as:
Edit: An instruction (per se) will be encoded in binary for the CPU to read it. The mnemonic is a name for the instruction. For example, in assembly language I might write "mov ax, 1". The corresponding instruction (in the case of an x86) is the opcode B8 followed by the immediate value 1; with a 16-bit operand size that is the bytes B8 01 00 (in hexadecimal), since x86 stores immediates in little-endian order.
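As a small sketch of that mnemonic-to-bytes step, the Python below encodes "mov reg, imm16" for a few 16-bit x86 registers. It assumes a 16-bit operand size; the function name is made up for this example, and the 32-bit form of the instruction carries a four-byte immediate instead.

```python
import struct

# x86 register numbers for a few 16-bit registers.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3}


def encode_mov_r16_imm16(reg: str, value: int) -> bytes:
    """Encode 'mov reg, imm16': opcode B8+reg, then a little-endian 16-bit word."""
    opcode = 0xB8 + REG16[reg]
    return bytes([opcode]) + struct.pack("<H", value & 0xFFFF)


if __name__ == "__main__":
    encoded = encode_mov_r16_imm16("ax", 1)
    print(encoded.hex(" "))  # the bytes as they sit in memory
```

The mnemonic and operands are for the programmer; the three bytes are all the CPU ever sees.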
Defining data, macros, names for functions, etc., are not actual instructions. A macro (much like a macro in C, etc.) allows you to define names during the assembly process. It might (often will) result in generating some instructions, but those are separate from the macro definition itself. Much like in C, when you define some data that will typically result in a record in the object file specifying some amount of space for name X, but doesn't directly generate any instructions.
A computer (more precisely, a processor) can only do computation, i.e. perform arithmetic and logical operations.
A single arithmetic or logical operation is called an instruction.
The collection of all instructions is called the instruction set of that computer (more precisely, of that processor).
The instruction set is either hard-wired in the processor or implemented using a technique called microcode.
A computer can only be programmed if it has a language, i.e. something it understands. Binary code by itself is not the language of the computer; the instruction set, encoded in binary, is.
A language is nothing but a specification on paper. The first language ever designed on paper was machine language. Its implementation in a computer is only possible through hardware (or microcode). That implementation is called the instruction set. All other languages are designed on top of machine language.
Machine language is difficult to work with, since in daily life we mostly work with letters and words. Therefore a mnemonic language called assembly language was introduced on top of machine language. The implementation of assembly language is called an assembler.
[You may wonder how the first assembler was written. It may or may not have been written in machine language. I am leaving out the concept of bootstrapping here for the sake of simplicity.]
SUMMARY: