I've been working in C and CPython for the past 3 - 5 years. Consider that my base of knowledge here.
If I were to use an assembly instruction such as MOV AL, 61h on a processor that supported it, what exactly is inside the processor that interprets this code and dispatches it as voltage signals? How would such a simple instruction likely be carried out?
Assembly even feels like a high-level language when I try to think of the multitude of steps contained in MOV AL, 61h or even XOR EAX, EBX.
EDIT: I read a few comments asking why I put this as embedded when the x86-family is not common in embedded systems. Welcome to my own ignorance. Now I figure that if I'm ignorant about this, there are likely others ignorant of it as well.
It was difficult for me to pick a favorite answer considering the effort you all put into your answers, but I felt compelled to make a decision. No hurt feelings, fellas.
I often find that the more I learn about computers the less I realize I actually know. Thank you for opening my mind to microcode and transistor logic!
EDIT #2: Thanks to this thread, I have just comprehended why XOR EAX, EAX is faster than MOV EAX, 0h. :)
I recently started reading Charles Petzold's book titled Code, which so far covers exactly the kinds of things I assume you are curious about. But I have not gotten all the way through, so thumb through the book first before buying/borrowing.
This is my relatively short answer, not Petzold's... and hopefully in line with what you were curious about.
You have heard of the transistor, I assume. The original way to use a transistor was for things like a transistor radio. It is basically an amplifier: take the tiny radio signal floating in the air and feed it into the input of the transistor, which opens or closes the flow of current on a circuit next to it. You wire that circuit with higher power, so you can take a very small signal, amplify it, and feed it into a speaker, for example, and listen to the radio station (there is more to it, isolating the frequency and keeping the transistor balanced, but you get the idea, I hope).
The existence of the transistor led to a way of using the transistor as a switch, like a light switch. The radio is like a dimmer light switch: you can turn it anywhere from all the way on to all the way off. A non-dimmer light switch is either all on or all off; there is some magic place in the middle of the switch where it changes over. We use transistors the same way in digital electronics: take the output of one transistor and feed it into another transistor's input. The output of the first is certainly not a small signal like the radio wave; it forces the second transistor all the way on or all the way off. That leads to the concept of TTL, or transistor-transistor logic. Basically you have one transistor that drives a high voltage, let's call it a 1, and one that sinks to zero voltage, let's call that a 0. And you arrange the inputs with other electronics so that you can create AND gates (if both inputs are a 1, the output is a 1), OR gates (if either one input or the other is a 1, the output is a 1), inverters, NAND gates, NOR gates (an OR with an inverter), etc. There used to be a TTL handbook, and you could buy 8-or-so-pin chips that had one or two or four of some kind of gate (NAND, NOR, AND, etc.) inside, with two inputs and an output each. Now we don't need those; it is cheaper to create programmable logic or dedicated chips with many millions of transistors. But we still think in terms of AND, OR, and NOT gates for hardware design (usually more like NAND and NOR).
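That "usually more like NAND" remark can be made concrete. Here is a quick sketch, in Python rather than hardware, of how AND, OR, NOT, and even XOR all fall out of a single NAND function:

```python
# Toy model: build every basic gate out of NAND alone.

def NAND(a, b):
    # One TTL pair: output is 0 only when both inputs are 1.
    return 0 if (a and b) else 1

def NOT(a):
    return NAND(a, a)            # tie both inputs together

def AND(a, b):
    return NOT(NAND(a, b))       # NAND followed by an inverter

def OR(a, b):
    return NAND(NOT(a), NOT(b))  # De Morgan: a OR b == NOT(NOT a AND NOT b)

def XOR(a, b):
    return AND(OR(a, b), NAND(a, b))  # one of several equivalent gate forms
```

A synthesis tool does essentially this translation in reverse when it maps your AND/OR/NOT description onto the NAND/NOR cells the chip actually has.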
I don't know what they teach now, but the concept is the same: for memory, a flip-flop can be thought of as two of these TTL pairs (NANDs) tied together, with the output of one going to the input of the other. Let's leave it at that. That is basically a single bit in what we call SRAM, or static RAM. SRAM takes basically four transistors per bit. DRAM, or dynamic RAM, the memory sticks you put in your computer yourself, takes one transistor per bit, so for starters you can see why DRAM is the thing you buy gigabytes' worth of. SRAM bits remember what you set them to as long as the power doesn't go out. DRAM starts to forget what you told it as soon as you tell it; basically DRAM uses the transistor in yet a third way. There is some capacitance (as in capacitor, won't get into that here) that is like a tiny rechargeable battery: as soon as you charge it and unplug the charger, it starts to drain. Think of a row of glasses on a shelf with little holes in each glass. These are your DRAM bits; you want some of them to be ones, so you have an assistant fill up the glasses you want to be a one. That assistant has to constantly refill the pitcher and go down the row, keeping the "one" bit glasses full enough with water and letting the "zero" bit glasses remain empty. Then at any time you want to see what your data is, you can look over and read the ones and zeros: water levels definitely above the middle are a one, levels definitely below the middle are a zero. So even with the power on, if the assistant is not able to keep the glasses full enough to tell a one from a zero, they will eventually all look like zeros and drain out. It's the trade-off for more bits per chip. The short story here is that outside the processor we use DRAM for our bulk memory, and there is assistant logic that takes care of keeping the ones a one and the zeros a zero. But inside the chip, the AX register and DS register, for example, keep your data using flip-flops, i.e. SRAM.
And for every bit you know about like the bits in the AX register, there are likely hundreds or thousands or more that are used to get the bits into and out of that AX register.
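The cross-coupled NAND pair described above can be sketched the same way. This is a toy model of an SR latch's steady-state behavior only (no real timing): the two inputs are active-low, and the feedback loop is simply iterated until it settles:

```python
# Toy SR latch: two NAND gates, the output of each feeding the other.
# set_n and reset_n are active-low; (q, qn) is the stored state.

def NAND(a, b):
    return 0 if (a and b) else 1

def sr_latch(set_n, reset_n, q=0, qn=1):
    # Iterate until the feedback settles (hardware does this in
    # propagation-delay time, not in a Python loop).
    for _ in range(4):
        q_new = NAND(set_n, qn)
        qn_new = NAND(reset_n, q_new)
        if (q_new, qn_new) == (q, qn):
            break
        q, qn = q_new, qn_new
    return q, qn

q, qn = sr_latch(0, 1)         # pulse set low: the latch stores a 1
q, qn = sr_latch(1, 1, q, qn)  # both inputs idle high: the 1 is remembered
```

That "remembered with both inputs idle" behavior is the whole trick: the bit persists as long as power is applied, which is exactly the SRAM property described above.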
You know that processors run at some clock speed, these days around 2 gigahertz, or two billion clocks per second. Think of the clock, which is generated by a crystal (another topic); the logic sees that clock as a voltage that goes high and zero, high and zero, at this clock rate of 2 GHz or whatever (the Gameboy Advance is 17 MHz, old iPods around 75 MHz, the original IBM PC 4.77 MHz).
So transistors used as switches allow us to take voltage and turn it into the ones and zeros we are familiar with both as hardware engineers and software engineers, and go so far as to give us AND, OR, and NOT logic functions. And we have these magic crystals that allow us to get an accurate oscillation of voltage.
So we can now do things like: if the clock is a one, and my state variable says I am in the fetch-instruction state, then I need to switch some gates so that the address of the instruction I want, which is in the program counter, goes out on the memory bus, so that the memory logic can give me my instruction, MOV AL, 61h. You can look this up in an x86 manual and find that some of those opcode bits say this is a mov operation, the target is the lower 8 bits of the EAX register, and the source of the mov is an immediate value, which means it is in the memory location after this instruction. So we need to save that instruction/opcode somewhere and fetch the next memory location on the next clock cycle. Now we have saved the mov al, immediate, we have the value 61h read from memory, and we can switch some transistor logic so that bit 0 of that 61h is stored in the bit-0 flip-flop of AL, bit 1 in bit 1, etc.
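The fetch/fetch-immediate/execute sequence just described can be sketched as straight-line Python. The opcode byte 0xB0 happens to be the real x86 encoding for MOV AL, imm8, but everything else here (a single register, a list standing in for memory) is deliberately toy:

```python
# Toy walk-through of MOV AL, 61h: two memory fetches, then a register write.

memory = [0xB0, 0x61]    # MOV AL, 61h: opcode byte, then the immediate byte
regs = {"AL": 0}
pc = 0                   # program counter

opcode = memory[pc]      # clock 1: fetch the opcode and latch it somewhere
pc += 1
if opcode == 0xB0:       # decode: "mov al, immediate" needs one more byte
    imm = memory[pc]     # clock 2: fetch the immediate from the next location
    pc += 1
    regs["AL"] = imm     # execute: gate each bit of 61h into AL's flip-flops
```

Each line of this sketch is a whole clock phase of gate switching in the real chip, which is the point of the question: even one instruction hides several steps.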
How does all that happen, you ask? Think about a Python function performing some math formula. You start at the top of the program with some inputs to the formula that come in as variables; you have individual steps through the program that might add a constant here or call the square-root function from a library, etc. And at the bottom you return the answer. Hardware logic is done the same way, and today programming languages are used, one of which looks a lot like C. The main difference is that your hardware functions might have hundreds or thousands of inputs and the output is a single bit. On every clock cycle, bit 0 of the AL register is being computed with a huge algorithm, depending on how far out you want to look. Think about that square-root function you called for your math operation: that function itself is one of these some-inputs-produce-an-output things, and it may call other functions, maybe a multiply or divide. So you likely have a bit somewhere that you can think of as the last step before bit 0 of the AL register, and its function is: if clock is one then AL[0] = AL_next[0]; else AL[0] = AL[0]. But there is a higher function that contains that next AL bit computed from other inputs, and a higher function, and a higher function, and much of this is created by the compiler, in the same way that your three lines of Python can turn into hundreds or thousands of lines of assembler. A few lines of HDL can become hundreds, thousands, or more transistors. Hardware folks don't normally look at the lowest-level formula for a particular bit to find out all the possible inputs and all the possible ANDs, ORs, and NOTs it takes to compute it, any more than you inspect the assembler generated by your programs. But you could if you wanted to.
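That last-step function for AL[0] is small enough to write out literally; all the interesting complexity hides inside however al_next0 got computed from hundreds of other inputs:

```python
# The innermost function from the paragraph above:
# on a high clock, the AL[0] flip-flop captures al_next0; otherwise it holds.

def al0_step(clock, al0, al_next0):
    return al_next0 if clock == 1 else al0

bit = 0
bit = al0_step(0, bit, 1)   # clock low: the flip-flop holds its old value
bit = al0_step(1, bit, 1)   # clock high: it captures the new value
```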
A note on microcoding: most processors do not use microcoding. You get into it with the x86, for example, because it was a fine instruction set for its day but on the surface struggles to keep up with modern times. Other instruction sets do not need microcoding and use logic directly in the way I described above. You can think of microcoding as a different processor, using a different instruction set/assembly language, that is emulating the instruction set you see on the surface. Not as complicated as when you try to emulate Windows on a Mac or Linux on Windows, etc. The microcoding layer is designed specifically for the job. You may think of there being only the four registers AX, BX, CX, DX, but there are many more inside. And naturally that one assembly program can somehow get executed on multiple execution paths in one core or on multiple cores. Just like the processor in your alarm clock or washing machine, the microcode program is simple and small, debugged and burned into the hardware, hopefully never needing a firmware update. At least ideally. But like your iPod or phone, for example, you sometimes do want a bug fix or whatever, and there is a way to upgrade your processor (the BIOS or other software loads a patch on boot). Say you open the battery compartment of your TV remote control or calculator; you might see a hole where you can see some bare metal contacts in a row, maybe three or five or many. For some remotes and calculators, if you really wanted to, you could reprogram it, update the firmware. Normally not though; ideally that remote is perfect, or perfect enough to outlive the TV set. Microcoding provides the ability to get a very complicated product (millions, hundreds of millions of transistors) on the market and fix the big and fixable bugs in the field down the road. Imagine a 200-million-line Python program your team wrote in, say, 18 months, and having to deliver it or the company will lose to the competition's product. Same kind of thing, except only a small portion of that code can be updated in the field; the rest has to remain carved in stone. For the alarm clock or toaster, if there is a bug or the thing needs help, you just throw it out and get another.
If you dig through Wikipedia or just google stuff, you can look at the instruction sets and machine language for things like the 6502, Z80, 8080, and other processors. There may be 8 registers and 250 instructions, and you can get a feel from the number of transistors that those 250 assembly instructions are still a very high-level language compared to the sequence of logic gates it takes to compute each bit in a flip-flop each clock cycle. You are correct in that assumption. Except for the microcoded processors, this low-level logic is not re-programmable in any way; you have to fix the hardware bugs with software (for hardware that is delivered, or going to be delivered rather than scrapped).
Look up that Petzold book, he does an excellent job of explaining stuff, far superior to anything I could ever write.
Edit: Here is an example of a CPU (the 6502) that has been simulated using python/javascript AT THE TRANSISTOR LEVEL: http://visual6502.org You can put your code in and watch how it does what it does.
Edit: Excellent 10,000 m level view: The Soul of a New Machine by Tracy Kidder.
I had great difficulty envisioning this until I did microcoding. Then it all made sense (abstractly). It is a complex topic, but here it is from a very, very high-level view.
Essentially think of it like this.
A CPU instruction is essentially a set of charges stored in the electrical circuits that make up memory. There is circuitry that causes those charges to be transferred from the memory to the inside of the CPU. Once inside the CPU, the charges are set as inputs to the wiring of the CPU's circuitry. This is essentially a mathematical function that will cause more electrical output to occur, and the cycle continues.
Modern CPUs are far, far more complex and include many layers of microcoding, but the principle remains the same. Memory is a set of charges. There is circuitry to move the charges, and other circuitry to carry out functions which will result in other charges (outputs) being fed to memory or to other circuitry to carry out other functions.
To understand how the memory works you need to understand logic gates and how they are created from multiple transistors. This leads to the discovery that hardware and software are equivalent, in the sense that they essentially perform functions in the mathematical sense.
This is a question that requires more than an answer on StackOverflow to explain.
To learn about this all the way from the most basic electronic components up to basic machine code, read The Art of Electronics, by Horowitz and Hill. To learn more about computer architecture, read Computer Organization and Design by Patterson and Hennessy. If you want to get into more advanced topics, read Computer Architecture: A Quantitative Approach, by Hennessy and Patterson.
By the way, The Art of Electronics also has a companion lab manual. If you have the time and resources available, I would highly recommend doing the labs; I actually took the classes taught by Tom Hayes, in which we built a variety of analog and digital circuits, culminating in building a computer from a 68k chip, some RAM, some PLDs, and some discrete components. You would enter machine code directly into RAM using a hexadecimal keypad; it was a blast, and a great way to get hands-on experience at the very lowest levels of a computer.
Explaining the whole system in any detail is impossible to do without entire books, but here is a very high level overview of a simplistic computer:
To understand how an assembly instruction causes a voltage change, you simply need to understand how each of those levels is represented by the level below. For example, an ADD instruction will cause the value of two registers to propagate to the ALU, which has circuits that compute all of the logic operations. Then a multiplexer on the other side, being fed the ADD signal from the instruction, selects the desired result, which propagates back to one of the registers.
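The ADD datapath just described can be sketched as follows: the ALU computes every operation in parallel from the two register values, and a multiplexer, driven by the decoded instruction, selects which result propagates back to a register. The operation names and 8-bit width here are made up for illustration:

```python
# Toy register -> ALU -> mux -> register datapath.

def alu(a, b):
    # Every result is computed every cycle; nothing has "chosen" yet.
    return {"ADD": (a + b) & 0xFF,
            "SUB": (a - b) & 0xFF,
            "AND": a & b,
            "XOR": a ^ b}

def mux(results, op):
    # The decoded instruction drives the select lines.
    return results[op]

regs = {"EAX": 0x30, "EBX": 0x31}
regs["EAX"] = mux(alu(regs["EAX"], regs["EBX"]), "ADD")  # ADD EAX, EBX
```

The design choice mirrors real hardware: it is often cheaper to compute all candidate results in parallel and select one than to route the inputs to only the needed unit.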
This is a big question, and at most universities there's an entire semester-long class to answer it. So, rather than give you some terribly butchered summary in this little box, instead I'll direct you to the textbook that has the whole truth: Computer Organization and Design: The Hardware/Software Interface by Patterson and Hennessy.
A simpler introduction, but still a very good intro to a computer from the wire up:
Code, by Charles Petzold
VERY briefly,
A machine code instruction is stored within the processor as a series of bits. If you look up MOV in the processor data sheet, you'll see that it has a hex value, like (for example) 0xA5, that is specific to the MOV instruction. (There are different types of MOV instructions with different values, but let's ignore that for the moment.) 0xA5 hex == 10100101 binary.
*(This is not a real opcode value for MOV on an x86 - I'm just picking a value for illustration purposes.)
Inside the processor, this is stored in a "register", which is really an array of flip-flops or latches, each of which stores a voltage:
+5
0
+5
0
0
+5
0
+5
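The bit-to-voltage mapping above can be written out explicitly (keeping in mind that 0xA5 is the answer's made-up opcode, not a real MOV encoding):

```python
# Expand the illustrative opcode byte into the per-latch voltages listed above.

opcode = 0xA5
bits = [(opcode >> i) & 1 for i in range(7, -1, -1)]   # MSB first
voltages = ["+5" if b else "0" for b in bits]          # one latch per bit
```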
Each of these voltages feeds into the input of a gate or collection of gates.
At the next clock edge, those gates update their output based on the input voltages from the register.
The output of those gates feeds into another level of gates, or back to themselves. That level feeds into the next, which feeds into the next, and so on.
Eventually, a gate output way down the line will be connected back to another latch/flip-flop (internal memory), or one of the output pins on the processor.
(ignoring feedback for different gate types and higher-level structures)
These operations happen in parallel to a certain degree, as defined by the core architecture. One of the reasons that "faster" processors (say, 2.0 GHz vs. 1.0 GHz) perform better is that a faster clock speed (the GHz value) results in faster propagation from one collection of gates to the next.
It's important to understand that, at a very high level, all a processor does is change pin voltages. All of the glorious complexity that we see when using a device such as a PC is derived from the internal pattern of gates and the patterns in the external devices/peripherals attached to the processor, like other CPUs, RAM, etc. The magic of a processor is the patterns and sequences in which its pins change voltages, and the internal feedback that allows the state of the CPU at one moment to contribute to its state at the next. (In assembly, this state is represented by flags, the instruction pointer/counter, register values, etc.)
In a very real way, the bits of each opcode (machine code instruction) are physically tied to the internal structure of the processor (though this may be abstracted to a certain degree with an internal lookup table/instruction map where necessary).
Hope that helps. I've also got a nice EE education under my belt and a whole lot of embedded development experience, so these abstractions make sense to me, but may not be very useful to a neophyte.
The most basic element in a digital circuit is the logic gate. Logic gates can be used to build logic circuits that perform boolean arithmetic, or decoders, or sequential circuits such as flip-flops. The flip-flop can be thought of as a 1-bit memory. It is the basis of more complex sequential circuits, such as counters or registers (arrays of bits).
A microprocessor is just a bunch of sequencers and registers. "Instructions" to a microprocessor are no more than patterns of bits that get sequentially pushed onto some of the registers, to trigger specific sequences that perform calculations on "data". Data is represented as arrays of bits... and now we're on a higher level.
Well here's one terribly butchered summary :-)
MOV AL, 61h is again a human-readable form of the code, which is fed into the assembler. The assembler generates the equivalent hex code, which is basically a sequence of bytes understood by the processor, and which is what you would store in memory. In an embedded-system environment, linker scripts give you fine-grained control over where to place these bytes (separate areas for program/data, etc.) in memory.
The processor essentially contains a finite state machine (microcode) implemented using flip-flops. The machine reads (fetch cycle) the hex code for 'MOV' from memory, figures out (decode cycle) that it needs an operand, which in this case is 61h, fetches that from memory as well, and executes it (i.e., copies 61h into the accumulator register). 'Read', 'fetch', 'execute', etc. all mean the bytes are shifted in and out of shift registers and combined using digital circuits like adders, subtractors, multiplexers, etc.
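That finite state machine can be sketched with explicit named states. The opcode value 0x3E is the 8080-style MVI A, d8 ("load the accumulator with an immediate byte"), a close cousin of the MOV AL, 61h in the question; the single register and Python loop are, of course, toy stand-ins for hard-wired sequencing:

```python
# Toy fetch/decode/execute state machine for a mov-accumulator-immediate.

MOV_A_IMM = 0x3E             # 8080-style "MVI A, d8" opcode byte
memory = [MOV_A_IMM, 0x61]
acc, pc = 0, 0
state = "FETCH"
latched = None

while state != "DONE":
    if state == "FETCH":         # read the opcode byte, advance the counter
        latched, pc = memory[pc], pc + 1
        state = "DECODE"
    elif state == "DECODE":      # decide what the opcode needs next
        state = "OPERAND" if latched == MOV_A_IMM else "DONE"
    elif state == "OPERAND":     # fetch the operand byte, then execute
        acc, pc = memory[pc], pc + 1
        state = "DONE"
```

Each pass through the loop corresponds to one cycle of the state machine; in hardware, the "state" variable is itself just a few flip-flops feeding the gate network.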
The rough draft of the book "Microprocessor Design" is currently online at Wikibooks.
I hope that someday it will include an excellent answer to that question.
Meanwhile, perhaps you can still learn something from the current rough draft of an answer to that question, and help us make improvements or at least point out stuff we forgot to explain and areas where the explanation is confusing.
I'd like to say 'hardware', but a truer answer is 'microcode'.
I have been thinking about it and googling like crazy. People answer stuff like "the bla bla writes to RAM", but I'm really interested in what that "write" means.
You always start by typing code, right? Which then gets compiled, assembled into machine code, etc., etc... how does it turn into voltages on transistors? But wait! Let's step back a bit here. When you are typing code, say you want to write "print 'Hello World'" in whatever language, the second you press "p" (the first letter of "print") on your keyboard, you are actually re-routing electrical current, provided by the wall outlet, across a certain path to a certain set of transistors. So you are actually already storing the 0V and +5V in this step. It is not generated later!
How these voltages get flushed around in the later steps is, well... electrical science at every level.
Hope this answers your question.