Decoding and matching Chip 8 opcodes in C/C++
I'm writing a Chip 8 emulator as an introduction to emulation and I'm kind of lost. Basically, I've read a Chip 8 ROM and stored it in a char array in memory. Then, following a guide, I use the following code to retrieve the opcode at the current program counter (pc):
// Fetch opcode
opcode = memory[pc] << 8 | memory[pc + 1];
Chip 8 opcodes are 2 bytes each. This is code from a guide which I vaguely understand as adding 8 extra bit spaces to memory[pc] (using << 8) and then merging memory[pc + 1] with it (using |) and storing the result in the opcode variable.
Now that I have the opcode isolated however, I don't really know what to do with it. I'm using this opcode table and I'm basically lost in regards to matching the hex opcodes I read to the opcode identifiers in that table. Also, I realize that many of the opcodes I'm reading also contain operands (I'm assuming the latter byte?), and that is probably further complicating my situation.
Help?!
Comments (4)
Basically once you have the instruction you need to decode it. For example from your opcode table:
I am guessing that, since you access the ROM two bytes per instruction, the address is probably a (16-bit) word address rather than a byte address, so I shifted it left by one. (You need to study how those instructions are encoded; the opcode table you provided is inadequate for that without making assumptions.)
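As a sketch of this decoding step, assuming the usual CHIP-8 table notation (NNN for the low 12 bits, NN for the low byte, N for the low nibble, X and Y for register indices), the fields can be pulled out with shifts and masks. The names `chip8_fields` and `decode_fields` are hypothetical. The widely circulated CHIP-8 references treat NNN as a byte address, so this sketch leaves it unshifted; adjust it if the documentation you follow says otherwise.

```c
#include <stdint.h>

/* Hypothetical container for the pieces of one 16-bit opcode. */
typedef struct {
    uint8_t  op;   /* top nibble: selects the instruction family */
    uint8_t  x;    /* second nibble: usually a register index */
    uint8_t  y;    /* third nibble: usually a register index */
    uint8_t  n;    /* lowest nibble */
    uint8_t  nn;   /* low byte: usually an 8-bit constant */
    uint16_t nnn;  /* low 12 bits: usually an address */
} chip8_fields;

chip8_fields decode_fields(uint16_t opcode)
{
    chip8_fields f;
    f.op  = (uint8_t)((opcode >> 12) & 0xF);
    f.x   = (uint8_t)((opcode >> 8) & 0xF);
    f.y   = (uint8_t)((opcode >> 4) & 0xF);
    f.n   = (uint8_t)(opcode & 0xF);
    f.nn  = (uint8_t)(opcode & 0xFF);
    f.nnn = (uint16_t)(opcode & 0xFFF);
    return f;
}
```

Not every instruction uses every field; which ones matter depends on the family selected by the top nibble.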
There is a lot more that has to happen, and I don't know if I wrote anything about it in my github samples. I recommend you create a fetch function for fetching instructions at an address, a read memory function, a write memory function, a read register function, and a write register function. I recommend that your decode and execute function decodes and executes only one instruction at a time. Normal execution is to just call it in a loop; this provides the ability to do interrupts and things like that without a lot of extra work. It also modularizes your solution: by creating the fetch(), read_mem_byte(), read_mem_word(), etc. functions, you modularize your code (at a slight cost in performance), which makes debugging much easier, as you have a single place where you can watch registers or memory accesses and figure out what is or isn't going on.
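A minimal sketch of that layering, assuming a 4 KB address space (the usual CHIP-8 size) and a hypothetical `trace_memory` flag; only `fetch()` and `read_mem_byte()` are names taken from the text above, the rest is illustrative.

```c
#include <stdint.h>
#include <stdio.h>

#define MEM_SIZE 4096u          /* assumed: the usual 4 KB CHIP-8 address space */

uint8_t memory[MEM_SIZE];
int trace_memory = 0;           /* hypothetical switch: set to 1 to log every access */

/* Every memory read goes through here, so this is the one place to watch. */
uint8_t read_mem_byte(uint16_t addr)
{
    uint8_t value = memory[addr % MEM_SIZE];
    if (trace_memory)
        printf("read  [%03X] -> %02X\n", (unsigned)addr, (unsigned)value);
    return value;
}

/* Every memory write goes through here. */
void write_mem_byte(uint16_t addr, uint8_t value)
{
    if (trace_memory)
        printf("write [%03X] <- %02X\n", (unsigned)addr, (unsigned)value);
    memory[addr % MEM_SIZE] = value;
}

/* Fetch one 16-bit instruction: first byte is the high half. */
uint16_t fetch(uint16_t pc)
{
    uint16_t high = read_mem_byte(pc);
    uint16_t low  = read_mem_byte((uint16_t)(pc + 1));
    return (uint16_t)((high << 8) | low);
}
```

Turning on `trace_memory` while single-stepping shows every access the emulated program makes without touching the decode code.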
Based on your question, and where you are in this process, I think the first thing you need to do before writing an emulator is to write a disassembler. Being a fixed instruction length instruction set (16 bits) that makes it much much easier. You can start at some interesting point in the rom, or at the beginning if you like, and decode everything you see. For example:
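A minimal sketch of such a disassembler, covering only five of the instructions. The function name `disassemble` and the mnemonics are illustrative choices, not something taken from the opcode table in the question.

```c
#include <stdint.h>
#include <stdio.h>

/* Write a text form of one opcode into buf (at most size bytes).
   Anything this sketch does not know is printed as raw data. */
void disassemble(uint16_t opcode, char *buf, size_t size)
{
    unsigned x   = (opcode >> 8) & 0xFu;   /* register index */
    unsigned nn  = opcode & 0xFFu;         /* 8-bit constant */
    unsigned nnn = opcode & 0xFFFu;        /* 12-bit address */

    switch (opcode >> 12) {
    case 0x1: snprintf(buf, size, "JP 0x%03X", nnn); break;          /* 1NNN: jump */
    case 0x2: snprintf(buf, size, "CALL 0x%03X", nnn); break;        /* 2NNN: call */
    case 0x6: snprintf(buf, size, "LD V%X, 0x%02X", x, nn); break;   /* 6XNN: VX = NN */
    case 0x7: snprintf(buf, size, "ADD V%X, 0x%02X", x, nn); break;  /* 7XNN: VX += NN */
    case 0xA: snprintf(buf, size, "LD I, 0x%03X", nnn); break;       /* ANNN: I = NNN */
    default:  snprintf(buf, size, "DW 0x%04X", (unsigned)opcode); break;
    }
}
```

Calling this in a loop over the ROM, two bytes at a time, gives a listing; each `case` is also where the emulation code for that instruction would later go.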
With only 35 instructions that shouldn't take more than an afternoon, maybe a whole Saturday, this being your first time decoding instructions (I assume that based on your question). The disassembler becomes the core decoder for your emulator. Replace the printf()s with emulation or, even better, leave the printfs and just add code to emulate the instruction execution; this way you can follow the execution. (Same deal: have a function that disassembles a single instruction and call it for each instruction; this becomes the foundation for your emulator.)
Your understanding needs to be more than vague as to what that fetch line of code is doing, in order to pull off this task you are going to have to have a strong understanding of bit manipulation.
Also I would call that line of code you provided buggy or at least risky. If memory[] is an array of bytes, the compiler might very well perform the left shift using byte-sized math, resulting in a zero; then zero ORed with the second byte results in only the second byte.
Basically a compiler is within its rights to turn this:
Into this:
Which won't work for you at all. A very quick fix:
Will save you some headaches. Minimal optimization will save the compiler from storing the intermediate results to ram for each operation resulting in the same (desired) output/performance.
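A sketch of a fetch with the widths spelled out, using the hypothetical names `fetch_safe` and `fetch_naive_signed`. Two points about standard C are worth keeping in mind here. Operands narrower than int are promoted to int before `<<`, so a conforming compiler does not do the shift in byte-sized math. The hazard that remains is a memory[] declared as plain char (as in the question), which may be signed: a byte of 0x80 or above is then negative and sign-extends when it is ORed in.

```c
#include <stdint.h>

/* Explicit widths: each byte is widened to an unsigned 16-bit value first. */
uint16_t fetch_safe(const unsigned char *memory, uint16_t pc)
{
    uint16_t high = memory[pc];
    uint16_t low  = memory[pc + 1];
    return (uint16_t)((high << 8) | low);
}

/* What goes wrong with signed bytes: a low byte such as 0xE0 is negative,
   sign-extends to int, and sets every high bit when ORed in.
   (Shifting a negative high byte is additionally undefined behavior
   under C11's rules, so only call this with a non-negative first byte.) */
uint16_t fetch_naive_signed(const signed char *memory, uint16_t pc)
{
    return (uint16_t)(memory[pc] << 8 | memory[pc + 1]);
}
```

With the bytes 0x00, 0xE0 the first function gives 0x00E0, while the signed version gives 0xFFE0, so declaring memory[] as `unsigned char` or `uint8_t` avoids the problem at the source.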
The instruction set simulators I wrote and mentioned above are not intended for performance but for readability and visibility, and hopefully they are educational. I would start with something like that; then, if performance for example is of interest, you will have to rewrite it. This chip8 emulator, once you are experienced, would be an afternoon task from scratch, so once you get through this the first time you could rewrite it maybe three or four times in a weekend; having to rewrite is not a monumental task. (The thumbulator one took me a weekend for the bulk of it. The msp430 one was probably more like an evening or two worth of work. Getting the overflow flag right, once and for all, was the biggest task, and that came later.) Anyway, the point is: look at things like the MAME sources. Most if not all of those instruction set simulators are designed for execution speed, and many are barely readable without a fair amount of study. They are often heavily table driven, sometimes with lots of C programming tricks, etc. Start with something manageable, get it functioning properly, then worry about improving it for speed or size or portability or whatever. This chip8 thing looks to be graphics based, so you are also going to have to deal with a lot of line drawing and other bit manipulation on a bitmap/screen/wherever. Or you could just call API or operating system functions. Basically this chip8 thing is not your traditional instruction set with registers and a laundry list of addressing modes and ALU operations.
Basically -- Mask out the variable part of the opcode, and look for a match. Then use the variable part.
For example 1NNN is the jump. So:
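A minimal sketch of that mask-and-match step for the jump, using a hypothetical helper name `match_jump`:

```c
#include <stdint.h>

/* Returns 1 and stores the jump target if opcode has the form 1NNN,
   otherwise returns 0 and leaves *target untouched. */
int match_jump(uint16_t opcode, uint16_t *target)
{
    if ((opcode & 0xF000) == 0x1000) {          /* mask out NNN, compare the fixed part */
        *target = (uint16_t)(opcode & 0x0FFF);  /* then use the variable part */
        return 1;
    }
    return 0;
}
```

In an emulator the matched branch would simply set the program counter to the target.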
Then the game is to make that code fast or small, or elegant, if you like. Good clean fun!
Different CPUs store values in memory differently. Big endian machines store a number like $FFCC in memory in that order FF,CC. Little-endian machines store the bytes in reverse order CC, FF (that is, with the "little end" first).
The CHIP-8 architecture is big endian, so the code you will run has the instructions and data written in big endian.
In your statement "opcode = memory[pc] << 8 | memory[pc + 1];", it doesn't matter if the host CPU (the CPU of your computer) is little endian or big endian. It will always put a 16-bit big endian value into an integer in the correct order.
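To make the contrast concrete, here is a sketch with two hypothetical helpers: `fetch_be` builds the value arithmetically, as the statement above does, while `load_native` copies the two bytes straight into a 16-bit integer and therefore depends on the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Byte order is fixed by the arithmetic: same result on any host. */
uint16_t fetch_be(const uint8_t *memory, uint16_t pc)
{
    return (uint16_t)(((unsigned)memory[pc] << 8) | memory[pc + 1]);
}

/* Byte order is whatever the host uses: the result differs between
   little-endian and big-endian machines. */
uint16_t load_native(const uint8_t *memory, uint16_t pc)
{
    uint16_t value;
    memcpy(&value, &memory[pc], sizeof value);
    return value;
}
```

For the bytes 0x12, 0xAB, `fetch_be` yields 0x12AB everywhere, whereas `load_native` yields 0xAB12 on a little-endian host and 0x12AB on a big-endian one.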
There are a couple of resources that might help: http://www.emulator101.com gives a CHIP-8 emulator tutorial along with some general emulator techniques. This one is good too: http://www.multigesture.net/articles/how-to-write-an-emulator-chip-8-interpreter/
You're going to have to set up a bunch of different bit masks to get the actual opcode from the 16-bit word, in combination with a finite state machine to interpret those opcodes, since it appears that there are some complications in how the opcodes are encoded (i.e., certain opcodes have register identifiers, etc., while others are fairly straightforward with a single identifier).
Your finite state machine can basically do the following:
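One possible shape for this, as a sketch rather than a definitive implementation: a `step` function that fetches one opcode, masks off the top nibble, and dispatches on it. The struct `chip8_cpu`, its field names, and the choice of four handled instructions are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical machine state: 4 KB memory, 16 registers, I, and pc. */
typedef struct {
    uint8_t  memory[4096];
    uint8_t  v[16];
    uint16_t i;
    uint16_t pc;
} chip8_cpu;

/* Execute one instruction. Returns 1 if handled, 0 if this sketch
   does not know the opcode. */
int step(chip8_cpu *c)
{
    uint16_t opcode = (uint16_t)((c->memory[c->pc & 0xFFF] << 8) |
                                 c->memory[(c->pc + 1) & 0xFFF]);
    uint8_t  x   = (uint8_t)((opcode >> 8) & 0xF);
    uint8_t  nn  = (uint8_t)(opcode & 0xFF);
    uint16_t nnn = (uint16_t)(opcode & 0xFFF);

    c->pc = (uint16_t)(c->pc + 2);   /* default: move on to the next instruction */

    switch (opcode & 0xF000) {       /* keep only the fixed part */
    case 0x1000: c->pc = nnn;                         return 1;  /* 1NNN: jump */
    case 0x6000: c->v[x] = nn;                        return 1;  /* 6XNN: VX = NN */
    case 0x7000: c->v[x] = (uint8_t)(c->v[x] + nn);   return 1;  /* 7XNN: VX += NN */
    case 0xA000: c->i = nnn;                          return 1;  /* ANNN: I = NNN */
    default:                                          return 0;
    }
}
```

Families that share a top nibble (such as the 8XY_ group) would need a second switch on the low nibble inside their case.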