Questions about implementing a simple CPU emulator

Posted on 2024-08-23 10:58:13

Background Information: Ultimately, I would like to write an emulator of a real machine such as the original Nintendo or Gameboy. However, I decided that I need to start somewhere much, much simpler. My computer science advisor/professor offered me the specifications for a very simple imaginary processor that he created to emulate first. There is one register (the accumulator) and 16 opcodes. Each instruction consists of 16 bits, the first 4 of which contain the opcode, the rest of which is the operand. The instructions are given as strings in binary format, e.g., "0101 0101 0000 1111".

My Question: In C++, what is the best way to parse the instructions for processing? Please keep my ultimate goal in mind. Here are some points I've considered:

1. I can't just process and execute the instructions as I read them, because the code is self-modifying: an instruction can change a later instruction. The only way I can see to get around this would be to store all changes and, for each instruction, check whether a change needs to be applied. This could lead to a massive number of comparisons for every instruction executed, which isn't good. So I think I have to recompile the instructions into another format.
2. Although I could parse the opcode as a string and process it, there are cases where the instruction as a whole has to be treated as a number. The increment opcode, for example, could modify even the opcode section of an instruction.
3. If I were to convert the instructions to integers, I'm not sure how I could then parse just the opcode or operand section of the int. Even if I were to recompile each instruction into three parts (the whole instruction as an int, the opcode as an int, and the operand as an int), that still wouldn't solve the problem, as I might have to increment an entire instruction and later parse the affected opcode or operand. Moreover, would I have to write a function to perform this conversion, or is there some C++ library with a function that converts a string in "binary format" to an integer (like Integer.parseInt(str1, 2) in Java)?
4. Also, I would like to be able to perform operations such as shifting bits. I'm not sure how that can be achieved, but it might affect how I implement this recompilation.

Thank you for any help or advice you can offer!

末蓝 2024-08-30 10:58:13

Parse the original code into an array of integers. This array is your computer's memory.
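As a minimal sketch of that loading step, assuming each instruction arrives as a string such as "0101 0101 0000 1111" (the format given in the question): the helper names `parse_instruction` and `load_program` are hypothetical. The sketch strips the spaces and lets `std::stoul` with base 2 do the conversion, which is roughly the standard C++ counterpart of Java's `Integer.parseInt(str, 2)`.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: turn "0101 0101 0000 1111" into the 16-bit value 0x550F.
std::uint16_t parse_instruction(const std::string& text)
{
    std::string bits;
    for (char c : text) {
        if (c == '0' || c == '1') {
            bits += c;
        } else if (c != ' ') {
            throw std::invalid_argument("unexpected character in instruction");
        }
    }
    if (bits.size() != 16) {
        throw std::invalid_argument("instruction must contain exactly 16 bits");
    }
    // std::stoul with base 2 interprets the string as a binary number.
    return static_cast<std::uint16_t>(std::stoul(bits, nullptr, 2));
}

// Hypothetical loader: one instruction string in, one memory word out.
std::vector<std::uint16_t> load_program(const std::vector<std::string>& lines)
{
    std::vector<std::uint16_t> memory;
    memory.reserve(lines.size());
    for (const std::string& line : lines) {
        memory.push_back(parse_instruction(line));
    }
    return memory;
}
```

The resulting vector plays the role of the emulated machine's memory, so the strings are needed only while loading.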

Use bitwise operations to extract the various fields. For instance, this:

unsigned int x = 0xfeed;
unsigned int opcode = (x >> 12) & 0xf;

will extract the topmost four bits (0xf, here) from a 16-bit value stored in an unsigned int. You can then use e.g. switch() to inspect the opcode and take the proper action:

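#include <stdio.h>

enum { OP_ADD = 0 };  /* Placeholder value; use the opcode numbers from your spec. */

/* Executes the instruction at memory[pc] and returns the address of the next one. */
unsigned int execute(unsigned int *memory, unsigned int pc)
{
  const unsigned int opcode = (memory[pc++] >> 12) & 0xf;

  switch(opcode)
  {
  case OP_ADD:
    /* Do whatever the ADD instruction's definition mandates. */
    return pc;
  default:
    fprintf(stderr, "** Non-implemented opcode %x found in location %x\n", opcode, pc - 1);
  }
  return pc;
}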

Modifying memory is just a case of writing into your array of integers, perhaps also using some bitwise math if needed.
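To illustrate that last point, here is a small sketch of self-modifying code under this representation. It assumes memory is a vector of 16-bit words and that the machine has an instruction that adds 1 to the word at a given address; the question mentions an increment opcode, but its exact semantics and the names `increment_word` and `opcode_at` are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical increment: adds 1 to the word at `address`, wrapping at 16 bits.
// Instructions and data share the same memory, so this can rewrite code.
void increment_word(std::vector<std::uint16_t>& memory, std::size_t address)
{
    memory.at(address) = static_cast<std::uint16_t>(memory.at(address) + 1);
}

// Fields are decoded from the current contents of memory on every fetch,
// so a modified instruction is picked up without any extra bookkeeping.
unsigned int opcode_at(const std::vector<std::uint16_t>& memory, std::size_t address)
{
    return (memory.at(address) >> 12) & 0xF;
}
```

For example, incrementing a word that holds 0x1FFF yields 0x2000, so the opcode read from that address changes from 1 to 2 the next time it is fetched.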

旧瑾黎汐 2024-08-30 10:58:13

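I think the best approach is to read the instructions, convert them to unsigned integers, store them in memory, and then execute them from memory.

1. Once you've parsed the instructions and stored them in memory, self-modification is much easier than keeping a list of changes for each instruction. You can just change the memory at that location (assuming you never need to know what the old instruction was).
2. Since you're converting the instructions to integers, this problem is moot.
3. To parse the opcode and operand sections, you'll need bit shifting and masking. For example, to get the opcode, shift the instruction down by 12 bits (instruction >> 12) and mask off everything but the low 4 bits of the result. You can use a mask to get the operand too.
4. Do you mean that your machine has instructions that shift bits? That shouldn't affect how you store the operands. When you get to executing one of those instructions, you can just use the C++ bit-shift operators << and >>.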
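The masking and shifting described in points 3 and 4 might look like the following sketch. The 4-bit opcode and 12-bit operand split comes from the question; the struct and function names, and the idea that a shift instruction acts on a 16-bit accumulator, are assumptions made for illustration.

```cpp
#include <cstdint>

struct Decoded {
    unsigned int opcode;   // top 4 bits of the instruction
    unsigned int operand;  // low 12 bits of the instruction
};

// Split a 16-bit instruction word into its two fields.
Decoded decode(std::uint16_t instruction)
{
    Decoded d;
    d.opcode = (instruction >> 12) & 0xF;
    d.operand = instruction & 0x0FFF;
    return d;
}

// Inverse of decode: pack the two fields back into one word.
std::uint16_t encode(unsigned int opcode, unsigned int operand)
{
    return static_cast<std::uint16_t>(((opcode & 0xF) << 12) | (operand & 0x0FFF));
}

// Hypothetical shift instructions on a 16-bit accumulator.
// Bits shifted past either end are discarded.
std::uint16_t shift_left(std::uint16_t acc, unsigned int count)
{
    if (count >= 16) return 0;
    return static_cast<std::uint16_t>(static_cast<unsigned int>(acc) << count);
}

std::uint16_t shift_right(std::uint16_t acc, unsigned int count)
{
    if (count >= 16) return 0;
    return static_cast<std::uint16_t>(static_cast<unsigned int>(acc) >> count);
}
```

Because the fields are recomputed from the whole word each time, there is no need to store the opcode and operand separately, which is why point 2 of the question stops being a concern.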

静赏你的温柔 2024-08-30 10:58:13

Just in case it helps, here's the last CPU emulator I wrote in C++. Actually, it's the only emulator I've written in C++.

The spec's language is slightly idiosyncratic but it's a perfectly respectable, simple VM description, possibly quite similar to your prof's VM:

http://www.boundvariable.org/um-spec.txt

Here's my (somewhat over-engineered) code, which should give you some ideas. For instance it shows how to implement mathematical operators, in the Giant Switch Statement in um.cpp:

http://www.eschatonic.org/misc/um.zip

You can maybe find other implementations for comparison with a web search, since plenty of people entered the contest (I wasn't one of them: I did it much later). Although not many in C++ I'd guess.

If I were you, I'd only store the instructions as strings to start with, if that's the way that your virtual machine specification defines operations on them. Then convert them to integers as needed, every time you want to execute them. It'll be slow, but so what? Yours isn't a real VM that you're going to be using to run time-critical programs, and a dog-slow interpreter still illustrates the important points you need to know at this stage.
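A sketch of that string-based approach, assuming the program is kept as 16-character strings of '0' and '1' with the spaces already removed. `std::bitset<16>` handles the conversion in both directions; the function names are hypothetical.

```cpp
#include <bitset>
#include <string>

// Convert a stored instruction such as "0101010100001111" to a number
// only at the moment it is needed.
unsigned long to_number(const std::string& instruction)
{
    return std::bitset<16>(instruction).to_ulong();
}

// Convert back to text after modifying the value.
// std::bitset<16> keeps only the low 16 bits, so the result wraps.
std::string to_text(unsigned long value)
{
    return std::bitset<16>(value).to_string();
}

// Hypothetical increment applied to an instruction stored as text.
std::string increment_text(const std::string& instruction)
{
    return to_text(to_number(instruction) + 1);
}
```

Every operation pays for two conversions, which is the slowness this answer accepts in exchange for staying close to a string-based specification.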

It's possible though that the VM actually defines everything in terms of integers, and the strings are just there to describe the program when it's loaded into the machine. In that case, convert the program to integers at the start. If the VM stores programs and data together, with the same operations acting on both, then this is the way to go.

The way to choose between them is to look at the opcode which is used to modify the program. Is the new instruction supplied to it as an integer, or as a string? Whichever it is, the simplest thing to start with is probably to store the program in that format. You can always change later once it's working.

In the case of the UM described above, the machine is defined in terms of "platters" with space for 32 bits. Clearly these can be represented in C++ as 32-bit integers, so that's what my implementation does.

小猫一只 2024-08-30 10:58:13

I created an emulator for a custom cryptographic processor. I exploited the polymorphism of C++ by creating a tree of base classes:

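#include <cstddef>
#include <string>

struct Instruction  // Contains methods & data common to all instructions.
{
    virtual ~Instruction() = default;  // Lets derived objects be deleted through a base pointer.
    virtual void execute(void) = 0;
    virtual std::size_t get_instruction_size(void) const = 0;
    virtual unsigned int get_opcode(void) const = 0;
    virtual const std::string& get_instruction_name(void) const = 0;
};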

class Math_Instruction
:  public Instruction
{
  // Operations common to all math instructions;
};

class Branch_Instruction
:  public Instruction
{
  // Operations common to all branch instructions;
};

class Add_Instruction
:  public Math_Instruction
{
};

I also had a couple of factories. At least two would be useful:

Factory to create an instruction from text.
Factory to create an instruction from an opcode.

The instruction classes should have methods to load their data from an input source (e.g. std::istream) or from text (std::string). The corresponding output methods should also be supported (such as returning the instruction name and opcode).

I had the application create objects from an input file and place them into a vector of instructions. The executor method would run the execute() method of each instruction in the vector. This call trickled down to the leaf instruction object, which performed the detailed execution.
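The answer does not show its factories or executor, so the following is only a sketch of how they might fit together. It uses a simplified stand-in for the Instruction base class (execute takes the machine state as a parameter), and the `Machine` struct, the `make_instruction` and `run` names, and the choice of opcode 0 for ADD are all assumptions. Because Instruction is abstract, the vector holds `std::unique_ptr<Instruction>` rather than objects by value.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical machine state; the question's CPU has a single accumulator.
struct Machine {
    std::uint16_t accumulator = 0;
};

// Simplified stand-in for the Instruction base class shown earlier.
struct Instruction {
    virtual ~Instruction() = default;
    virtual void execute(Machine& machine) = 0;
};

class Add_Instruction : public Instruction {
public:
    explicit Add_Instruction(std::uint16_t operand) : operand_(operand) {}
    void execute(Machine& machine) override {
        // Add the operand to the accumulator, wrapping at 16 bits.
        machine.accumulator = static_cast<std::uint16_t>(machine.accumulator + operand_);
    }
private:
    std::uint16_t operand_;
};

// Factory from opcode: decodes a 16-bit word and builds the matching object.
// The opcode value 0 for ADD is an assumption, not part of any real spec.
std::unique_ptr<Instruction> make_instruction(std::uint16_t word)
{
    const unsigned int opcode = (word >> 12) & 0xF;
    const std::uint16_t operand = static_cast<std::uint16_t>(word & 0x0FFF);
    switch (opcode) {
    case 0:
        return std::unique_ptr<Instruction>(new Add_Instruction(operand));
    default:
        throw std::invalid_argument("opcode not implemented");
    }
}

// Executor: runs every instruction in order via virtual dispatch.
void run(const std::vector<std::unique_ptr<Instruction>>& program, Machine& machine)
{
    for (const auto& instruction : program) {
        instruction->execute(machine);
    }
}
```

One caveat for the machine in the question: with self-modifying code, objects built up front go stale when the underlying word changes, so they would have to be re-created from memory after each modification.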

There are other global objects that may need emulation as well. In my case some included the data bus, registers, ALU and memory locations.

Please spend more time designing and thinking about the project before you code it. I found it quite a challenge, especially implementing a single-step capable debugger and GUI.

Good Luck!
