OS development: how to avoid an infinite loop after an exception routine

Posted on 2025-01-03 01:15:49

For some months I've been working on a "home-made" operating system.
Currently, it boots and goes into 32-bit protected mode.
I've loaded the interrupt table, but haven't set up paging (yet).

Now while writing my exception routines I've noticed that when an instruction throws an exception, the exception routine is executed, but then the CPU jumps back to the instruction which threw the exception! This does not apply to every exception (for example, a div by zero exception will jump back to the instruction AFTER the division instruction), but let's consider the following general protection exception:

MOV EAX, 0x8
MOV CS, EAX

My routine is simple: it calls a function that displays a red error message.

The result: MOV CS, EAX fails -> My error message is displayed -> CPU jumps back to MOV CS -> infinite loop spamming the error message.

I've talked about this issue with a teacher in operating systems and unix security.
He told me he knows Linux has a way around it, but he doesn't know which one.

The naive solution would be to parse the throwing instruction from within the routine, in order to get the length of that instruction.
That solution is pretty complex, and I feel a bit uncomfortable adding a call to a relatively heavy function in every affected exception routine...

Therefore, I was wondering if there is another way around the problem. Maybe there's a "magic" register that contains a bit that can change this behaviour?

Thank you very much in advance for any suggestion/information.

EDIT: It seems many people wonder why I want to skip over the problematic instruction and resume normal execution.

I have two reasons for this:

First of all, killing a process would be a possible solution, but not a clean one. That's not how it's done in Linux, for example, where (AFAIK) the kernel sends a signal (I think SIGSEGV) but does not immediately break execution. It makes sense, since the application can block or ignore the signal and resume its own execution. It's a very elegant way to tell the application it did something wrong IMO.
Another reason: what if the kernel itself performs an illegal operation? Could be due to a bug, but could also be due to a kernel extension. As I've stated in a comment: what should I do in that case? Shall I just kill the kernel and display a nice blue screen with a smiley?

That's why I would like to be able to jump over the instruction. "Guessing" the instruction size is obviously not an option, and parsing the instruction seems fairly complex (not that I mind implementing such a routine, but I need to be sure there is no better way).

蘑菇王子 2025-01-10 01:15:49

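Different exceptions have different causes. Some exceptions are normal, and the exception only tells the kernel what it needs to do before allowing the software to continue running. Examples include a page fault telling the kernel it needs to load data from swap space, an undefined instruction exception telling the kernel it needs to emulate an instruction that the CPU doesn't support, or a debug/breakpoint exception telling the kernel it needs to notify a debugger. For these it's normal for the kernel to fix things up and silently continue.

Some exceptions indicate abnormal conditions (e.g. that the software crashed). The only sane way of handling these types of exceptions is to stop running the software. You may save information (e.g. a core dump) or display information (e.g. a "blue screen of death") to help with debugging, but in the end the software stops (either the process is terminated, or the kernel goes into a "do nothing until the user resets the computer" state).

Ignoring abnormal conditions just makes it harder for people to figure out what went wrong. For example, imagine instructions for going to the toilet:

1. Enter bathroom
2. Remove trousers
3. Sit down
4. Start generating output

Now imagine that step 2 fails because you're wearing shorts (a "can't find trousers" exception). Do you want to stop at that point (with a nice, easy to understand error message or something), or ignore that step and try to figure out what went wrong later, after all the useful diagnostic information has gone?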


日暮斜阳 2025-01-10 01:15:49

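If I understand correctly, you want to skip the instruction that caused the exception (e.g. mov cs, eax) and continue executing the program at the next instruction.

Why would you want to do this? Normally, shouldn't the rest of the program depend on the effects of that instruction having executed successfully?

Generally speaking, there are three approaches to exception handling:

Treat the exception as an unrepairable condition and kill the process. For example, division by zero is usually handled this way.
Repair the environment and then execute the instruction again. For example, page faults are sometimes handled this way.
Emulate the instruction in software and skip over it in the instruction stream. For example, complicated arithmetic instructions are sometimes handled this way.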
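
The three approaches above can be sketched as a small policy function. This is only an illustration that compiles as an ordinary user-space C program: `struct trap_frame`, `enum fault_action` and `resolve_fault` are hypothetical names, and a real trap frame holds far more than the saved EIP. The point it tries to show is that "retry" means leaving the saved EIP alone, while "skip" is only safe when an emulator has already decoded the instruction and therefore knows its length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical trap frame: only the field this sketch needs. */
struct trap_frame {
    uint32_t eip;              /* saved instruction pointer of the interrupted code */
};

/* The three policies listed above (names are illustrative). */
enum fault_action {
    FAULT_TERMINATE,           /* unrepairable: stop the offending task */
    FAULT_RETRY,               /* cause repaired: run the same instruction again */
    FAULT_SKIP                 /* instruction emulated: resume after it */
};

/*
 * Apply the chosen policy to the saved frame.
 * Returns 1 if the task may resume, 0 if it must be stopped.
 * insn_len is only used for FAULT_SKIP and is assumed to come from
 * the emulator that decoded the faulting instruction.
 */
int resolve_fault(struct trap_frame *frame, enum fault_action action,
                  uint32_t insn_len)
{
    assert(frame != NULL);

    switch (action) {
    case FAULT_RETRY:
        return 1;              /* EIP untouched: returning re-executes the instruction */
    case FAULT_SKIP:
        frame->eip += insn_len;
        return 1;
    case FAULT_TERMINATE:
    default:
        printf("fatal exception at EIP=0x%08x\n", (unsigned)frame->eip);
        return 0;
    }
}
```

Under this sketch, a handler that only prints a message and returns is effectively choosing "retry" without repairing anything, which would explain the loop described in the question.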


醉生梦死 2025-01-10 01:15:49

What you're seeing is characteristic of the General Protection Exception. The Intel System Programming Guide clearly states (6.15 Exception and Interrupt Reference / Interrupt 13 - General Protection Exception (#GP)):

Saved Instruction Pointer
The saved contents of CS and EIP registers point to the instruction that generated the
exception.

Therefore, you need to write an exception handler that will skip over that instruction (which would be kind of weird), or just simply kill the offending process with "General Protection Exception at $SAVED_EIP" or a similar message.
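
As a rough sketch of the second option, the handler can build its message from the saved frame and then stop the offending task instead of returning. The code below is a user-space illustration only: `struct gp_frame` and `format_gp_report` are hypothetical names, and the layout assumes a 32-bit #GP taken without a privilege change, where the error code sits at the lowest address, followed by EIP, CS and EFLAGS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed stack contents on entry to a 32-bit #GP handler when no
 * privilege change occurred, lowest address first.
 */
struct gp_frame {
    uint32_t error_code;       /* selector-related error code, or 0 */
    uint32_t eip;              /* points at the faulting instruction */
    uint32_t cs;               /* only the low 16 bits are meaningful */
    uint32_t eflags;
};

/*
 * Format a diagnostic line into buf and return the snprintf result.
 * A real handler would print the line and then terminate the task
 * (or halt the kernel) rather than return to the saved EIP.
 */
int format_gp_report(const struct gp_frame *f, char *buf, size_t size)
{
    assert(f != NULL && buf != NULL);
    return snprintf(buf, size,
                    "General Protection Exception at %04x:%08x (error code 0x%x)",
                    (unsigned)(f->cs & 0xFFFF),
                    (unsigned)f->eip,
                    (unsigned)f->error_code);
}
```

Because the saved EIP identifies the faulting instruction, this one line is usually enough to locate it in a disassembly of the kernel or program image.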


枫以 2025-01-10 01:15:49

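I can imagine a few situations in which one would want to respond to a GPF by parsing the failed instruction, emulating its operation, and then returning to the instruction after it. The normal pattern is to set things up so that the instruction will succeed when retried, but one might, for example, have some code that expects to access some hardware at addresses 0x000A0000-0x000AFFFF and wish to run it on a machine that lacks such hardware. In that case, one might not want to ever map "real" memory into that space, since every single access must be trapped and dealt with separately. I'm not sure whether there's any way to handle that without decoding whatever instruction was trying to access that memory, although I do know that some virtual-PC programs seem to manage it pretty well.

Otherwise, I would suggest that each thread should have a jump vector which is used when the system encounters a GPF. Normally that vector should point to a thread-exit routine, but code that is about to do something "suspicious" with pointers could set it to an error handler suitable for that code (the code should unset the vector when it leaves the region where that error handler is appropriate).

I can imagine situations where one might want to emulate an instruction without executing it, and situations where one might want to transfer control to an error-handler routine, but I can't imagine any where one would want to simply skip over an instruction that caused a GPF.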
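
The per-thread jump vector idea might look roughly like the sketch below. Everything here is hypothetical (`struct thread`, `fault_vector`, `exit_routine`, `redirect_on_gpf`), and it runs as a plain user-space program that only manipulates a fake saved EIP. Disarming the vector after one use is an assumption of this sketch, not something the answer specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trap frame: only the saved instruction pointer. */
struct trap_frame {
    uint32_t eip;
};

/* Hypothetical per-thread state for GPF recovery. */
struct thread {
    uint32_t fault_vector;     /* handler armed by "suspicious" code, 0 if none */
    uint32_t exit_routine;     /* default target: the thread-exit routine */
};

/*
 * Called from the #GP handler. Instead of returning to the faulting
 * instruction, rewrite the saved EIP so that the return from the
 * exception lands in the thread's error handler, or in the
 * thread-exit routine when no handler is armed.
 */
void redirect_on_gpf(struct thread *t, struct trap_frame *frame)
{
    assert(t != NULL && frame != NULL);

    frame->eip = t->fault_vector ? t->fault_vector : t->exit_routine;
    t->fault_vector = 0;       /* assumption: one-shot, must be re-armed */
}
```

A real implementation would probably also need to restore a known stack pointer before entering the handler, since the fault can occur at any stack depth inside the guarded region.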
