Why didn't XNU handle THREAD_RESTART in its kqueue_scan_continue function before macOS Sierra?

Posted on 2025-01-13 11:57:07

I'm trying to find the cause of a nasty kernel panic triggered by Chromium Legacy, a project to backport modern versions of Chromium to old versions of macOS (10.7 – 10.10). The kernel panic occurs when the kqueue_scan_continue function is called with the wait_result parameter set to THREAD_RESTART.

In XNU 2422 (OS X 10.9.5), kqueue_scan_continue looks like this:

static void
kqueue_scan_continue(void *data, wait_result_t wait_result)
{
    thread_t self = current_thread();
    uthread_t ut = (uthread_t)get_bsdthread_info(self);
    struct _kqueue_scan * cont_args = &ut->uu_kevent.ss_kqueue_scan;
    struct kqueue *kq = (struct kqueue *)data;
    int error;
    int count;

    /* convert the (previous) wait_result to a proper error */
    switch (wait_result) {
    case THREAD_AWAKENED:
        kqlock(kq);
        error = kqueue_process(kq, cont_args->call, cont_args, &count,
            current_proc());
        if (error == 0 && count == 0) {
            wait_queue_assert_wait((wait_queue_t)kq->kq_wqs,
                KQ_EVENT, THREAD_ABORTSAFE, cont_args->deadline);
            kq->kq_state |= KQ_SLEEP;
            kqunlock(kq);
            thread_block_parameter(kqueue_scan_continue, kq);
            /* NOTREACHED */
        }
        kqunlock(kq);
        break;
    case THREAD_TIMED_OUT:
        error = EWOULDBLOCK;
        break;
    case THREAD_INTERRUPTED:
        error = EINTR;
        break;
    default:
        panic("%s: - invalid wait_result (%d)", __func__,
            wait_result);
        error = 0;
    }

    /* call the continuation with the results */
    assert(cont_args->cont != NULL);
    (cont_args->cont)(kq, cont_args->data, error);
}

It's easy to see why this leads to a kernel panic. The switch statement expects wait_result to be one of THREAD_AWAKENED, THREAD_TIMED_OUT, or THREAD_INTERRUPTED. If it's anything else, such as THREAD_RESTART, the default case is selected, and the kernel panics.

In macOS Sierra, Apple added an additional case to this switch statement to handle THREAD_RESTART:

    case THREAD_RESTART:
        error = EBADF;
        break;

When I add this code to older kernels and recompile XNU, they no longer panic while running Chromium Legacy.
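To make the difference between the two versions concrete, here is a minimal user-space sketch of the switch's mapping from wait result to error. It is only a model, not XNU code: the `model_` names, the `MODEL_WOULD_PANIC` sentinel, and the `sierra_fix` flag are all hypothetical, the enum values are placeholders rather than the kernel's real constants, and the THREAD_AWAKENED path is simplified to return 0 instead of calling kqueue_process.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* User-space stand-ins for the kernel's wait_result_t values.
 * The numeric values are placeholders for this model only. */
typedef enum {
    MODEL_THREAD_AWAKENED,
    MODEL_THREAD_TIMED_OUT,
    MODEL_THREAD_INTERRUPTED,
    MODEL_THREAD_RESTART
} model_wait_result_t;

/* Sentinel meaning "the real kernel would have called panic() here". */
#define MODEL_WOULD_PANIC (-1)

/*
 * Hypothetical helper: returns the error that the continuation would
 * receive for a given wait result. With sierra_fix == false it models
 * the 10.9.5 switch; with sierra_fix == true it models the switch with
 * the extra THREAD_RESTART case.
 */
static int
model_wait_result_to_error(model_wait_result_t wait_result, bool sierra_fix)
{
    switch (wait_result) {
    case MODEL_THREAD_AWAKENED:
        return 0;               /* simplified: no kqueue_process() here */
    case MODEL_THREAD_TIMED_OUT:
        return EWOULDBLOCK;
    case MODEL_THREAD_INTERRUPTED:
        return EINTR;
    case MODEL_THREAD_RESTART:
        /* Without the added case, this value reaches the default arm. */
        return sierra_fix ? EBADF : MODEL_WOULD_PANIC;
    default:
        return MODEL_WOULD_PANIC;
    }
}

/* Small self-check that exercises both variants of the model. */
void
model_self_check(void)
{
    assert(model_wait_result_to_error(MODEL_THREAD_RESTART, false)
        == MODEL_WOULD_PANIC);
    assert(model_wait_result_to_error(MODEL_THREAD_RESTART, true)
        == EBADF);
}
```

Calling `model_self_check()` from a test program confirms that, in this model, the only behavioral difference between the two variants is what happens for THREAD_RESTART: the older switch reaches the panic path, while the patched one hands EBADF to the continuation.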

My question is, why did it take Apple until macOS Sierra to handle THREAD_RESTART in this function? THREAD_RESTART is a valid value for wait_result_t, and is returned by various internal kernel functions.

The most obvious explanation is "Apple made a mistake", and that may be all it is! However, it feels like too obvious a mistake to go unnoticed for years in highly sensitive kernel code.

Does this look like a simple human error, or is there a reason Apple may have thought that handling THREAD_RESTART was unnecessary? For example, is calling kqueue_scan_continue with THREAD_RESTART supposed to be impossible?


Just for reference, here's the Chromium Legacy GitHub issue where some smart people helped me figure out a lot of the information in this question.
