Why didn't XNU handle THREAD_RESTART in its kqueue_scan_continue function before macOS Sierra?
I'm trying to find the cause of a nasty kernel panic triggered by Chromium Legacy, a project to backport modern versions of Chromium to old versions of macOS (10.7 – 10.10). The kernel panic occurs when the kqueue_scan_continue function is called with the wait_result parameter set to THREAD_RESTART.
In XNU 2422 (OS X 10.9.5), kqueue_scan_continue looks like this:
static void
kqueue_scan_continue(void *data, wait_result_t wait_result)
{
    thread_t self = current_thread();
    uthread_t ut = (uthread_t)get_bsdthread_info(self);
    struct _kqueue_scan * cont_args = &ut->uu_kevent.ss_kqueue_scan;
    struct kqueue *kq = (struct kqueue *)data;
    int error;
    int count;

    /* convert the (previous) wait_result to a proper error */
    switch (wait_result) {
    case THREAD_AWAKENED:
        kqlock(kq);
        error = kqueue_process(kq, cont_args->call, cont_args, &count,
            current_proc());
        if (error == 0 && count == 0) {
            wait_queue_assert_wait((wait_queue_t)kq->kq_wqs,
                KQ_EVENT, THREAD_ABORTSAFE, cont_args->deadline);
            kq->kq_state |= KQ_SLEEP;
            kqunlock(kq);
            thread_block_parameter(kqueue_scan_continue, kq);
            /* NOTREACHED */
        }
        kqunlock(kq);
        break;
    case THREAD_TIMED_OUT:
        error = EWOULDBLOCK;
        break;
    case THREAD_INTERRUPTED:
        error = EINTR;
        break;
    default:
        panic("%s: - invalid wait_result (%d)", __func__,
            wait_result);
        error = 0;
    }

    /* call the continuation with the results */
    assert(cont_args->cont != NULL);
    (cont_args->cont)(kq, cont_args->data, error);
}
It's easy to see why this leads to a kernel panic. The switch statement expects wait_result to be either THREAD_AWAKENED, THREAD_TIMED_OUT, or THREAD_INTERRUPTED. If it's something else, such as THREAD_RESTART, the default case is selected, and the kernel panics.
In macOS Sierra, Apple added an additional case to this switch statement to handle THREAD_RESTART:
    case THREAD_RESTART:
        error = EBADF;
        break;
When I add this code to older kernels and recompile XNU, they no longer panic while running Chromium Legacy.
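To make the difference concrete outside the kernel, here is a minimal user-space sketch of just the wait_result-to-error mapping. It is not XNU code: the model_* names, the handles_restart flag, and the MODEL_WOULD_PANIC sentinel are all hypothetical, and the numeric values are what I believe osfmk/kern/kern_types.h defines, so please check them against the source for your kernel version.

```c
/* User-space model only; none of these names exist in XNU. */
#include <errno.h>
#include <stdbool.h>

/* Assumed to mirror the wait_result_t values in osfmk/kern/kern_types.h. */
enum model_wait_result {
    MODEL_THREAD_AWAKENED    = 0,
    MODEL_THREAD_TIMED_OUT   = 1,
    MODEL_THREAD_INTERRUPTED = 2,
    MODEL_THREAD_RESTART     = 3
};

/* Hypothetical sentinel standing in for the kernel's panic() call. */
#define MODEL_WOULD_PANIC (-1)

/*
 * Map a wait result to the error the continuation would receive.
 * handles_restart == false models the pre-Sierra switch;
 * handles_restart == true models the switch with the extra case.
 * Takes an int so that values outside the enum can be passed too.
 */
int
model_scan_error(int wait_result, bool handles_restart)
{
    switch (wait_result) {
    case MODEL_THREAD_AWAKENED:
        /* The real code calls kqueue_process() here; assume it found events. */
        return 0;
    case MODEL_THREAD_TIMED_OUT:
        return EWOULDBLOCK;
    case MODEL_THREAD_INTERRUPTED:
        return EINTR;
    case MODEL_THREAD_RESTART:
        return handles_restart ? EBADF : MODEL_WOULD_PANIC;
    default:
        /* Any other value still reaches panic() in both versions. */
        return MODEL_WOULD_PANIC;
    }
}
```

With this model, model_scan_error(MODEL_THREAD_RESTART, false) returns MODEL_WOULD_PANIC, while model_scan_error(MODEL_THREAD_RESTART, true) returns EBADF, which matches what I see before and after patching the older kernels.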
My question is, why did it take Apple until macOS Sierra to handle THREAD_RESTART in this function? THREAD_RESTART is a valid value for wait_result_t, and is returned by various internal kernel functions.
The most obvious explanation is "Apple made a mistake", and that may be all it is! However, it feels like too obvious a mistake to go unnoticed for years in highly sensitive kernel code.
Does this look like a simple human error, or is there a reason Apple may have thought that handling THREAD_RESTART was unnecessary? For example, is calling kqueue_scan_continue with THREAD_RESTART supposed to be impossible?
Just for reference, here's the Chromium Legacy GitHub issue where some smart people helped me figure out a lot of the information in this question.