Priority of kernel modules and SCHED_RR threads
I have an embedded Linux platform (the Beagleboard, running Angstrom Linux) with two devices connected:
- a Laser range finder (Hokuyo UTM 30) connected via USB
- a custom external board connected via SPI
We have written a Linux kernel module which is responsible for the SPI data transfer. It has an IRQ handler in which spi_async is called, which in turn causes an async callback method to be called.
My C++ application consists of three threads:
- a main thread for data processing
- a laser polling thread
- an SPI polling thread
I am experiencing problems which seem to be caused by how the modules described above interact.
- When I switch off the USB device (laser range finder) I receive all SPI messages correctly (1 message every 3 ms, message length divided by data rate is < 1 ms), independent of thread scheduling.
- When I switch on the USB device and run my program with normal thread scheduling (SCHED_OTHER, priority 0, no nice level set), about 1% of the messages are "lost" because the callback method of spi_async is still running when the next IRQ occurs. (I could handle this case differently in order not to lose the messages, so this is not a big issue.)
When the USB device is turned on and I run the program with SCHED_RR and
- priority = 10 for main thread
- priority = 10 for SPI reading thread
- priority = 4 for USB/Laser polling thread
then I am losing 40% of the messages because the IRQ is triggered again before the SPI callback method is called! (I could maybe still find a workaround, but the problem is that I need fast response times, which can no longer be reached in this case.) I need to use both the thread scheduling and the laser device, so I am looking for a way to solve this case.
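For reference, the scheduling setup described above could be applied roughly as follows. This is only a sketch: the helper names clamp_rr_priority and set_thread_rr are hypothetical, not taken from the application, and the calls normally need root or CAP_SYS_NICE to succeed.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: clamp a requested priority to the range the
// system allows for SCHED_RR (1..99 on Linux).
int clamp_rr_priority(int requested) {
    const int lo = sched_get_priority_min(SCHED_RR);
    const int hi = sched_get_priority_max(SCHED_RR);
    if (requested < lo) return lo;
    if (requested > hi) return hi;
    return requested;
}

// Hypothetical helper: switch one thread to SCHED_RR at the given priority.
// Returns 0 on success or an errno value (typically EPERM when the process
// lacks the privilege to use real-time scheduling).
int set_thread_rr(pthread_t thread, int priority) {
    sched_param param{};
    param.sched_priority = clamp_rr_priority(priority);
    return pthread_setschedparam(thread, SCHED_RR, &param);
}
```

With this, the main thread would call set_thread_rr(pthread_self(), 10), the SPI reading thread would do the same with 10, and the laser polling thread with 4.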
Question 1:
My assumption was that IRQ handlers and the callbacks triggered by spi_async in kernel space have higher priority than any thread running in user space (no matter if SCHED_RR or SCHED_OTHER). This would mean that turning to SCHED_RR in my application shouldn't slow down SPI transfer, but this seems very wrong. Is it?
Question 2:
How can I determine what happens here? Which debugging aids exist? (Or maybe you don't need any further information?) The main question for me is: why do I experience the problems only when the laser device is turned on? Could the USB driver consume so much time?
----- EDIT:
I have made the following observation:
The spi_async callback calls wake_up_interruptible(&mydata->readq); (with wait_queue_head_t readq;). From user space (my app) I call a function which results in poll_wait(file, &mydata->readq, wait); When the poll returns, user space calls read().
- When my application runs with SCHED_OTHER, I can see that the callback method finishes before the read() method in my kernel module is entered.
- When my application runs with SCHED_RR, read() is entered before the callback exits.
This seems to prove that the priority of the user space threads is higher than the priority of the callback method's context. Is there any way to change this behaviour and still have SCHED_RR for my application's threads?
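The user-space side of the poll/read sequence described above looks roughly like this. It is a minimal sketch: read_when_ready is a hypothetical name, and the file descriptor is assumed to have been opened on the module's device node beforehand.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstddef>

// Wait until fd becomes readable, then read one message from it.
// Returns the number of bytes read, 0 on timeout (or end of file),
// and -1 on error.
ssize_t read_when_ready(int fd, void* buf, std::size_t len, int timeout_ms) {
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;

    int rc;
    do {
        // In the kernel module this ends up in the poll handler, which
        // registers the wait queue via poll_wait().
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);  // a retry restarts the full timeout

    if (rc < 0) return -1;                    // poll failed
    if (rc == 0) return 0;                    // timed out, nothing to read
    if (!(pfd.revents & POLLIN)) return -1;   // error or hang-up on the fd
    return read(fd, buf, len);                // enters the driver's read()
}
```

The SPI polling thread would call this in a loop. The wake-up from the callback makes poll() return, and the thread then goes straight into read().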
Comments (1)
Not all kernel threads have an RT priority. Imagine a periodically waking thread that needs to do some background work. You don't want this thread to preempt your RT thread. So I guess your first assumption is wrong.
Based on your other questions:
It seems your main processing thread gets in the way of the SPI driver thread responsible for the SPI data transfer.
Here is what happens:
What you can do is go back to normal scheduling while experimenting with the various CONFIG_PREEMPT_ options. Or modify the SPI master driver to ensure that any delayed work is queued with enough priority, or even not queued at all.
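A possible user-space alternative, which is not the driver change suggested above: if the deferred SPI work runs in a dedicated kernel thread (an assumption about this particular kernel and driver), that thread's priority can be inspected and raised by PID from user space, much as chrt does. The helper names below are hypothetical, and the calls need root or CAP_SYS_NICE.

```cpp
#include <sched.h>
#include <sys/types.h>
#include <cassert>
#include <cerrno>

// Read the current scheduling policy and RT priority of a task
// (pid 0 means the calling thread). Returns 0 or an errno value.
int query_task(pid_t pid, int& policy, int& priority) {
    const int p = sched_getscheduler(pid);
    if (p < 0) return errno;
    sched_param param{};
    if (sched_getparam(pid, &param) != 0) return errno;
    policy = p;
    priority = param.sched_priority;
    return 0;
}

// Give the task SCHED_FIFO at the requested priority.
// Returns 0 or an errno value (EPERM without privilege,
// EINVAL for a priority outside the valid range).
int raise_task_priority(pid_t pid, int priority) {
    sched_param param{};
    param.sched_priority = priority;
    if (sched_setscheduler(pid, SCHED_FIFO, &param) != 0) return errno;
    return 0;
}
```

The PID of the worker thread would have to be looked up first, for example with ps. To keep the application's SCHED_RR threads from starving it, the priority passed to raise_task_priority would need to be above 10.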