
Priority of kernel modules and SCHED_RR threads

Posted on 2024-12-08 15:54:14 | 1787 characters | 7 views | 0 comments


I have an embedded Linux platform (the Beagleboard, running Angstrom Linux) with two devices connected:

- a laser range finder (Hokuyo UTM 30) connected via USB
- a custom external board connected via SPI

We have written a Linux kernel module which is responsible for the SPI data transfer. It has an IRQ handler in which spi_async is called, which in turn causes an asynchronous callback method to be called.

My C++ application consists of three threads:

- a main thread for data processing
- a laser polling thread
- an SPI polling thread

I am experiencing problems which seem to be caused by how the modules described above interact.

- When I switch off the USB device (laser range finder), I receive all SPI messages correctly (1 message every 3 ms; message length divided by data rate is < 1 ms), independent of thread scheduling.
- When I switch on the USB device and run my program with normal thread scheduling (SCHED_OTHER, priority 0, no nice level set), about 1% of the messages are "lost" because the callback method of spi_async is still running when the next IRQ occurs. (I could handle this case differently in order not to lose the messages, so this is not a big issue.)
- When the USB device is turned on and I run the program with SCHED_RR and
  - priority = 10 for the main thread
  - priority = 10 for the SPI reading thread
  - priority = 4 for the USB/laser polling thread

  then I lose 40% of the messages because the IRQ is triggered again before the SPI callback method is called! (I could maybe still find a workaround, but the problem is that I need fast response times, which can no longer be reached in this case.) I need to use both the thread scheduling and the laser device, so I am looking for a way to solve this case.
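A minimal sketch of how the priorities in the third case might be applied, assuming each thread calls a helper on itself when it starts. The helper name `use_sched_rr` and the constant names are illustrative and not taken from the actual application; only the values 10, 10 and 4 come from the description above. It relies on the Linux behaviour that `sched_setscheduler()` with pid 0 affects the calling thread, and it typically needs root or CAP_SYS_NICE.

```cpp
#include <sched.h>
#include <cerrno>

// Priorities from the description above; the names are illustrative only.
constexpr int kMainPriority  = 10;
constexpr int kSpiPriority   = 10;
constexpr int kLaserPriority = 4;

// Hypothetical helper: switch the calling thread to SCHED_RR at `priority`.
// On Linux, pid 0 in sched_setscheduler() refers to the calling thread.
// Returns 0 on success, otherwise the errno value (for example EPERM
// when the process lacks the privilege to use a real-time policy).
int use_sched_rr(int priority) {
    sched_param param{};
    param.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_RR, &param) != 0) {
        return errno;
    }
    return 0;
}
```

Each thread would call it once at start-up, for example `use_sched_rr(kSpiPriority)` at the top of the SPI polling thread, and should check the return value, since an unprivileged process stays on SCHED_OTHER.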

Question 1:

My assumption was that IRQ handlers and the callbacks triggered by spi_async in kernel space have a higher priority than any thread running in user space (no matter whether it is SCHED_RR or SCHED_OTHER). This would mean that switching my application to SCHED_RR shouldn't slow down the SPI transfer, but that assumption seems to be wrong. Is it?

Question 2:

How can I determine what happens here? Which debugging aids exist? (Or maybe you don't need any further information?) The main question for me is: why do I experience the problems only when the laser device is turned on? Could the USB driver consume that much time?
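One simple aid that can be sketched from user space is to estimate the loss rate from the arrival times of the messages, given the nominal 3 ms period. This is only an illustration under assumptions: the helper `count_missed` is hypothetical, and it assumes each received message is timestamped in milliseconds by the application (for example with `clock_gettime(CLOCK_MONOTONIC, ...)` right after `read()` returns). It does not show where the time is spent, only how many periods were skipped.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: estimate how many messages were missed, given the
// arrival timestamps (in ms) of the messages that were received and the
// nominal message period (3 ms in the setup described above).
std::size_t count_missed(const std::vector<double>& arrival_ms, double period_ms) {
    std::size_t missed = 0;
    for (std::size_t i = 1; i < arrival_ms.size(); ++i) {
        double gap = arrival_ms[i] - arrival_ms[i - 1];
        // Round the gap to a whole number of periods to tolerate jitter.
        long periods = std::lround(gap / period_ms);
        if (periods > 1) {
            missed += static_cast<std::size_t>(periods - 1);
        }
    }
    return missed;
}
```

Rounding to whole periods keeps ordinary scheduling jitter from being counted as a loss; a gap of roughly 6 ms counts as one missed message.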

----- EDIT:

I have made the following observation:

The spi_async callback calls wake_up_interruptible(&mydata->readq); (with wait_queue_head_t readq;). From user space (my app) I call a function which results in poll_wait(file, &mydata->readq, wait); being called. When the poll returns, user space calls read().
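The user-space half of this sequence can be sketched as a poll-then-read helper. This is a minimal illustration rather than the application's actual code: the name `read_when_ready` is hypothetical, and it assumes the module's device file is already open as a file descriptor.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>

// Hypothetical helper: block in poll() until `fd` becomes readable (in the
// module this is where poll_wait() registers on the wait queue), then read
// one message. Returns the number of bytes read, 0 on timeout, or -1 on
// error with errno set.
ssize_t read_when_ready(int fd, void* buf, std::size_t len, int timeout_ms) {
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;

    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);  // retry if interrupted by a signal

    if (ready <= 0) {
        return ready;  // 0 = timeout, -1 = poll() error
    }
    if (!(pfd.revents & POLLIN)) {
        errno = EIO;   // woken up, but no data to read (e.g. POLLERR/POLLHUP)
        return -1;
    }
    return read(fd, buf, len);
}
```

With this shape, the thread is woken as soon as wake_up_interruptible() marks it runnable, so whether read() is entered before or after the callback returns depends on how the scheduler ranks the woken thread against the context running the callback.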

- When my application runs with SCHED_OTHER, I can see that the callback method finishes before the read() method in my kernel module is entered.
- When my application runs with SCHED_RR, read() is entered before the callback exits.

This seems to prove that the priority of the user-space threads is higher than the priority of the context in which the callback method runs. Is there any way to change this behaviour and still keep SCHED_RR for my application's threads?
