How to prevent a semaphore from staying locked when a thread terminates with a bus error
I am developing a Linux device driver running on an embedded CPU. This device driver controls some external hardware. The external hardware has its own DDR controller and external DDR. The hardware's DDR is visible on the embedded CPU via a movable memory window (so I have paged access to the external DDR from the Linux driver). I'm using Linux kernel version 2.6.33.
My driver uses sysfs to allow control of the external hardware from userspace. As an example, the external hardware generates a heartbeat counter which increments a specific address in external DDR. The driver reads this to detect whether the external hardware is still running.
If the external DDR is not working correctly, then an access to the external DDR produces a bus error on the embedded CPU. To protect against simultaneous multi-threaded access, the driver uses a semaphore.
Now to the problem. If a thread grabs the semaphore and then terminates with a bus error, the semaphore stays locked. All subsequent calls to grab the semaphore block indefinitely. What techniques can I use to avoid this hanging the driver forever?
An example sysfs function (simplified):
static ssize_t running_attr_show(struct device *dev, struct device_attribute *attr, char *buffer)
{
    struct my_device * const my_dev = container_of(dev, struct my_device, dev);
    int ret;

    if (down_interruptible(&my_dev->sem))
    {
        /* Interrupted by a signal before the semaphore was acquired */
        ret = -ERESTARTSYS;
    }
    else
    {
        u32 heartbeat;
        int running;

        // Following line could cause bus error
        heartbeat = mwindow_get_reg(&my_dev->mwindow, HEARTBEAT_COUNTER_ADDR);
        running = (heartbeat != my_dev->last_heartbeat) ? 1 : 0;
        my_dev->last_heartbeat = heartbeat;
        ret = sprintf(buffer, "%d\n", running);

        /* unlock */
        up(&my_dev->sem);
    }
    return ret;
}
Comments (2)
You'll need to modify mwindow_get_reg(), and possibly the architecture fault handler that's invoked on a bus error, so that mwindow_get_reg() can return an error rather than terminating the process. You can then handle that error gracefully, by releasing the semaphore and returning an error to userspace.
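To make the suggested control flow concrete, here is a minimal userspace model of it, not driver code. Everything in it is an assumption made for illustration: `mwindow_try_get_reg()` is a hypothetical error-returning variant of `mwindow_get_reg()`, `model_down()`/`model_up()` stand in for `down_interruptible()`/`up()`, and the `ddr_ok` field only simulates a failing external DDR. The point it tries to show is that once the read reports failure as a return value, the semaphore can be released on every path.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the driver state (hypothetical fields). */
struct my_device {
    int sem_held;            /* models the semaphore: 1 while held */
    int ddr_ok;              /* 0 simulates a faulting external DDR */
    uint32_t hw_heartbeat;   /* value the hardware would expose */
    uint32_t last_heartbeat;
};

/* Stand-in for down_interruptible(): fails if the lock is already held. */
static int model_down(struct my_device *d)
{
    if (d->sem_held)
        return -EINTR;
    d->sem_held = 1;
    return 0;
}

/* Stand-in for up(): releasing an unheld lock would be a bug. */
static void model_up(struct my_device *d)
{
    assert(d->sem_held);
    d->sem_held = 0;
}

/* Hypothetical variant of mwindow_get_reg() that reports a failed
 * access as -EIO instead of letting the caller be killed. */
static int mwindow_try_get_reg(struct my_device *d, uint32_t *value)
{
    if (!d->ddr_ok)
        return -EIO;
    *value = d->hw_heartbeat;
    return 0;
}

/* Model of the sysfs show function: one unlock covers both outcomes. */
int running_show(struct my_device *d, char *buffer)
{
    uint32_t heartbeat;
    int ret;

    if (model_down(d))
        return -EINTR;   /* the driver would return -ERESTARTSYS here */

    ret = mwindow_try_get_reg(d, &heartbeat);
    if (ret == 0) {
        int running = (heartbeat != d->last_heartbeat) ? 1 : 0;
        d->last_heartbeat = heartbeat;
        ret = sprintf(buffer, "%d\n", running);
    }

    model_up(d);         /* reached on success and on read failure */
    return ret;
}
```

In this model a failed read surfaces to the caller as `-EIO`, and a later call can still take the lock, which is the property the original function lacks.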
Thanks to @caf, here is the solution I've implemented.
I've converted part of mwindow_get_reg to assembly. For the possibly faulting read, I've added an entry to the ex_table section containing the faulting address and a fixup address. If an exception occurs at this address, the exception handler jumps to the fixup code instead of terminating the thread. The fixup assembler sets a 'faulted' flag, which I can then test for in my C code:
I also had to modify my DBUS exception handler by adding the following:
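The kernel exception-table mechanism is architecture-specific, so what follows is only a rough userspace analogue of the same idea, not the code described above. It assumes a POSIX system: a signal handler plays the role of the exception handler, `sigsetjmp`/`siglongjmp` play the role of the fixup address, and the hypothetical `guarded_read32()` returns -1 in place of setting a 'faulted' flag. It is not thread-safe, since the jump target is a single global.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>   /* mmap(), used only by the usage example */

/* Plays the role of the fixup address recorded in the exception table. */
static sigjmp_buf fixup_target;

/* Plays the role of the exception handler: resume at the fixup
 * instead of letting the default action terminate the process. */
static void fault_handler(int signo)
{
    (void)signo;
    siglongjmp(fixup_target, 1);
}

/* Hypothetical guarded read: returns 0 and stores the value on
 * success, or -1 if the access faulted. */
int guarded_read32(const volatile uint32_t *addr, uint32_t *value)
{
    struct sigaction sa, old_segv, old_bus;
    int ret;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fault_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* savemask = 1 so the signal is unblocked again after the jump */
    if (sigsetjmp(fixup_target, 1) == 0) {
        *value = *addr;   /* the only access allowed to fault */
        ret = 0;
    } else {
        ret = -1;         /* arrived here via the "fixup" path */
    }

    /* Restore the previous handlers so unrelated faults still crash. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return ret;
}
```

As a usage example, reading from a page mapped with `mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)` makes `guarded_read32()` return -1, while reading an ordinary variable returns 0. The caller can then release its lock and report the error, as in the kernel version.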