How do I prevent semaphore lockup when a thread terminates due to a bus error?

Posted 2024-10-25 20:16:38


I am developing a Linux device driver running on an embedded CPU. This device driver controls some external hardware. The external hardware has its own DDR controller and external DDR. The hardware's DDR is visible on the embedded CPU via a movable memory window (so I have paged access to the external DDR from the Linux driver). I'm using Linux kernel version 2.6.33.

My driver uses sysfs to allow control of the external hardware from userspace. As an example, the external hardware generates a heartbeat counter which increments a specific address in external DDR. The driver reads this to detect if the external hardware is still running.

If the external DDR is not working correctly then an access to the external DDR produces a bus error on the embedded CPU. To protect against simultaneous multi-thread access, the driver uses a semaphore.

Now to the problem. If a thread grabs the semaphore and then terminates with a bus error, the semaphore is still locked. All subsequent calls to grab the semaphore block indefinitely. What techniques can I use to avoid this hanging the driver forever?

An example sysfs function (simplified):

static ssize_t running_attr_show(struct device *dev, struct device_attribute *attr, char *buffer)
{
    struct my_device * const my_dev = container_of(dev, struct my_device, dev);
    int ret;

    if(down_interruptible(&my_dev->sem))
    {
        ret = -ERESTARTSYS;
    }
    else
    {
        u32 heartbeat;
        int running;

        // Following line could cause bus error
        heartbeat = mwindow_get_reg(&my_dev->mwindow, HEARTBEAT_COUNTER_ADDR);

        running = (heartbeat != my_dev->last_heartbeat) ? 1 : 0;
        my_dev->last_heartbeat = heartbeat;

        ret = sprintf(buffer, "%d\n", running);

        /* unlock */
        up(&my_dev->sem);
    }

    return ret;
}


为人所爱 2024-11-01 20:16:38

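You'll need to modify mwindow_get_reg(), and possibly the architecture fault handler that gets called on the bus error, so that mwindow_get_reg() can return an error rather than terminating the process.

You can then handle that error gracefully, by releasing the semaphore and returning an error to userspace.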
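
As a rough illustration of that flow, here is a userspace sketch rather than real kernel code. The names fake_sem, mwindow_get_reg_safe and the hw_broken switch are hypothetical stand-ins invented so the pattern compiles on its own; in an actual driver the lock would be the driver's semaphore, and the fallible read would only work once the fault handler cooperates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's semaphore (single-threaded model). */
struct fake_sem { int held; };
static void fake_down(struct fake_sem *s) { assert(!s->held); s->held = 1; }
static void fake_up(struct fake_sem *s)   { assert(s->held);  s->held = 0; }

struct my_device {
    struct fake_sem sem;
    uint32_t last_heartbeat;
    uint32_t hw_heartbeat;  /* simulated heartbeat register contents */
    int hw_broken;          /* simulated "access to external DDR faults" switch */
};

/* Fallible read: returns 0 and stores the value, or -EIO if the access faulted. */
static int mwindow_get_reg_safe(struct my_device *d, uint32_t *val)
{
    if (d->hw_broken)
        return -EIO;
    *val = d->hw_heartbeat;
    return 0;
}

/* Every path after fake_down() leaves through out_unlock, so the lock
 * is released whether or not the read succeeded. */
static int running_show(struct my_device *d, char *buf, size_t len)
{
    uint32_t heartbeat;
    int ret;

    fake_down(&d->sem);

    ret = mwindow_get_reg_safe(d, &heartbeat);
    if (ret)
        goto out_unlock;    /* report the error to the caller, lock still released */

    ret = snprintf(buf, len, "%d\n", heartbeat != d->last_heartbeat);
    d->last_heartbeat = heartbeat;

out_unlock:
    fake_up(&d->sem);
    return ret;
}
```

The single exit label is just one way to arrange it; the point is that a failed read becomes an ordinary return value, so the unlock always runs.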


机场等船 2024-11-01 20:16:38

Thanks to @caf, here is the solution I've implemented.

I've converted part of mwindow_get_reg to assembly. For the possibly faulting read I've added an entry to the __ex_table section containing the faulting address and the fixup address. If an exception occurs at this address, the exception handler jumps to the fixup code instead of terminating the thread. The fixup assembler sets a 'faulted' flag, which I can then test in my C code:

unsigned long ret = 0;
int faulted;

asm volatile(
        "  1:      lwi     %0, %2, 0;         "     // ret = *window_addr
        "  2:      addik   %1, r0, 0;         "     // faulted = 0
        "  3:                                 "
        "          .section .fixup, \"ax\";   "     // fixup code executed if exception occurs
        "  4:      brid    3b;                "     // jump to next line of c code
        "          addik   %1, r0, 1;         "     // faulted = 1 (in delay slot)
        "          .previous;                 "
        "          .section __ex_table,\"a\"; "
        "          .word   1b,4b;             "     // ex_table entry. Gives fault address and jump address if fault occurs
        "          .previous;                 "
           : "=r" (ret), "=r" (faulted)             // output registers
           : "r" (window_addr)                      // input registers
);

if (faulted)
{
    printk(KERN_ERR "%s: %s: FAULTED!\n", MODNAME, __FUNCTION__);
    ret = 0xdeadbeef;
}

I also had to modify my DBUS exception handler by adding the following:

const struct exception_table_entry *fixup;
fixup = search_exception_tables(regs->pc);
if (fixup) {
    printk(KERN_ERR "DBUS exception: calling fixup\n");
    regs->pc = fixup->fixup;
    return;
}
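
For anyone wondering what that lookup amounts to, here is a small userspace model of it. This is only a sketch under assumptions: ex_entry, fake_regs, find_fixup and handle_bus_fault are hypothetical names, and the table is assumed to be sorted by instruction address. The real kernel uses search_exception_tables() and its own struct exception_table_entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one exception table entry: the address of an
 * instruction that may fault, and where to resume if it does. */
struct ex_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

/* Hypothetical model of the saved register state (only pc matters here). */
struct fake_regs {
    uintptr_t pc;
};

/* Binary search over a table sorted by insn; NULL if pc is not covered. */
static const struct ex_entry *find_fixup(const struct ex_entry *tbl,
                                         size_t n, uintptr_t pc)
{
    size_t lo = 0, hi = n;

    assert(tbl != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tbl[mid].insn == pc)
            return &tbl[mid];
        if (tbl[mid].insn < pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Returns 1 and redirects pc to the fixup if the faulting instruction has
 * a table entry; returns 0 and leaves pc alone otherwise, in which case
 * the caller would fall through to its normal fatal handling. */
static int handle_bus_fault(struct fake_regs *regs,
                            const struct ex_entry *tbl, size_t n)
{
    const struct ex_entry *e = find_fixup(tbl, n, regs->pc);

    if (!e)
        return 0;
    regs->pc = e->fixup;
    return 1;
}
```

Only the exact address of the marked load is covered, so a bus error anywhere else still takes the original path.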
