How to remove a malfunctioning kernel module
This situation always bothers me:
I wrote a kernel module, and sometimes it has a bug (dereferencing a NULL pointer). After I insmod hello.ko, it shows some kernel errors.
Then I change the code and try to remove the module and install it again.
The question is: I don't know how to remove the kernel module.
$ rmmod hello
ERROR: module hello in use
$ rmmod -f hello
ERROR: removing hello: device or resource busy
I always reboot the machine to remove the module, which takes too long. Does anyone have a better solution for this? Thanks for any input.
Comments (3)
Use a virtual machine.
Once you hit a NULL dereference or a similar bug, you've put the kernel into an unknown state. Even if you did manage to remove the module (which is unlikely to be possible: a kernel oops kills the calling thread, so it never gets a chance to drop the reference count, and the module can never be removed), there may still be corruption left behind, and your new, 'fixed' module is just as likely to run into trouble.
Much better to just use a fast-to-reboot virtual machine - perhaps with a snapshot, to make restoration even faster.
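As a rough illustration of the snapshot idea, here is a minimal sketch that assumes the test VM is managed by libvirt and driven with virsh; the answer does not name a hypervisor, so that choice is an assumption. The domain name kdev, the snapshot name clean-boot, and the DRY_RUN switch are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Hypothetical names: replace with your own libvirt domain and snapshot.
VM_NAME="${VM_NAME:-kdev}"
SNAPSHOT="${SNAPSHOT:-clean-boot}"

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take a snapshot of the VM in a known-good state (done once).
save_clean_state() {
    run virsh snapshot-create-as "$VM_NAME" "$SNAPSHOT"
}

# Roll the VM back to the known-good state after the module oopses.
restore_clean_state() {
    run virsh snapshot-revert "$VM_NAME" "$SNAPSHOT"
}
```

Calling save_clean_state once after a clean boot and restore_clean_state after each crash would avoid a full reboot of the VM; other hypervisors offer comparable snapshot commands.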
As bdonlan pointed out, you would be better off with a virtual machine.
However, if you really want to do it your way, you have to:
the delete_module system call in kernel/module.c
In my case, the reference count, i.e. the value under the Used by column (see lsmod), was -1. This value can also be found at /sys/module/<kernel_module>/refcnt.
Here's the answer I found that worked for me: https://askubuntu.com/a/521231
echo -e "blacklist kernel_module\n" | sudo tee -a /etc/modprobe.d/blacklist.conf
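A slightly more defensive variant of the blacklist step could look like the sketch below. It appends the same kind of line, but only if it is not already present, so running it twice does not duplicate the entry. The function name blacklist_module and its optional second argument (an alternative config file, handy for trying it out on a scratch file) are hypothetical. Writing to /etc/modprobe.d requires root, so it would need to run in a root shell.

```shell
# Append "blacklist <module>" to a modprobe config file, once.
# $1: module name
# $2: config file (defaults to /etc/modprobe.d/blacklist.conf)
blacklist_module() {
    module="$1"
    conf="${2:-/etc/modprobe.d/blacklist.conf}"
    line="blacklist $module"

    # Nothing to do if the exact line is already there.
    if [ -f "$conf" ] && grep -qxF -- "$line" "$conf"; then
        return 0
    fi

    printf '%s\n' "$line" >> "$conf"
}
```

For example, `blacklist_module kernel_module` corresponds to the echo command above.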
Then reboot your machine and recompile a clean, stable version of your module. Then enter the following command to reload and overwrite the failed module.
insmod kernel_module.ko
Finally,
rmmod kernel_module
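To check the reference count mentioned in this answer before attempting rmmod, something like the following sketch may help. It reads the refcnt file under /sys/module and treats anything other than 0 as "not removable". The function names and the optional second argument (an alternative sysfs directory, useful only for testing) are hypothetical.

```shell
# Print the reference count of a loaded module, read from sysfs.
# $1: module name
# $2: sysfs module directory (defaults to /sys/module)
module_refcnt() {
    refcnt_file="${2:-/sys/module}/$1/refcnt"
    if [ ! -r "$refcnt_file" ]; then
        echo "no readable refcnt for module '$1'" >&2
        return 1
    fi
    cat "$refcnt_file"
}

# Succeed only if the module's reference count is exactly 0.
# A count such as -1, as described above, makes this fail.
module_is_removable() {
    count=$(module_refcnt "$@") || return 1
    [ "$count" -eq 0 ]
}
```

A possible use is `module_is_removable kernel_module && sudo rmmod kernel_module`, which skips the rmmod attempt when the count shows the module is still held or in a broken state.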