Strange behavior of printk in a Linux kernel module
I am writing code for a Linux kernel module and am seeing strange behavior.
Here is my code:
int data = 0;

void threadfn1()
{
    int j;
    for( j = 0; j < 10; j++ )
        printk(KERN_INFO "I AM THREAD 1 %d\n",j);
    data++;
}

void threadfn2()
{
    int j;
    for( j = 0; j < 10; j++ )
        printk(KERN_INFO "I AM THREAD 2 %d\n",j);
    data++;
}

static int __init abc_init(void)
{
    struct task_struct *t1 = kthread_run(threadfn1, NULL, "thread1");
    struct task_struct *t2 = kthread_run(threadfn2, NULL, "thread2");
    while( 1 )
    {
        printk("debug\n"); // runs ok
        if( data >= 2 )
        {
            kthread_stop(t1);
            kthread_stop(t2);
            break;
        }
    }
    printk(KERN_INFO "HELLO WORLD\n");
}
Basically, I am trying to wait for the threads to finish and then print something.
The above code does achieve that, but only with printk("debug\n"); left in. As soon as I comment out printk("debug\n"); to run the code without the debugging output and load the module with insmod, the module hangs, and it seems to get lost in recursion. I don't understand why printk affects my code so much.
Any help would be appreciated.
Regards.
Comments (5)
You're not synchronizing access to the data variable. As a result, the compiler generates an infinite loop. Here is why:
The compiler can see that the value of data never changes inside the while loop. It can therefore move the check out of the loop entirely, and you end up with a simple infinite loop.
If you insert printk, the compiler has to assume that the global variable data may change (after all, the compiler has no idea what printk does in detail), so your code starts to work again (in an undefined-behavior kind of way).
How to fix this:
Use proper thread synchronization primitives. If you wrap each access to data in a section protected by a mutex, the code will work. You could also replace the variable data with a counting semaphore.
Edit:
This link explains how locking works in the Linux kernel:
http://www.linuxgrill.com/anonymous/fire/netfilter/kernel-hacking-HOWTO-5.html
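A minimal sketch of the mutex approach described above, written as a userspace analogue with POSIX mutexes so it can be compiled and run outside the kernel. In a module the same shape would presumably use the kernel's own mutex API (DEFINE_MUTEX, mutex_lock, mutex_unlock). The names data_lock, worker_finished, and all_workers_finished are hypothetical and not part of the original code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Shared counter and the lock that guards every access to it. */
static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
static int data;

/* Hypothetical helper: each worker calls this once when its loop is done. */
static void worker_finished(void)
{
    pthread_mutex_lock(&data_lock);
    data++;
    pthread_mutex_unlock(&data_lock);
}

/* Hypothetical helper: the waiter calls this to check progress.
 * Because the read sits between lock and unlock, the compiler cannot
 * hoist it out of a surrounding polling loop. */
static bool all_workers_finished(int expected)
{
    bool done;

    pthread_mutex_lock(&data_lock);
    done = (data >= expected);
    pthread_mutex_unlock(&data_lock);
    return done;
}
```

The waiter would then loop on all_workers_finished(2) instead of reading data directly. Polling like this still burns CPU, so a real module would likely sleep between checks or block on a primitive designed for waiting.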
With the call to printk() removed, the compiler optimises the loop into while (1);. When you add the call to printk(), the compiler can no longer be sure that data is unchanged, so it checks the value each time through the loop.
You can insert a barrier into the loop, which forces the compiler to re-evaluate data on each iteration.
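A minimal sketch of what such a barrier might look like, assuming GCC or Clang. The kernel already provides a barrier() macro; it is defined by hand here only so the sketch compiles in userspace. The function names and the max_spins bound are hypothetical, the latter added so the example always terminates.

```c
/* Compiler-only barrier: emits no instructions, but tells the compiler
 * that memory may have changed, so values held in registers must be
 * reloaded afterwards. It does not order accesses at the CPU level. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int data;

/* Hypothetical helper: a worker reports that it has finished. */
static void worker_done(void)
{
    data++;
}

/* Spin until data reaches target or the spin budget runs out.
 * Returns 0 on success, -1 if the budget was exhausted. */
static int wait_for_data(int target, long max_spins)
{
    long spins;

    for (spins = 0; spins < max_spins; spins++) {
        if (data >= target)
            return 0;
        barrier();  /* forces data to be re-read on the next pass */
    }
    return -1;
}
```

This only addresses the compiler optimisation discussed in this answer. The increments in the workers are still unsynchronized, so it is a workaround for the hang and not a substitute for proper locking.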
Maybe data should be declared volatile? It could be that the compiler is not going to memory to get data in the loop.
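A minimal sketch of the volatile idea, assuming GCC or Clang (it relies on the __typeof__ extension). Instead of declaring the variable itself volatile, it forces a volatile access at the point of the read, in the spirit of the kernel's READ_ONCE(). The macro name READ_ONCE_SKETCH and the helper functions are hypothetical.

```c
/* Hypothetical macro modelled on the kernel's READ_ONCE(): the volatile
 * cast makes the compiler perform a real load every time it is evaluated. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

static int data;

/* Hypothetical helper: a worker reports that it has finished. */
static void worker_done(void)
{
    data++;
}

/* Returns the current count, always re-read from memory. */
static int workers_done(void)
{
    return READ_ONCE_SKETCH(data);
}
```

A loop such as while (workers_done() < 2) would then re-read data on every pass. Note that this does not make data++ atomic: two threads incrementing at the same time could still lose an update.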
Nils Pipenbrinck's answer is spot on. I'll just add some pointers:
Rusty's Unreliable Guide to Kernel Locking (every kernel hacker should read this one).
Goodbye semaphores? and The mutex API (lwn.net articles on the new mutex API introduced in early 2006; before that, the Linux kernel used semaphores as mutexes).
Also, since your shared data is a simple counter, you can just use the atomic API (basically, declare your counter as atomic_t and access it using the atomic_* functions).
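A minimal sketch of the atomic-counter idea, written with C11 <stdatomic.h> so it runs in userspace. In a module the counterparts would presumably be atomic_t, atomic_inc(), and atomic_read(). The names finished, worker_finished, and all_workers_finished are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Zero-initialized because it has static storage duration.
 * Kernel analogue: an atomic_t counter. */
static atomic_int finished;

/* Hypothetical helper: each worker calls this once when it is done.
 * Kernel analogue: atomic_inc(). */
static void worker_finished(void)
{
    atomic_fetch_add(&finished, 1);
}

/* Hypothetical helper: the waiter polls this.
 * Kernel analogue: atomic_read(). The atomic load cannot be hoisted
 * out of a surrounding loop by the compiler. */
static bool all_workers_finished(int expected)
{
    return atomic_load(&finished) >= expected;
}
```

Compared with a mutex, this keeps the increment and the check lock-free, which fits a counter that is only ever incremented and compared.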
Volatile is not always a "bad idea". One needs to distinguish the cases where volatile is needed from those where a mutual exclusion mechanism is needed; using one mechanism in place of the other is suboptimal. In the case above, I would suggest that the optimal solution needs both mechanisms: a mutex to provide mutual exclusion, and volatile to tell the compiler that "info" must be read fresh from hardware. Otherwise, in some situations (optimization levels -O2, -O3), the compiler might inadvertently leave out the needed code.