How to identify the request queue for a Linux block device
I am working on a driver that connects hard disks over the network. There is a bug: if I enable two or more hard disks on the computer, only the first one has its partitions scanned and identified. The result is that if I have one partition on hda and one partition on hdb, as soon as I connect hda there is a partition that can be mounted. So hda1 gets a blkid of xyz123 as soon as it mounts. But when I go on to mount hdb1, it comes up with the same blkid; in fact, the driver is reading it from hda, not hdb.
So I think I have found the place where the driver is messing up. Below is debug output, including a dump_stack() that I placed at the first spot where the wrong device seems to be accessed.
Here is the code section:
/* Basically, this is just the request_queue processor. In the log output that
   follows, the second device (hdb) has just been connected, right after hda
   was connected and hda1 was mounted to the system. */
void nblk_request_proc(struct request_queue *q)
{
    struct request *req;
    ndas_error_t err = NDAS_OK;
    dump_stack();
    while ((req = NBLK_NEXT_REQUEST(q)) != NULL)
    {
        dbgl_blk(8, "processing queue request from slot %d", SLOT_R(req));
        if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED,
                     &(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags))) {
            printk("ndas: Queue is suspended\n");
            /* Queue is suspended */
#if ( LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,31) )
            blk_start_request(req);
#else
            blkdev_dequeue_request(req);
#endif
        }
        /* ... remainder of the request handling elided in the original post ... */
    }
}
Here is the log output. I have added some comments to help explain what is happening and where the bad call seems to occur.
/* Just below here you can see "slot" mentioned many times. This is the
identification for the network case in which the hd is connected to the
network. So you will see slot 2 in this log because the first device has
already been connected and mounted. */
kernel: [231644.155503] BL|4|slot_enable|/driver/block/ctrldev.c:281|adding disk: slot=2, first_minor=16, capacity=976769072|nd/dpcd1,64:15:44.38,3828:10
kernel: [231644.155588] BL|3|ndop_open|/driver/block/ops.c:233|ing bdev=f6823400|nd/dpcd1,64:15:44.38,3720:10
kernel: [231644.155598] BL|2|ndop_open|/driver/block/ops.c:247|slot =0x2|nd/dpcd1,64:15:44.38,3720:10
kernel: [231644.155606] BL|2|ndop_open|/driver/block/ops.c:248|dev_t=0x3c00010|nd/dpcd1,64:15:44.38,3720:10
kernel: [231644.155615] ND|3|ndas_query_slot|netdisk/nddev.c:791|slot=2 sdev=d33e2080|nd/dpcd1,64:15:44.38,3696:10
kernel: [231644.155624] ND|3|ndas_query_slot|netdisk/nddev.c:817|ed|nd/dpcd1,64:15:44.38,3696:10
kernel: [231644.155631] BL|3|ndop_open|/driver/block/ops.c:326|mode=1|nd/dpcd1,64:15:44.38,3720:10
kernel: [231644.155640] BL|3|ndop_open|/driver/block/ops.c:365|ed open|nd/dpcd1,64:15:44.38,3724:10
kernel: [231644.155653] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2334|gendisk=c6afd800={major=60,first_minor=16,minors=0x10,disk_name=ndas-44700486-0,private_data=00000002,capacity=%lld}|nd/dpcd1,64:15:44.38,3660:10
kernel: [231644.155668] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2346|ed|nd/dpcd1,64:15:44.38,3652:10
/* So at this point the hard disk is added (gendisk=c6...) and the identifications
   all match the network device. The driver is now about to begin scanning the
   hard drive for existing partitions. The little 'ed' at the end of the previous
   line indicates that revalidate_disk has finished its job.
   Also, I think the request queue is indicated by the output dpcd1 near the very
   end of the line.
   Now below we have entered the function that is pasted above. In the function
   you can see that the slot can be determined from the queue, and the log output
   after the stack dump shows it is from slot 1 (the first network drive, which
   was already mounted). */
kernel: [231644.155677] ndas-44700486-0:Pid: 467, comm: nd/dpcd1 Tainted: P 2.6.32-5-686 #1
kernel: [231644.155711] Call Trace:
kernel: [231644.155723] [<fc5a7685>] ? nblk_request_proc+0x9/0x10c [ndas_block]
kernel: [231644.155732] [<c11298db>] ? __generic_unplug_device+0x23/0x25
kernel: [231644.155737] [<c1129afb>] ? generic_unplug_device+0x1e/0x2e
kernel: [231644.155743] [<c1123090>] ? blk_unplug+0x2e/0x31
kernel: [231644.155750] [<c10cceec>] ? block_sync_page+0x33/0x34
kernel: [231644.155756] [<c108770c>] ? sync_page+0x35/0x3d
kernel: [231644.155763] [<c126d568>] ? __wait_on_bit_lock+0x31/0x6a
kernel: [231644.155768] [<c10876d7>] ? sync_page+0x0/0x3d
kernel: [231644.155773] [<c10876aa>] ? __lock_page+0x76/0x7e
kernel: [231644.155780] [<c1043f1f>] ? wake_bit_function+0x0/0x3c
kernel: [231644.155785] [<c1087b76>] ? do_read_cache_page+0xdf/0xf8
kernel: [231644.155791] [<c10d21b9>] ? blkdev_readpage+0x0/0xc
kernel: [231644.155796] [<c1087bbc>] ? read_cache_page_async+0x14/0x18
kernel: [231644.155801] [<c1087bc9>] ? read_cache_page+0x9/0xf
kernel: [231644.155808] [<c10ed6fc>] ? read_dev_sector+0x26/0x60
kernel: [231644.155813] [<c10ee368>] ? adfspart_check_ICS+0x20/0x14c
kernel: [231644.155819] [<c10ee138>] ? rescan_partitions+0x17e/0x378
kernel: [231644.155825] [<c10ee348>] ? adfspart_check_ICS+0x0/0x14c
kernel: [231644.155830] [<c10d26a3>] ? __blkdev_get+0x225/0x2c7
kernel: [231644.155836] [<c10ed7e6>] ? register_disk+0xb0/0xfd
kernel: [231644.155843] [<c112e33b>] ? add_disk+0x9a/0xe8
kernel: [231644.155848] [<c112dafd>] ? exact_match+0x0/0x4
kernel: [231644.155853] [<c112deae>] ? exact_lock+0x0/0xd
kernel: [231644.155861] [<fc5a8b80>] ? slot_enable+0x405/0x4a5 [ndas_block]
kernel: [231644.155868] [<fc5a8c63>] ? ndcmd_enabled_handler+0x43/0x9e [ndas_block]
kernel: [231644.155874] [<fc5a8c20>] ? ndcmd_enabled_handler+0x0/0x9e [ndas_block]
kernel: [231644.155891] [<fc54b22b>] ? notify_func+0x38/0x4b [ndas_core]
kernel: [231644.155906] [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
kernel: [231644.155919] [<fc562005>] ? _dpc_cancel+0x4c7/0x626 [ndas_core]
kernel: [231644.155933] [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
kernel: [231644.155941] [<c1003d47>] ? kernel_thread_helper+0x7/0x10
/* Here is the output of the driver's debug messages. It shows that this
   operation is being performed on the first device's request queue. */
kernel: [231644.155948] BL|8|nblk_request_proc|/driver/block/block26.c:494|processing queue request from slot 1|nd/dpcd1,64:15:44.38,3408:10
kernel: [231644.155959] BL|8|nblk_handle_io|/driver/block/block26.c:374|struct ndas_slot sd = NDAS GET SLOT DEV(slot 1)
kernel: [231644.155966] |nd/dpcd1,64:15:44.38,3328:10
kernel: [231644.155970] BL|8|nblk_handle_io|/driver/block/block26.c:458|case READA call ndas_read(slot=1, ndas_req)|nd/dpcd1,64:15:44.38,3328:10
kernel: [231644.155979] ND|8|ndas_read|netdisk/nddev.c:824|read io: slot=1, cmd=0, req=x00|nd/dpcd1,64:15:44.38,3320:10
I hope this is enough background information. Perhaps the obvious question at this point is: "When and where are the request_queues assigned?"
Well, that is handled a little before the add_disk call ("adding disk" is the first line of the log output):
slot->disk = NULL;
spin_lock_init(&slot->lock);
slot->queue = blk_init_queue(nblk_request_proc, &slot->lock);
As far as I know, this is the standard setup. So, back to my original question: can I find the request queue somewhere and make sure it is unique for each new device, or does the Linux kernel use only one queue per major number? I want to discover why this driver is using the same queue for two different block devices, and determine whether that is causing the duplicate blkid during the initial registration process.
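Since each slot gets its own blk_init_queue() call here, every device should already receive a distinct struct request_queue. A quick way to verify this is to log the queue pointer right after allocation; this is only a diagnostic sketch, and slot_id stands in for whatever actually identifies the slot in this driver:

/* Hypothetical diagnostic: if two devices print the same pointer,
   they really are sharing one request queue. */
if (slot->queue)
    printk(KERN_DEBUG "ndas: slot %d request_queue=%p\n",
           slot_id, slot->queue);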
Thanks for looking at this situation for me.
I am sharing the solution to the bug that led me to post this question, though it does not actually answer the question of how to identify a device's request queue.
In the code above, the "SLOT_R(req)" call was causing the trouble. It is defined elsewhere to return the gendisk device.
It returned the disk, but not the proper value for the various operations later on. So as extra block devices were loaded, it basically kept returning 1 (I think it was being evaluated as a boolean). Therefore, all requests were piled onto the request queue for disk 1.
The fix was to use the correct disk identification value, which was already stored in the disk's private_data when the disk was added to the system.
Now the correct disk ID is returned from private_data, so all requests go to the correct queue.
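A minimal sketch of what that fix could look like inside the request path. req->rq_disk and gendisk->private_data are standard fields in 2.6-era kernels, but the cast assumes the driver stored the slot number as private_data when the disk was added (the log above shows private_data=00000002), which is my assumption:

/* Recover the slot id from the gendisk the request targets, rather than
   from a macro whose return value degenerates to 1. */
static int nblk_slot_of(struct request *req)
{
    struct gendisk *disk = req->rq_disk;   /* disk this request is for */
    return (int)(long)disk->private_data;  /* slot id stored at add_disk() time */
}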
As mentioned, this does not show how to identify the queue, though. An uneducated guess might be:
Ref. the request_queue structure in the Linux sources: http://lxr.free-electrons.com/source/include/linux/blkdev.h#L270 (and line 321)
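One concrete possibility along those lines: blk_init_queue() hands every device its own struct request_queue, and its queuedata field (one of the fields visible in blkdev.h there) exists precisely so a driver can map a queue back to its own device. A sketch, assuming the ndas_slot structure from the question:

/* At setup time, tag each queue with the device it belongs to. */
slot->queue = blk_init_queue(nblk_request_proc, &slot->lock);
if (slot->queue)
    slot->queue->queuedata = slot;

/* In the request function, recover the owning device from the queue
   itself instead of from a per-request macro like SLOT_R(). */
void nblk_request_proc(struct request_queue *q)
{
    struct ndas_slot *slot = q->queuedata;
    /* ... issue reads/writes for exactly this slot ... */
}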
Thanks everyone for helping!