如何识别Linux块设备的请求队列

发布于 2024-11-25 13:19:53 字数 7093 浏览 2 评论 0原文

我正在开发这个通过网络连接硬盘的驱动程序。有一个错误,如果我在计算机上启用两个或多个硬盘,则只有第一个硬盘会检查并识别分区。结果是,如果我在 hda 上有 1 个分区,在 hdb 上有 1 个分区,那么只要我连接 hda 就有一个可以挂载的分区。所以hda1一挂载就得到一个blkid xyz123。但是当我继续挂载 hdb1 时,它也会出现相同的 blkid,事实上,驱动程序是从 hda 读取它,而不是 hdb。

所以我想我找到了司机搞砸的地方。下面是一个调试输出,包括一个 dump_stack,我将其放在似乎访问错误设备的第一个位置。

这是代码部分:

/*basically, this is just the request_queue processor. In the log output that
  follows, the second device, (hdb) has just been connected, right after hda
  was connected and hda1 was mounted to the system. */

void nblk_request_proc(struct request_queue *q)
{
struct request *req;
ndas_error_t err = NDAS_OK;

dump_stack();

while((req = NBLK_NEXT_REQUEST(q)) != NULL)
{
    dbgl_blk(8,"processing queue request from slot %d",SLOT_R(req));

    if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED, &(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags)))  {
        printk ("ndas: Queue is suspended\n");
        /* Queue is suspended */
#if ( LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,31) )
        blk_start_request(req);
#else
        blkdev_dequeue_request(req);
#endif

这是日志输出。我添加了一些注释来帮助理解正在发生的事情以及错误的调用似乎出现在哪里。

  /* Just below here you can see "slot" mentioned many times. This is the 
     identification for the network case in which the hd is connected to the 
     network. So you will see slot 2 in this log because the first device has 
     already been connected and mounted. */

  kernel: [231644.155503] BL|4|slot_enable|/driver/block/ctrldev.c:281|adding disk: slot=2, first_minor=16, capacity=976769072|nd/dpcd1,64:15:44.38,3828:10
  kernel: [231644.155588] BL|3|ndop_open|/driver/block/ops.c:233|ing bdev=f6823400|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155598] BL|2|ndop_open|/driver/block/ops.c:247|slot =0x2|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155606] BL|2|ndop_open|/driver/block/ops.c:248|dev_t=0x3c00010|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155615] ND|3|ndas_query_slot|netdisk/nddev.c:791|slot=2 sdev=d33e2080|nd/dpcd1,64:15:44.38,3696:10
  kernel: [231644.155624] ND|3|ndas_query_slot|netdisk/nddev.c:817|ed|nd/dpcd1,64:15:44.38,3696:10
  kernel: [231644.155631] BL|3|ndop_open|/driver/block/ops.c:326|mode=1|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155640] BL|3|ndop_open|/driver/block/ops.c:365|ed open|nd/dpcd1,64:15:44.38,3724:10
  kernel: [231644.155653] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2334|gendisk=c6afd800={major=60,first_minor=16,minors=0x10,disk_name=ndas-44700486-0,private_data=00000002,capacity=%lld}|nd/dpcd1,64:15:44.38,3660:10
  kernel: [231644.155668] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2346|ed|nd/dpcd1,64:15:44.38,3652:10

  /* So at this point the hard disk is added (gendisk=c6...) and the identifications
     all match the network device. The driver is now about to begin scanning the 
     hard drive for existing partitions. the little 'ed', at the end of the previous
     line indicates that revalidate_disk has finished it's job. 

     Also, I think the request queue is indicated by the output dpcd1 near the very
     end of the line. 

     Now below we have entered the function that is pasted above. In the function
     you can see that the slot can be determined by the queue. And the log output
     after the stack dump shows it is from slot 1. (The first network drive that was
     already mounted.) */

        kernel: [231644.155677]  ndas-44700486-0:Pid: 467, comm: nd/dpcd1 Tainted: P           2.6.32-5-686 #1
  kernel: [231644.155711] Call Trace:
  kernel: [231644.155723]  [<fc5a7685>] ? nblk_request_proc+0x9/0x10c [ndas_block]
  kernel: [231644.155732]  [<c11298db>] ? __generic_unplug_device+0x23/0x25
  kernel: [231644.155737]  [<c1129afb>] ? generic_unplug_device+0x1e/0x2e
  kernel: [231644.155743]  [<c1123090>] ? blk_unplug+0x2e/0x31
  kernel: [231644.155750]  [<c10cceec>] ? block_sync_page+0x33/0x34
  kernel: [231644.155756]  [<c108770c>] ? sync_page+0x35/0x3d
  kernel: [231644.155763]  [<c126d568>] ? __wait_on_bit_lock+0x31/0x6a
  kernel: [231644.155768]  [<c10876d7>] ? sync_page+0x0/0x3d
  kernel: [231644.155773]  [<c10876aa>] ? __lock_page+0x76/0x7e
  kernel: [231644.155780]  [<c1043f1f>] ? wake_bit_function+0x0/0x3c
  kernel: [231644.155785]  [<c1087b76>] ? do_read_cache_page+0xdf/0xf8
  kernel: [231644.155791]  [<c10d21b9>] ? blkdev_readpage+0x0/0xc
  kernel: [231644.155796]  [<c1087bbc>] ? read_cache_page_async+0x14/0x18
  kernel: [231644.155801]  [<c1087bc9>] ? read_cache_page+0x9/0xf
  kernel: [231644.155808]  [<c10ed6fc>] ? read_dev_sector+0x26/0x60
  kernel: [231644.155813]  [<c10ee368>] ? adfspart_check_ICS+0x20/0x14c
  kernel: [231644.155819]  [<c10ee138>] ? rescan_partitions+0x17e/0x378
  kernel: [231644.155825]  [<c10ee348>] ? adfspart_check_ICS+0x0/0x14c
  kernel: [231644.155830]  [<c10d26a3>] ? __blkdev_get+0x225/0x2c7
  kernel: [231644.155836]  [<c10ed7e6>] ? register_disk+0xb0/0xfd
  kernel: [231644.155843]  [<c112e33b>] ? add_disk+0x9a/0xe8
  kernel: [231644.155848]  [<c112dafd>] ? exact_match+0x0/0x4
  kernel: [231644.155853]  [<c112deae>] ? exact_lock+0x0/0xd
  kernel: [231644.155861]  [<fc5a8b80>] ? slot_enable+0x405/0x4a5 [ndas_block]
  kernel: [231644.155868]  [<fc5a8c63>] ? ndcmd_enabled_handler+0x43/0x9e [ndas_block]
  kernel: [231644.155874]  [<fc5a8c20>] ? ndcmd_enabled_handler+0x0/0x9e [ndas_block]
  kernel: [231644.155891]  [<fc54b22b>] ? notify_func+0x38/0x4b [ndas_core]
  kernel: [231644.155906]  [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
  kernel: [231644.155919]  [<fc562005>] ? _dpc_cancel+0x4c7/0x626 [ndas_core]
  kernel: [231644.155933]  [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
  kernel: [231644.155941]  [<c1003d47>] ? kernel_thread_helper+0x7/0x10

  /* here are the output of the driver debugs. They show that this operation is
     being performed on the first devices request queue. */

  kernel: [231644.155948] BL|8|nblk_request_proc|/driver/block/block26.c:494|processing queue request from slot 1|nd/dpcd1,64:15:44.38,3408:10
  kernel: [231644.155959] BL|8|nblk_handle_io|/driver/block/block26.c:374|struct ndas_slot sd = NDAS GET SLOT DEV(slot 1)
  kernel: [231644.155966] |nd/dpcd1,64:15:44.38,3328:10
  kernel: [231644.155970] BL|8|nblk_handle_io|/driver/block/block26.c:458|case READA call ndas_read(slot=1, ndas_req)|nd/dpcd1,64:15:44.38,3328:10
  kernel: [231644.155979] ND|8|ndas_read|netdisk/nddev.c:824|read io: slot=1, cmd=0, req=x00|nd/dpcd1,64:15:44.38,3320:10

我希望这是足够的背景信息。也许此时一个明显的问题是“request_queues 何时何地分配?”

嗯,这是在 add_disk 函数之前处理的。添加磁盘,是日志输出的第一行。

slot->disk = NULL;
spin_lock_init(&slot->lock);
slot->queue = blk_init_queue(
    nblk_request_proc, 
    &slot->lock
);

据我所知,这是标准操作。回到我原来的问题。我能否在某处找到请求队列,并确保它对于每个新设备都是递增的或唯一的,或者 Linux 内核是否仅对每个主编号使用一个队列?我想了解为什么这个驱动程序在两个不同的块存储上加载相同的队列,并确定这是否导致在初始注册过程中出现重复的 blkid。

谢谢你帮我看看这个情况。

I am working on this driver that connects the hard disk over the network. There is a bug that if I enable two or more hard disks on the computer, only the first one gets the partitions looked over and identified. The result is, if I have 1 partition on hda and 1 partitions on hdb, as soon as I connect hda there is a partition that can be mounted. So hda1 gets a blkid xyz123 as soon as it mounts. But when I go ahead and mount hdb1 it also comes up with the same blkid and in fact, the driver is reading it from hda, not hdb.

So I think I found the place where the driver is messing up. Below is a debug output including a dump_stack which I put at the first spot where it seems to be accessing the wrong device.

Here is the code section:

/*basically, this is just the request_queue processor. In the log output that
  follows, the second device, (hdb) has just been connected, right after hda
  was connected and hda1 was mounted to the system. */

void nblk_request_proc(struct request_queue *q)
{
struct request *req;
ndas_error_t err = NDAS_OK;

dump_stack();

while((req = NBLK_NEXT_REQUEST(q)) != NULL)
{
    dbgl_blk(8,"processing queue request from slot %d",SLOT_R(req));

    if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED, &(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags)))  {
        printk ("ndas: Queue is suspended\n");
        /* Queue is suspended */
#if ( LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,31) )
        blk_start_request(req);
#else
        blkdev_dequeue_request(req);
#endif

Here is a log output. I have added some comments to help understand what is happening and where the bad call seems to come up.

  /* Just below here you can see "slot" mentioned many times. This is the 
     identification for the network case in which the hd is connected to the 
     network. So you will see slot 2 in this log because the first device has 
     already been connected and mounted. */

  kernel: [231644.155503] BL|4|slot_enable|/driver/block/ctrldev.c:281|adding disk: slot=2, first_minor=16, capacity=976769072|nd/dpcd1,64:15:44.38,3828:10
  kernel: [231644.155588] BL|3|ndop_open|/driver/block/ops.c:233|ing bdev=f6823400|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155598] BL|2|ndop_open|/driver/block/ops.c:247|slot =0x2|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155606] BL|2|ndop_open|/driver/block/ops.c:248|dev_t=0x3c00010|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155615] ND|3|ndas_query_slot|netdisk/nddev.c:791|slot=2 sdev=d33e2080|nd/dpcd1,64:15:44.38,3696:10
  kernel: [231644.155624] ND|3|ndas_query_slot|netdisk/nddev.c:817|ed|nd/dpcd1,64:15:44.38,3696:10
  kernel: [231644.155631] BL|3|ndop_open|/driver/block/ops.c:326|mode=1|nd/dpcd1,64:15:44.38,3720:10
  kernel: [231644.155640] BL|3|ndop_open|/driver/block/ops.c:365|ed open|nd/dpcd1,64:15:44.38,3724:10
  kernel: [231644.155653] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2334|gendisk=c6afd800={major=60,first_minor=16,minors=0x10,disk_name=ndas-44700486-0,private_data=00000002,capacity=%lld}|nd/dpcd1,64:15:44.38,3660:10
  kernel: [231644.155668] BL|8|ndop_revalidate_disk|/driver/block/ops.c:2346|ed|nd/dpcd1,64:15:44.38,3652:10

  /* So at this point the hard disk is added (gendisk=c6...) and the identifications
     all match the network device. The driver is now about to begin scanning the 
     hard drive for existing partitions. the little 'ed', at the end of the previous
     line indicates that revalidate_disk has finished it's job. 

     Also, I think the request queue is indicated by the output dpcd1 near the very
     end of the line. 

     Now below we have entered the function that is pasted above. In the function
     you can see that the slot can be determined by the queue. And the log output
     after the stack dump shows it is from slot 1. (The first network drive that was
     already mounted.) */

        kernel: [231644.155677]  ndas-44700486-0:Pid: 467, comm: nd/dpcd1 Tainted: P           2.6.32-5-686 #1
  kernel: [231644.155711] Call Trace:
  kernel: [231644.155723]  [<fc5a7685>] ? nblk_request_proc+0x9/0x10c [ndas_block]
  kernel: [231644.155732]  [<c11298db>] ? __generic_unplug_device+0x23/0x25
  kernel: [231644.155737]  [<c1129afb>] ? generic_unplug_device+0x1e/0x2e
  kernel: [231644.155743]  [<c1123090>] ? blk_unplug+0x2e/0x31
  kernel: [231644.155750]  [<c10cceec>] ? block_sync_page+0x33/0x34
  kernel: [231644.155756]  [<c108770c>] ? sync_page+0x35/0x3d
  kernel: [231644.155763]  [<c126d568>] ? __wait_on_bit_lock+0x31/0x6a
  kernel: [231644.155768]  [<c10876d7>] ? sync_page+0x0/0x3d
  kernel: [231644.155773]  [<c10876aa>] ? __lock_page+0x76/0x7e
  kernel: [231644.155780]  [<c1043f1f>] ? wake_bit_function+0x0/0x3c
  kernel: [231644.155785]  [<c1087b76>] ? do_read_cache_page+0xdf/0xf8
  kernel: [231644.155791]  [<c10d21b9>] ? blkdev_readpage+0x0/0xc
  kernel: [231644.155796]  [<c1087bbc>] ? read_cache_page_async+0x14/0x18
  kernel: [231644.155801]  [<c1087bc9>] ? read_cache_page+0x9/0xf
  kernel: [231644.155808]  [<c10ed6fc>] ? read_dev_sector+0x26/0x60
  kernel: [231644.155813]  [<c10ee368>] ? adfspart_check_ICS+0x20/0x14c
  kernel: [231644.155819]  [<c10ee138>] ? rescan_partitions+0x17e/0x378
  kernel: [231644.155825]  [<c10ee348>] ? adfspart_check_ICS+0x0/0x14c
  kernel: [231644.155830]  [<c10d26a3>] ? __blkdev_get+0x225/0x2c7
  kernel: [231644.155836]  [<c10ed7e6>] ? register_disk+0xb0/0xfd
  kernel: [231644.155843]  [<c112e33b>] ? add_disk+0x9a/0xe8
  kernel: [231644.155848]  [<c112dafd>] ? exact_match+0x0/0x4
  kernel: [231644.155853]  [<c112deae>] ? exact_lock+0x0/0xd
  kernel: [231644.155861]  [<fc5a8b80>] ? slot_enable+0x405/0x4a5 [ndas_block]
  kernel: [231644.155868]  [<fc5a8c63>] ? ndcmd_enabled_handler+0x43/0x9e [ndas_block]
  kernel: [231644.155874]  [<fc5a8c20>] ? ndcmd_enabled_handler+0x0/0x9e [ndas_block]
  kernel: [231644.155891]  [<fc54b22b>] ? notify_func+0x38/0x4b [ndas_core]
  kernel: [231644.155906]  [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
  kernel: [231644.155919]  [<fc562005>] ? _dpc_cancel+0x4c7/0x626 [ndas_core]
  kernel: [231644.155933]  [<fc561cba>] ? _dpc_cancel+0x17c/0x626 [ndas_core]
  kernel: [231644.155941]  [<c1003d47>] ? kernel_thread_helper+0x7/0x10

  /* here are the output of the driver debugs. They show that this operation is
     being performed on the first devices request queue. */

  kernel: [231644.155948] BL|8|nblk_request_proc|/driver/block/block26.c:494|processing queue request from slot 1|nd/dpcd1,64:15:44.38,3408:10
  kernel: [231644.155959] BL|8|nblk_handle_io|/driver/block/block26.c:374|struct ndas_slot sd = NDAS GET SLOT DEV(slot 1)
  kernel: [231644.155966] |nd/dpcd1,64:15:44.38,3328:10
  kernel: [231644.155970] BL|8|nblk_handle_io|/driver/block/block26.c:458|case READA call ndas_read(slot=1, ndas_req)|nd/dpcd1,64:15:44.38,3328:10
  kernel: [231644.155979] ND|8|ndas_read|netdisk/nddev.c:824|read io: slot=1, cmd=0, req=x00|nd/dpcd1,64:15:44.38,3320:10

I hope this is enough background information. Maybe an obvious question at this moment is "When and where are the request_queues assigned?"

Well that is handled a little bit before the add_disk function. adding disk, is the first line on the log output.

slot->disk = NULL;
spin_lock_init(&slot->lock);
slot->queue = blk_init_queue(
    nblk_request_proc, 
    &slot->lock
);

As far as I know, this is the standard operation. So back to my original question. Can I find the request queue somewhere and make sure it is incremented or unique for each new device or does the Linux kernel only use one queue for each Major number? I want to discover why this driver is loading the same queue on two different block storages, and determine if that is causing the duplicate blkid during the initial registration process.

Thanks for looking at this situation for me.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

执笏见 2024-12-02 13:19:53
Queue = blk_init_queue(sbd_request, &Device.lock);
Queue = blk_init_queue(sbd_request, &Device.lock);
别闹i 2024-12-02 13:19:53

我分享导致我发布此问题的错误的解决方案。虽然它实际上并没有回答如何识别设备请求队列的问题。

上面的代码如下:

if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED, 
       &(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags))) 

好吧,“SLOT_R(req)”造成了麻烦。这是在返回 gendisk 设备的其他位置定义的。

#define SLOT_R(_request_) SLOT((_request_)->rq_disk)

这返回了磁盘,但不是稍后各种操作的正确值。因此,随着额外的块设备被加载,这个函数基本上一直返回 1。(我认为它正在作为布尔值进行处理。)因此,所有请求都堆积在磁盘 1 的请求队列上。

解决方法是访问正确的当磁盘添加到系统时,磁盘标识值已存储在磁盘的 private_data 中。

Correct identifier definition:
   #define SLOT_R(_request_) ( (int) _request_->rq_disk->private_data )

How the correct disk number was stored.
   slot->disk->queue = slot->queue;
   slot->disk->private_data = (void*) (long) s;  <-- 's' is the disk id
   slot->queue_flags = 0;

现在,正确的磁盘 ID 是从私有数据返回的,因此所有请求都发送到正确的队列。

如前所述,这并没有显示如何识别队列。一个未经教育的猜测可能是:

 x = (int) _request_->rq_disk->queue->id;

参考文献。 Linux中的request_queue函数
http://lxr.free-electrons.com/source/include /linux/blkdev.h#L270 &第321章

谢谢大家的帮助!

I share the solution to the bug that led me to post this question. Though it does not actually answer the question of how to identify the device request queue.

In the code above is the following:

if (test_bit(NDAS_FLAG_QUEUE_SUSPENDED, 
       &(NDAS_GET_SLOT_DEV(SLOT_R(req))->queue_flags))) 

Well, that "SLOT_R(req)" was causing the trouble. That is defined else where to return the gendisk device.

#define SLOT_R(_request_) SLOT((_request_)->rq_disk)

This returned the disk, but not the proper value for various operations later on. So as the extra block devices were loaded, this function basically kept returning 1. (I think it was processing as a boolean.) Therefore, all requests were piled on the to the request queue for disk 1.

The fix was to access the correct disk identification value that was already stored in the disk's private_data when the it was added to the system.

Correct identifier definition:
   #define SLOT_R(_request_) ( (int) _request_->rq_disk->private_data )

How the correct disk number was stored.
   slot->disk->queue = slot->queue;
   slot->disk->private_data = (void*) (long) s;  <-- 's' is the disk id
   slot->queue_flags = 0;

Now the correct disk id is returned from private data, so all requests to the correct queue.

As mentioned, this does not show how to identify the queue though. An un-educated guess might be:

 x = (int) _request_->rq_disk->queue->id;

Ref. the request_queue function in linux
http://lxr.free-electrons.com/source/include/linux/blkdev.h#L270 & 321

Thanks everyone for helping!

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文