Are posted receive buffers returned synchronously or asynchronously after calling rdma_disconnect()?
When calling rdma_disconnect(), do I get completion queue events for all posted recv work requests before rdma_disconnect() returns, or should I expect them to come in after rdma_disconnect() has returned?
Comments (1)
The receives will complete asynchronously, with a "flush error" status, and possibly only after rdma_disconnect() has returned. As you can see from the source for rdma_disconnect(), all that it does is transition the QP to the error state and send a disconnect request to the other side.
Transitioning a QP to the error state does guarantee that all pending work requests posted to the QP will be completed with error status, but the modify QP operation returns immediately without waiting for the queues to drain. Similarly rdma_disconnect() doesn't wait for all pending work requests to finish -- in fact it would be hard to see how it could, since the RDMA CM doesn't really have any way to know how many work requests are queued, let alone peek into the associated CQ to see when they all complete.
If you are wondering about corner cases, such as requests that are in flight at the time of the transition to the error state, chapter 10 of volume 1 of the IB spec goes into great detail about work request processing.
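Since rdma_disconnect() does not wait for the queues to drain, the application has to count its own outstanding receives and keep polling the CQ until every one of them has come back. Below is a minimal sketch of that bookkeeping, not code from librdmacm. To keep it compilable without RDMA hardware, `struct wc`, `enum wc_status`, `poll_fn`, `drain_recvs` and `fake_cq` are all hypothetical stand-ins: in a real program the callback would wrap ibv_poll_cq(), the entries would be `struct ibv_wc`, and the flushed status would be `IBV_WC_WR_FLUSH_ERR`. It also assumes the CQ only carries receive completions.

```c
#include <stdint.h>

/* Stand-ins for the verbs types (assumption: real code would use
 * struct ibv_wc with IBV_WC_SUCCESS / IBV_WC_WR_FLUSH_ERR). */
enum wc_status { WC_SUCCESS, WC_WR_FLUSH_ERR };

struct wc {
    uint64_t wr_id;
    enum wc_status status;
};

/* Hypothetical poll callback with ibv_poll_cq()-like semantics: writes up
 * to `max` completions into `out`, returns how many were written (0 if the
 * CQ is currently empty) or a negative value on error. */
typedef int (*poll_fn)(void *cq, int max, struct wc *out);

struct drain_result {
    int flushed;   /* receives completed with flush error */
    int succeeded; /* receives that completed normally before the flush */
};

/* Poll until `outstanding` receive completions have been seen.
 * Call this after rdma_disconnect() has returned. Busy-polls for
 * simplicity, so `outstanding` must not exceed what was really posted.
 * Returns 0 on success, -1 if the poll callback reports an error. */
int drain_recvs(void *cq, poll_fn poll, int outstanding,
                struct drain_result *res)
{
    struct wc batch[16];

    res->flushed = 0;
    res->succeeded = 0;
    while (outstanding > 0) {
        int n = poll(cq, 16, batch);
        if (n < 0)
            return -1;
        for (int i = 0; i < n; i++) {
            if (batch[i].status == WC_WR_FLUSH_ERR)
                res->flushed++;
            else
                res->succeeded++;
            outstanding--;
        }
    }
    return 0;
}

/* Test double standing in for a real CQ: hands out a fixed array of
 * completions, then reports an empty queue. */
struct fake_cq {
    const struct wc *pending;
    int count;
    int next;
};

int fake_poll(void *cq, int max, struct wc *out)
{
    struct fake_cq *f = cq;
    int n = 0;

    while (n < max && f->next < f->count)
        out[n++] = f->pending[f->next++];
    return n;
}
```

Only once `drain_recvs` has accounted for every posted receive is it safe to free or reuse the buffers. A receive that was already in flight may still complete successfully, which is why both outcomes are counted rather than assuming every completion is a flush error.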