rte_eth_tx_burst suddenly stops sending packets

Posted on 2025-02-04 21:40:34


I am using DPDK 21.11 for my application. After a certain time, the API rte_eth_tx_burst stops sending any packets out.

Ethernet Controller X710 for 10GbE SFP+ 1572
drv=vfio-pci

#define MAX_RETRY_COUNT_RTE_ETH_TX_BURST 3


    do {
        num_sent_pkt = rte_eth_tx_burst(eth_port_id, queue_id, &mbuf[mbuf_idx], pkt_count);
        pkt_count -= num_sent_pkt;
        retry_count++;
    } while (pkt_count && (retry_count != MAX_RETRY_COUNT_RTE_ETH_TX_BURST));

To debug, I tried to use telemetry to print out the xstats. However, I do not see any errors.

--> /ethdev/xstats,1
{"/ethdev/xstats": {"rx_good_packets": 97727, "tx_good_packets": 157902622, "rx_good_bytes": 6459916, "tx_good_bytes": 229590348448, "rx_missed_errors": 0, "rx_errors": 0, "tx_errors": 0, "rx_mbuf_allocation_errors": 0, "rx_unicast_packets": 95827, "rx_multicast_packets": 1901, "rx_broadcast_packets": 0, "rx_dropped_packets": 0, "rx_unknown_protocol_packets": 97728, "rx_size_error_packets": 0, "tx_unicast_packets": 157902621, "tx_multicast_packets": 0, "tx_broadcast_packets": 1, "tx_dropped_packets": 0, "tx_link_down_dropped": 0, "rx_crc_errors": 0, "rx_illegal_byte_errors": 0, "rx_error_bytes": 0, "mac_local_errors": 0, "mac_remote_errors": 0, "rx_length_errors": 0, "tx_xon_packets": 0, "rx_xon_packets": 0, "tx_xoff_packets": 0, "rx_xoff_packets": 0, "rx_size_64_packets": 967, "rx_size_65_to_127_packets": 96697, "rx_size_128_to_255_packets": 0, "rx_size_256_to_511_packets": 64, "rx_size_512_to_1023_packets": 0, "rx_size_1024_to_1522_packets": 0, "rx_size_1523_to_max_packets": 0, "rx_undersized_errors": 0, "rx_oversize_errors": 0, "rx_mac_short_dropped": 0, "rx_fragmented_errors": 0, "rx_jabber_errors": 0, "tx_size_64_packets": 0, "tx_size_65_to_127_packets": 46, "tx_size_128_to_255_packets": 0, "tx_size_256_to_511_packets": 0, "tx_size_512_to_1023_packets": 0, "tx_size_1024_to_1522_packets": 157902576, "tx_size_1523_to_max_packets": 0, "rx_flow_director_atr_match_packets": 0, "rx_flow_director_sb_match_packets": 13, "tx_low_power_idle_status": 0, "rx_low_power_idle_status": 0, "tx_low_power_idle_count": 0, "rx_low_power_idle_count": 0, "rx_priority0_xon_packets": 0, "rx_priority1_xon_packets": 0, "rx_priority2_xon_packets": 0, "rx_priority3_xon_packets": 0, "rx_priority4_xon_packets": 0, "rx_priority5_xon_packets": 0, "rx_priority6_xon_packets": 0, "rx_priority7_xon_packets": 0, "rx_priority0_xoff_packets": 0, "rx_priority1_xoff_packets": 0, "rx_priority2_xoff_packets": 0, "rx_priority3_xoff_packets": 0, "rx_priority4_xoff_packets": 0, "rx_priority5_xoff_packets": 0, "rx_priority6_xoff_packets": 0, "rx_priority7_xoff_packets": 0, "tx_priority0_xon_packets": 0, "tx_priority1_xon_packets": 0, "tx_priority2_xon_packets": 0, "tx_priority3_xon_packets": 0, "tx_priority4_xon_packets": 0, "tx_priority5_xon_packets": 0, "tx_priority6_xon_packets": 0, "tx_priority7_xon_packets": 0, "tx_priority0_xoff_packets": 0, "tx_priority1_xoff_packets": 0, "tx_priority2_xoff_packets": 0, "tx_priority3_xoff_packets": 0, "tx_priority4_xoff_packets": 0, "tx_priority5_xoff_packets": 0, "tx_priority6_xoff_packets": 0, "tx_priority7_xoff_packets": 0, "tx_priority0_xon_to_xoff_packets": 0, "tx_priority1_xon_to_xoff_packets": 0, "tx_priority2_xon_to_xoff_packets": 0, "tx_priority3_xon_to_xoff_packets": 0, "tx_priority4_xon_to_xoff_packets": 0, "tx_priority5_xon_to_xoff_packets": 0, "tx_priority6_xon_to_xoff_packets": 0, "tx_priority7_xon_to_xoff_packets": 0}}

I have RX-DESC = 128 and TX-DESC = 512 configured.

I am assuming there is some descriptor leak; is there a way to know if the drop is due to no descriptors being available? Which counter should I check for that?
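
(For reference, one way to check directly whether the TX ring itself is exhausted is rte_eth_tx_descriptor_status() from rte_ethdev.h, which the i40e PMD implements. A minimal sketch; check_tx_ring and nb_tx_desc are illustrative names, and 512 matches the TX-DESC value below:)

    #include <rte_ethdev.h>
    #include <stdio.h>

    /* Sketch: probe the descriptor one slot short of a full ring, counted
     * from the tail. If it is still FULL, (nearly) every descriptor is in
     * flight and rte_eth_tx_burst() will start returning 0. */
    static void check_tx_ring(uint16_t port_id, uint16_t queue_id, uint16_t nb_tx_desc)
    {
        int st = rte_eth_tx_descriptor_status(port_id, queue_id, nb_tx_desc - 1);

        if (st == RTE_ETH_TX_DESC_FULL)
            printf("TX ring exhausted (port=%u queue=%u)\n", port_id, queue_id);
        else if (st < 0)
            printf("tx_descriptor_status error: %d\n", st);
    }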

[More Info]
Debugging refcnt led to a dead end.
Following the code, it seems that the NIC does not set the DONE status on the descriptor.
When rte_eth_tx_burst is called, it internally calls i40e_xmit_pkts -> i40e_xmit_cleanup.

When the issue occurs, the following condition fails, causing the NIC to stop sending packets out.

    if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
            rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
            rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE)) {
        PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
               "(port=%d queue=%d)", desc_to_clean_to,
               txq->port_id, txq->queue_id);
        return -1;
    }

If I comment out the "return -1" (of course not a fix, and it will lead to other issues), I can see that traffic stays stable for a very long time.
I tracked all the mbufs from the start of traffic until the issue is hit; there is no issue seen, at least in the mbufs that I could see.

I40E_TX_DESC_DTYPE_DESC_DONE is supposed to be set by the h/w for the descriptor. Is there any way I can see that code? Is it part of the X710 driver code?

I still suspect my own code, since the issue persists even after the NIC card was replaced.
However, how can my code affect the NIC such that it does not set the DONE status of the descriptor?
Any suggestions would really be helpful.

[UPDATE]
Found out that 2 cores were using the same TX queue ID to send packets:

  1. Data processing and TX core
  2. ARP req/response by Data RX core

Could this have led to some corruption?
Found some info on this:
http://mails.dpdk.org/archives/dev/2014-January/001077.html

After creating a separate queue for ARP messages, the issue has not been seen anymore (2+ hours and counting).
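
(For reference: rte_eth_tx_burst() is not thread-safe for the same port/queue pair, so one TX queue per sending core is the usual pattern. A minimal sketch under that assumption; setup_tx_queues and NB_TX_QUEUES are illustrative names, and 512 matches the TX-DESC value above:)

    #include <rte_ethdev.h>

    #define NB_TX_QUEUES 2 /* queue 0: data/TX core, queue 1: ARP slow path */

    /* Sketch: give every sending core its own TX queue at init time, so
     * rte_eth_tx_burst() is never called concurrently on one queue. */
    static int setup_tx_queues(uint16_t eth_port_id)
    {
        struct rte_eth_conf port_conf = {0};
        int ret = rte_eth_dev_configure(eth_port_id, 1 /* RX queues */, NB_TX_QUEUES, &port_conf);
        if (ret < 0)
            return ret;

        for (uint16_t q = 0; q < NB_TX_QUEUES; q++) {
            ret = rte_eth_tx_queue_setup(eth_port_id, q, 512 /* TX-DESC */,
                                         rte_eth_dev_socket_id(eth_port_id), NULL);
            if (ret < 0)
                return ret;
        }
        return 0;
        /* the data core then uses rte_eth_tx_burst(eth_port_id, 0, ...),
         * the ARP/RX core rte_eth_tx_burst(eth_port_id, 1, ...) */
    }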

Comments (1)

旧梦荧光笔 2025-02-11 21:40:34


[EDIT-2] The error has been narrowed down to multiple threads using the same port ID / queue ID pair, which stalls the NIC on transmit (xmit). Earlier, debugging was not focused on the slow path (ARP replies), hence this was missed.

[EDIT-1] Based on the limited debugging opportunities and the updates exchanged in the messages, the findings are:

  1. The internal TX code increments refcnt by 2 (that is, refcnt is 3).
  2. Once the reply is received, the refcnt is decremented by 2.
  3. Corner cases around mbuf free are now addressed.
  4. Tested on both RHEL and CentOS; both show the issue, hence it is the software and not the OS.
  5. Updated the NIC firmware; all platforms now consistently show the error after a couple of hours of running.

Note:

  1. All pointers therefore lead to gaps in the code and its corner-case handling, since testpmd|l2fwd|l3fwd do not show the error with the DPDK library or platform.
  2. Since the code base is not shared, the only option is to rely on updates.

Hence, after extensive debugging and analysis, the root cause of the issue is not DPDK, the NIC, or the platform, but a gap in the code being used.

If the code's intent is to try, within MAX_RETRY_COUNT_RTE_ETH_TX_BURST attempts, to send all pkt_count packets, the current code snippet needs a few corrections. Let me explain:

  1. mbuf is the array of valid packets to be transmitted.
  2. mbuf_idx represents the current index into that array for TX.
  3. pkt_count represents the number of packets still to be sent in the current attempt.
  4. num_sent_pkt represents the number of packets actually handed to the NIC for DMA (physical send).
  5. retry_count is the local variable keeping count of the retries.

There are 2 corner cases to be taken care of (not handled in the current snippet):

  1. If MAX_RETRY_COUNT_RTE_ETH_TX_BURST is exceeded and num_sent_pkt has not covered all packets, the non-transmitted mbufs must be freed at the end of the while loop.
  2. If there are any mbufs with ref_cnt greater than 1 (especially with multicast, broadcast, or packet duplication), a mechanism is needed to free those too.

A possible code snippet could be:

#define MAX_RETRY_COUNT_RTE_ETH_TX_BURST 3

uint16_t retry_count = 0;
uint16_t mbuf_idx = 0;
uint16_t pkt_count = try_sent; /* try_sent: the intended number of packets to send */
uint16_t num_sent_pkt;

/* if there are any mbuf with ref_cnt > 1, we need separate logic to handle those */

do {
    num_sent_pkt = rte_eth_tx_burst(eth_port_id, queue_id, &mbuf[mbuf_idx], pkt_count);

    pkt_count -= num_sent_pkt;
    mbuf_idx += num_sent_pkt; /* advance past the packets already accepted by the NIC */

    retry_count++;
} while (pkt_count && (retry_count < MAX_RETRY_COUNT_RTE_ETH_TX_BURST));

/* free any unsent packets to prevent an mbuf leak */
if (pkt_count)
    rte_pktmbuf_free_bulk(&mbuf[mbuf_idx], pkt_count);
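
(For the second corner case above, mbufs with ref_cnt > 1, a hypothetical sketch of what that separate logic could look like in place of the bulk free. rte_pktmbuf_free() only returns an mbuf to its mempool once the reference count reaches zero, so clones kept for broadcast/retransmit must also be freed by whoever holds the other references; the bookkeeping itself is application-specific:)

for (uint16_t i = 0; i < pkt_count; i++) {
    struct rte_mbuf *m = mbuf[mbuf_idx + i];

    if (rte_mbuf_refcnt_read(m) > 1) {
        /* application-specific: track/log this mbuf so the other
         * reference holders eventually free their references too */
    }
    rte_pktmbuf_free(m); /* drops this path's reference only */
}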

Note: the easiest way to identify an mbuf leak is to run the DPDK proc-info tool as a secondary process and check the mbuf free count.

[EDIT-1] Based on the debugging, it has been identified that the refcnt is indeed greater than 1 (see the dump below). Accumulating such corner cases leads to mempool depletion.

Logs:

dump mbuf at 0x2b67803c0, iova=0x2b6780440, buf_len=9344
pkt_len=1454, ol_flags=0x180, nb_segs=1, port=0, ptype=0x291
segment at 0x2b67803c0, data=0x2b67804b8, len=1454, off=120, refcnt=3