DPDK: forward received packets to the default network stack

Posted on 2025-01-27 10:36:38

We're using DPDK (version 20.08 on Ubuntu 20.04, C++ application) to receive UDP packets at a high throughput (>2 Mpps). We use a Mellanox ConnectX-5 NIC (and a Mellanox ConnectX-3 in an older system; it would be great if the solution worked there as well).

In contrast, since we only need to send a few configuration messages, we send those through the default network stack. This way, we can use lots of readily available tools to send configuration messages; however, since all the received data is consumed by DPDK, these tools never get any messages back.

The most prominent issue arises with ARP negotiation: the host tries to resolve addresses and the clients do respond properly; however, these responses are all consumed by DPDK, so the host cannot resolve the addresses and refuses to send the actual UDP packets.

Our idea would be to filter out the high-throughput packets in our application and somehow "forward" everything else (e.g. ARP responses) to the default network stack. Does DPDK have a built-in solution for that? I unfortunately couldn't find anything in the examples.

I've recently heard about the packet interface, which allows injecting packets into SOCK_DGRAM sockets and might be a possible solution. However, I couldn't find a sample implementation for our use case either. Any help is greatly appreciated.

3 Answers

独自←快乐 2025-02-03 10:36:38

Theoretically, if the NIC in question supports the embedded switch feature, it should be possible to intercept the packets of interest in the hardware and redirect them to a virtual function (VF) associated with the physical function (PF), with the PF itself receiving everything else.

  • The user configures the SR-IOV feature on the NIC / host, as well as virtualisation support;
  • For a given NIC PF, the user adds a VF and binds it to the corresponding Linux driver;
  • The DPDK application is run with the PF ethdev and a representor ethdev for the VF;
  • To handle the packets in question, the application adds the corresponding flow rules.

The PF (ethdev 0) and the VF representor (ethdev 1) have to be explicitly specified by the corresponding EAL argument in the application: -a [pci:dbdf],representor=vf0.
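For example, with a hypothetical PCI address 0000:03:00.0, the testpmd run described further below could be launched as follows (binary name as in recent DPDK releases):

dpdk-testpmd -a 0000:03:00.0,representor=vf0 -- -i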

As for the flow rules, there should be a pair of them.

The first rule's components are as follows:

  • Attribute transfer (demands that matching packets be handled in the embedded switch);
  • Pattern item REPRESENTED_PORT with port_id = 0 (instructs the NIC to intercept packets coming to the embedded switch from the network port represented by the PF ethdev);
  • Pattern items matching on network headers (these provide narrower match criteria);
  • Action REPRESENTED_PORT with port_id = 1 (redirects packets to the VF).

In the second rule, item REPRESENTED_PORT has port_id = 1, and action REPRESENTED_PORT has port_id = 0 (that is, this rule is inverse). Everything else should remain the same.

It is important to note that some drivers do not support item REPRESENTED_PORT at the moment. Instead, they expect that the rules be added via the corresponding ethdevs. This way, for the provided example: the first rule goes to ethdev 0, the second one goes to ethdev 1.
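
Putting the two rules together, a minimal C sketch is shown below. It is an illustration only: it assumes a DPDK release that provides item/action REPRESENTED_PORT (21.11 or later), a driver that accepts the item (see the caveat above), and the ethdev numbering used in this answer (PF = 0, VF representor = 1); it narrows the match to ARP by EtherType, and error handling is omitted. The second rule is obtained by swapping the two port ids.

#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_flow.h>

/* First rule: ARP frames entering the embedded switch from the network
 * port represented by the PF ethdev (0) are redirected to the VF
 * represented by ethdev 1. */
static struct rte_flow *redirect_arp_to_vf(void)
{
    struct rte_flow_error err;

    /* Attribute: transfer (handled in the embedded switch). Some drivers
     * may additionally require .ingress = 1, as noted further down. */
    struct rte_flow_attr attr = { .transfer = 1 };

    struct rte_flow_item_ethdev from_pf = { .port_id = 0 };
    struct rte_flow_item_eth eth_spec = {
        .hdr.ether_type = RTE_BE16(RTE_ETHER_TYPE_ARP),
    };
    struct rte_flow_item_eth eth_mask = {
        .hdr.ether_type = RTE_BE16(0xffff),
    };
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_REPRESENTED_PORT, .spec = &from_pf },
        { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &eth_spec, .mask = &eth_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };

    struct rte_flow_action_ethdev to_vf = { .port_id = 1 };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT, .conf = &to_vf },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

    /* In this sketch the rule is inserted through the PF ethdev (port 0). */
    return rte_flow_create(0, &attr, pattern, actions, &err);
}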


As per the OP update, the adapter in question might indeed support the embedded switch feature. However, as noted above, item REPRESENTED_PORT might not be supported. The rules should be inserted via specific ethdevs. Also, one more attribute, ingress, might need to be specified.

In order to check whether this scheme works, one should be able to deploy a VF (as described above) and run testpmd with the aforementioned EAL argument. In the command line of the application, the two flow rules can be tested as follows:

  • flow create 0 ingress transfer pattern eth type is 0x0806 / end actions represented_port ethdev_port_id 1 / end
  • flow create 1 ingress transfer pattern eth type is 0x0806 / end actions represented_port ethdev_port_id 0 / end

Once done, that should pass ARP packets to the VF (and thus to its network interface). The rest of the packets should be seen by testpmd in active forwarding mode (start command).

NOTE: it is recommended to switch to the most recent DPDK release.

沧笙踏歌 2025-02-03 10:36:38

We've implemented a solution based on flow isolation using the bifurcated driver (which is the default for our mlx5-based devices).

In this example, we're filtering all the UDP packets to be handled by DPDK and leaving everything else to the kernel network stack. The pattern can easily be extended to match different packets.

Handling of API call return values (mostly error numbers) is omitted in this example but is strongly encouraged in real applications.

/* Assumes the usual DPDK headers (<rte_ethdev.h>, <rte_flow.h>) are included
 * and that `port`, `port_conf` and `mbuf_pool` are defined and initialised
 * elsewhere in the application. */
/* These would mostly be set from a function call */
const uint16_t rx_rings = 1, tx_rings = 1;
uint16_t nb_rxd = 1000;
uint16_t nb_txd = 1000;
uint16_t q;
struct rte_eth_dev_info dev_info;
struct rte_eth_txconf txconf;

if(!rte_eth_dev_is_valid_port(port))
    return -1;

rte_eth_dev_info_get(port, &dev_info);

/* Setup flow rules. */

/* Flow items */
struct rte_flow_item flow_pattern[4]; /* 4 parts: ethernet, ipv4, udp, end */

/* Ethernet Layer */
static struct rte_flow_item eth_item = {RTE_FLOW_ITEM_TYPE_ETH, 0, 0, 0};
flow_pattern[0] = eth_item;

/* IPv4 Layer */
struct rte_flow_item ipv4_item = {RTE_FLOW_ITEM_TYPE_IPV4, 0, 0, 0}; /* spec/last/mask NULL: any IPv4 */
flow_pattern[1] = ipv4_item;

/* UDP Layer */
struct rte_flow_item udp_item = {RTE_FLOW_ITEM_TYPE_UDP, 0, 0, 0}; /* spec/last/mask NULL: any UDP */
flow_pattern[2] = udp_item;

/* Terminate the pattern list */
static struct rte_flow_item end_item = {RTE_FLOW_ITEM_TYPE_END, 0, 0, 0};
flow_pattern[3] = end_item;

/* Flow actions */
struct rte_flow_action flow_actions[2];
static struct rte_flow_action_queue flow_action_queue_conf = {0}; // enqueue in queue 0
static struct rte_flow_action flow_action_queue =
{
    RTE_FLOW_ACTION_TYPE_QUEUE, &flow_action_queue_conf
};
flow_actions[0] = flow_action_queue;

/* Terminate flow action list */
static struct rte_flow_action flow_action_end = {RTE_FLOW_ACTION_TYPE_END, 0};
flow_actions[1] = flow_action_end;

/* Flow attributes */
static struct rte_flow_attr flow_attrs;
flow_attrs.ingress = 1;

/* Initialize flow isolation to forward messages to kernel network stack */
/* This will only work with bifurcated drivers */
struct rte_flow_error flow_errors;
rte_flow_isolate(port, 1, &flow_errors);

/* Configure the Ethernet device. */
rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);

/* Allocate and set up 1 RX queue per Ethernet port. */
for(q = 0; q < rx_rings; q++)
{
    rte_eth_rx_queue_setup(port, q, nb_rxd, rte_eth_dev_socket_id(port), NULL, mbuf_pool);
}

/* Allocate and set up 1 TX queue per Ethernet port. */
txconf = dev_info.default_txconf; /* start from the driver's default TX configuration */
for(q = 0; q < tx_rings; q++)
{
    rte_eth_tx_queue_setup(port, q, nb_txd, rte_eth_dev_socket_id(port), &txconf);
}

/* Validate flow */
/* This only works after configuring the port to forward to */
rte_flow_validate(port, &flow_attrs, flow_pattern, flow_actions, &flow_errors);

/* Start the Ethernet port. */
rte_eth_dev_start(port);

/* Create the flow */
/* This will only work after the queue was started */
rte_flow_create(port, &flow_attrs, flow_pattern, flow_actions, &flow_errors);
满身野味 2025-02-03 10:36:38

For the current use case, the best option is to make use of the DPDK TAP PMD (which is part of DPDK on Linux). You can use software or hardware filtering to select the specific packets and then send everything else to the desired TAP interface (a rough sketch of this split is given at the end of this answer).

A simple way to demonstrate this is the DPDK skeleton example:

  1. Build the DPDK example via cd [root folder]/examples/skeleton; make static
  2. Pass the desired physical DPDK PMD NIC via DPDK EAL options: ./build/basicfwd -l 1 -w [pcie id of DPDK NIC] --vdev=net_tap0,iface=dpdkTap
  3. In a second terminal, execute ifconfig dpdkTap 0.0.0.0 promisc up
  4. Use tcpdump to capture the ingress and egress packets: tcpdump -eni dpdkTap -Q in and tcpdump -eni dpdkTap -Q out, respectively.

Note: you can configure an IP address and set up TC on dpdkTap, and you can run your custom socket programs on it as well. You do not need to invest time in TLDK, ANS or VPP; for this requirement you just need a mechanism to inject packets into, and receive packets from, the kernel network stack.
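
As a rough illustration of the software-filtering split mentioned above, the sketch below (not from the original answer) receives a burst on the physical port, keeps IPv4/UDP packets for the application and pushes everything else (ARP, ICMP, ...) to the TAP port, from where the kernel picks it up. The port ids PHYS_PORT and TAP_PORT and the function name are hypothetical; error handling and the actual UDP processing are omitted.

#include <stdbool.h>
#include <netinet/in.h>
#include <rte_byteorder.h>
#include <rte_ethdev.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_mbuf.h>

#define PHYS_PORT  0   /* assumed: the physical DPDK port            */
#define TAP_PORT   1   /* assumed: the net_tap vdev ("dpdkTap") port */
#define BURST_SIZE 32

static void rx_and_split(void)
{
    struct rte_mbuf *bufs[BURST_SIZE];
    struct rte_mbuf *to_kernel[BURST_SIZE];
    uint16_t nb_rx = rte_eth_rx_burst(PHYS_PORT, 0, bufs, BURST_SIZE);
    uint16_t nb_kernel = 0;

    for (uint16_t i = 0; i < nb_rx; i++) {
        struct rte_ether_hdr *eth =
            rte_pktmbuf_mtod(bufs[i], struct rte_ether_hdr *);
        bool is_udp = false;

        if (eth->ether_type == rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4)) {
            struct rte_ipv4_hdr *ip = (struct rte_ipv4_hdr *)(eth + 1);
            is_udp = (ip->next_proto_id == IPPROTO_UDP);
        }

        if (is_udp) {
            /* High-throughput path: process in the application. */
            rte_pktmbuf_free(bufs[i]); /* placeholder for real processing */
        } else {
            /* ARP, ICMP, etc.: hand over to the kernel via the TAP port. */
            to_kernel[nb_kernel++] = bufs[i];
        }
    }

    if (nb_kernel > 0) {
        uint16_t sent = rte_eth_tx_burst(TAP_PORT, 0, to_kernel, nb_kernel);
        while (sent < nb_kernel)
            rte_pktmbuf_free(to_kernel[sent++]);
    }
}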
