Capturing TCP packets with Python

Posted 2024-11-30 15:39:53

I am trying to capture an HTTP download with Python using dpkt and pcap. The code looks like this:

import dpkt
import pcap

...

def handle_packet(pkt):
    # Parse the raw frame as Ethernet
    eth = dpkt.ethernet.Ethernet(pkt)

    # Ignore non-IP and non-TCP packets
    if eth.type != dpkt.ethernet.ETH_TYPE_IP:
        return
    ip = eth.data
    if ip.p != dpkt.ip.IP_PROTO_TCP:
        return

    tcp = ip.data
    data = tcp.data

    # Key identifying the current connection (one direction)
    c = (ip.src, ip.dst, tcp.sport, tcp.dport)

    # Handle only new HTTP responses and TCP packets
    # of existing connections.
    if c in conn:
        handle_tcp_packet(c, tcp)
    elif data[:4] == b'HTTP':  # tcp.data is a byte string
        handle_http_response(c, tcp)

# Capture loop (placed after handle_packet so the name is defined)
pc = pcap.pcap(iface)
for ts, pkt in pc:
    handle_packet(pkt)
...

In handle_http_response() and handle_tcp_packet() I read the data of the TCP packets (tcp.data) and write it to a file. However, I noticed that I often get packets with the same TCP sequence number (tcp.seq) on the same connection, and they seem to contain the same data. Moreover, it seems that not all packets are captured. For example, if I sum up the packet sizes, the resulting value is lower than the one listed in the HTTP header (Content-Length). But in Wireshark I can see all packets.

Does anyone have an idea why I get those duplicate packets and how I can capture every packet belonging to the HTTP response?

EDIT:
Here you can find the complete code: pastebin.com.
When run, it prints something like this to stdout:

Waiting for HTTP-Audio-responses ...
...
New TCP-Packet, len=1440, tcp-payload=5107680, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=1440, tcp-payload=5109120, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=1440, tcp-payload=5110560, con-len=5197150 , dups=57 , dup-bytes=82080
----------> FIN <----------
New TCP-Packet, len=1937, tcp-payload=5112497, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=0, tcp-payload=5112497, con-len=5197150 , dups=57 , dup-bytes=82080

As you can see, the TCP payload plus the duplicate bytes received (5112497+82080=5194577) is lower than the file size of the download (5197150). Moreover, you can see that I receive 57 duplicate packets (same SEQ and same TCP data) and that packets are still received after the packet with the FIN flag.

So does anyone have an idea how I can capture all packets belonging to the connection? Wireshark sees all packets, and I think it uses libpcap too.

I don't even know whether I am doing something wrong or whether the pcap library is.

EDIT2:
OK, it seems that my code is correct: in Wireshark I saved the captured packets and used the capture file in my code (pcap.pcap('/home/path/filename') instead of pcap.pcap('eth0')). My code read all packets perfectly (in multiple tests)! Since Wireshark uses libpcap too (afaik), I think the problem is the pypcap library, which does not give me all packets.

Any idea how to test that?

I already compiled pypcap myself (trunk), but that didn't change anything -.-

EDIT3:
OK, I changed my code to work with pcapy instead of pypcap and have the same problem:
When reading the packets from a previously captured file (created with Wireshark), everything is fine, but when I capture the packets directly from eth0 I miss some packets.

Interesting: when running both programs (the one using pypcap and the one using pcapy) in parallel, they capture different packets. E.g. one program receives one packet more than the other.

But I still have no idea why -.-
I thought Wireshark uses the same base library (libpcap).

Please help :)

Comments (2)

岁月流歌 2024-12-07 15:39:53

Here are a couple of things to watch out for:

  • make sure you have a big snaplen - for pcapy you can set it in open_live (second parameter)
  • make sure you handle fragmented packets - this is not done automatically, so you need to check the details yourself
  • check the statistics - unfortunately I don't think they are exposed through the pcapy interface, but it's possible that you're not handling all packets; if you fall behind, you will not know that you missed something (although you can get the same information by tracking the length / position of the TCP stream). libpcap itself does expose those statistics, so you might be able to add a function for it
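
Tracking the position in the TCP stream can be sketched in plain Python. This is only a rough illustration, not code from the question: StreamTracker and its method names are made up here, it expects tcp.seq and tcp.data for one direction of one connection, it keeps all payload in memory, and it ignores sequence-number wraparound.

```python
class StreamTracker:
    """Hypothetical helper: collects TCP segments of one stream direction by
    sequence number, counts exact retransmissions and reports missing ranges."""

    def __init__(self):
        self.segments = {}      # seq -> payload bytes
        self.dup_packets = 0    # segments seen again with the same seq
        self.dup_bytes = 0      # payload bytes carried by those duplicates

    def add(self, seq, data):
        """Store a segment; return True if it was new, False otherwise."""
        if not data:
            return False        # pure ACK/FIN without payload
        old = self.segments.get(seq)
        if old is not None and len(old) >= len(data):
            # Same starting sequence number and nothing new: a retransmission
            self.dup_packets += 1
            self.dup_bytes += len(data)
            return False
        self.segments[seq] = data
        return True

    def report(self, first_seq):
        """Return (payload, gaps). payload is the captured data in stream
        order; gaps is a list of (start_seq, missing_length) tuples."""
        pos = first_seq
        parts = []
        gaps = []
        for seq in sorted(self.segments):
            data = self.segments[seq]
            end = seq + len(data)
            if end <= pos:
                continue                        # fully covered already
            if seq > pos:
                gaps.append((pos, seq - pos))   # bytes we never captured
                pos = seq
            parts.append(data[pos - seq:])      # skip any overlapping prefix
            pos = end
        return b"".join(parts), gaps


tracker = StreamTracker()
tracker.add(1000, b"abcd")
tracker.add(1004, b"efgh")
tracker.add(1004, b"efgh")      # retransmission
tracker.add(1012, b"mnop")      # bytes 1008..1011 were never seen
payload, gaps = tracker.report(first_seq=1000)
print(len(payload), tracker.dup_packets, gaps)
```

A non-empty gaps list at the end of a download means that packets were lost before they reached your code, which is the same thing the libpcap drop counters would tell you.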

默嘫て 2024-12-07 15:39:53

Set the snaplen to 65535. Apparently this is the default for Wireshark:
http://www.wireshark.org/docs/wsug_html_chunked/ChCustCommandLine.html
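
To see whether a small snaplen is actually cutting packets off, you can compare the length of each captured buffer with the total length field of the IPv4 header. A minimal sketch, assuming plain Ethernet frames without VLAN tags; missing_bytes is a made-up helper name, not part of pypcap, pcapy or dpkt:

```python
import struct

ETH_HEADER_LEN = 14     # dst MAC (6) + src MAC (6) + EtherType (2)
ETH_TYPE_IP = 0x0800


def missing_bytes(frame):
    """Hypothetical helper: return how many bytes of an IPv4 packet were
    cut off by the capture, or None if the frame is not IPv4 or is too
    short to contain the IPv4 total length field."""
    if len(frame) < ETH_HEADER_LEN + 4:
        return None
    (eth_type,) = struct.unpack("!H", frame[12:14])
    if eth_type != ETH_TYPE_IP:
        return None
    # Bytes 2..3 of the IPv4 header hold the packet length on the wire
    (total_len,) = struct.unpack("!H", frame[16:18])
    captured_ip_bytes = len(frame) - ETH_HEADER_LEN
    # Short frames are padded by Ethernet, so never report a negative value
    return max(0, total_len - captured_ip_bytes)


# Fake frame: a 1500-byte IP packet of which only 96 frame bytes were captured
header = b"\x00" * 12 + struct.pack("!H", ETH_TYPE_IP)
ip_start = b"\x45\x00" + struct.pack("!H", 1500)
truncated = header + ip_start + b"\x00" * (96 - 18)
print(missing_bytes(truncated))
```

If this returns a non-zero value for packets handed to handle_packet(), the snaplen is too small; if it is always 0 and data is still missing, whole packets are being dropped instead.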
