Capturing TCP packets with Python
I am trying to capture an HTTP download with Python using dpkt and pcap. The code looks like this:
    ...
    pc = pcap.pcap(iface)
    for ts, pkt in pc:
        handle_packet(pkt)

    def handle_packet(pkt):
        eth = dpkt.ethernet.Ethernet(pkt)

        # Ignore non-IP and non-TCP packets
        if eth.type != dpkt.ethernet.ETH_TYPE_IP:
            return
        ip = eth.data
        if ip.p != dpkt.ip.IP_PROTO_TCP:
            return

        tcp = ip.data
        data = tcp.data

        # current connection
        c = (ip.src, ip.dst, tcp.sport, tcp.dport)

        # Handle only new HTTP-responses and TCP-packets
        # of existing connections.
        if c in conn:
            handle_tcp_packet(c, tcp)
        elif data[:4] == 'HTTP':
            handle_http_response(c, tcp)
    ...
In handle_http_response() and handle_tcp_packet() I read the data of the TCP packets (tcp.data) and write it to a file. However, I noticed that I often get packets with the same TCP sequence number (tcp.seq) on the same connection, and they seem to contain the same data. Moreover, it seems that not all packets are captured: for example, if I sum up the packet sizes, the result is lower than the value listed in the HTTP header (Content-Length). But in Wireshark I can see all packets.
Does anyone have an idea why I get those duplicate packets and how I can capture every packet belonging to the HTTP response?
EDIT:
Here you can find the complete code: pastebin.com.
When run, it prints something like this to stdout:
Waiting for HTTP-Audio-responses ...
...
New TCP-Packet, len=1440, tcp-payload=5107680, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=1440, tcp-payload=5109120, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=1440, tcp-payload=5110560, con-len=5197150 , dups=57 , dup-bytes=82080
----------> FIN <----------
New TCP-Packet, len=1937, tcp-payload=5112497, con-len=5197150 , dups=57 , dup-bytes=82080
New TCP-Packet, len=0, tcp-payload=5112497, con-len=5197150 , dups=57 , dup-bytes=82080
As you can see, the TCP payload plus the duplicate bytes received (5112497+82080=5194577) is lower than the file size of the download (5197150). Moreover, you can see that I receive 57 duplicate packets (same SEQ and same TCP data) and that packets are still received after the packet with the FIN flag.
So does anyone have an idea how I can capture all packets belonging to the connection? Wireshark sees all packets, and I think it uses libpcap too.
I don't even know whether I am doing something wrong or the pcap library is.
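For illustration, the byte accounting above can be done per sequence number instead of per packet, so that retransmitted segments are not counted twice and out-of-order segments are put back in place. This is only a minimal sketch: the class name StreamAssembler and its attributes are hypothetical, it tracks one direction of one connection, it assumes the first sequence number is taken from the first segment handled (for example the one starting with 'HTTP'), and buffered out-of-order segments are only released when they start exactly at the next expected byte.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap around at 2**32


class StreamAssembler:
    """Collect the payload of one TCP direction, skipping retransmitted bytes."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq  # sequence number of the next expected byte
        self.pending = {}            # out-of-order segments: seq -> payload
        self.data = bytearray()      # in-order payload without duplicates
        self.dup_bytes = 0           # bytes that had already been received

    def add(self, seq, payload):
        if not payload:
            return
        delta = (seq - self.next_seq) % SEQ_MOD
        if delta >= SEQ_MOD // 2:
            # Segment starts before next_seq: all or part of it is a repeat.
            behind = SEQ_MOD - delta
            overlap = min(behind, len(payload))
            self.dup_bytes += overlap
            payload = payload[overlap:]
            if not payload:
                return
            delta = 0
        if delta > 0:
            # There is a gap before this segment: keep it until the gap is filled.
            if seq in self.pending:
                self.dup_bytes += len(payload)
            else:
                self.pending[seq] = payload
            return
        self._append(payload)
        # Release buffered segments that now line up with next_seq.
        while self.next_seq in self.pending:
            self._append(self.pending.pop(self.next_seq))

    def _append(self, payload):
        self.data += payload
        self.next_seq = (self.next_seq + len(payload)) % SEQ_MOD
```

In the handlers from the question, this would be fed with something like assembler.add(tcp.seq, tcp.data); len(assembler.data) can then be compared against the header size plus Content-Length, and any segment left in assembler.pending at the end points to bytes that never reached the program.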
EDIT2:
OK, it seems that my code is correct: in Wireshark I saved the captured packets and used the capture file in my code (pcap.pcap('/home/path/filename') instead of pcap.pcap('eth0')). My code read all packets perfectly (in multiple tests)! Since Wireshark uses libpcap too (AFAIK), I think the problem is the pypcap library, which does not give me all packets.
Any idea on how to test that?
I already compiled pypcap myself (trunk), but that didn't change anything -.-
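One possible way to test this is to look at libpcap's own counters after a live capture. The sketch below assumes pypcap's stats() method, which returns a (received, dropped, dropped by interface) tuple; the function names packets_lost and capture_and_report and the count parameter are made up for the example. A non-zero drop count would suggest that packets were discarded in the kernel buffer because the Python loop did not read them fast enough, rather than being lost by the binding itself.

```python
def packets_lost(stats):
    """Return how many packets libpcap reports as dropped.

    stats is a (received, dropped, dropped_by_interface) tuple,
    the shape returned by pypcap's pcap.stats().
    """
    nrecv, ndrop, nifdrop = stats
    return ndrop + nifdrop


def capture_and_report(iface, count=1000):
    """Read `count` packets from `iface` and print the drop counters."""
    import pcap  # pypcap; imported here so packets_lost() works without it

    pc = pcap.pcap(iface)
    seen = 0
    for ts, pkt in pc:
        seen += 1
        if seen >= count:
            break
    stats = pc.stats()
    print("seen=%d stats=%r lost=%d" % (seen, stats, packets_lost(stats)))
    return stats
```

Calling capture_and_report('eth0') while the download is running needs the same privileges as the capture script itself.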
EDIT3:
OK, I changed my code to work with pcapy instead of pypcap and have the same problem:
When reading the packets from a previously captured file (created with Wireshark), everything is fine, but when I capture the packets directly from eth0, I miss some packets.
Interesting: when running both programs (the one using pypcap and the one using pcapy) in parallel, they capture different packets, e.g. one program receives one packet more than the other.
But I still have no idea why -.-
I thought Wireshark uses the same base library (libpcap).
Please help :)
Comments (2)
Here are a couple of things to watch out for:
Set the snaplen to 65535. Apparently this is the default for Wireshark:
http://www.wireshark.org/docs/wsug_html_chunked/ChCustCommandLine.html
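As a rough sketch of the snaplen suggestion with pypcap, the snapshot length can be passed explicitly when opening the interface, and each captured frame can be checked for truncation by comparing its captured length with the IP total length field. The helper names truncated_by and open_capture are hypothetical, and the 14-byte header length assumes untagged Ethernet frames as in the question's code.

```python
ETH_HEADER_LEN = 14  # untagged Ethernet II header (assumption: no VLAN tag)


def truncated_by(captured_len, ip_total_len, link_header_len=ETH_HEADER_LEN):
    """Return how many bytes of the frame were cut off by the snaplen.

    captured_len is len(pkt) as delivered by the capture loop and
    ip_total_len is the IP header's total length field (ip.len in dpkt).
    Short frames padded by Ethernet yield 0 rather than a negative number.
    """
    return max(0, link_header_len + ip_total_len - captured_len)


def open_capture(iface, snaplen=65535):
    """Open `iface` for live capture with an explicit snapshot length."""
    import pcap  # pypcap; imported here so truncated_by() works without it

    return pcap.pcap(name=iface, snaplen=snaplen)
```

Inside handle_packet(), a check such as truncated_by(len(pkt), ip.len) > 0 would flag frames whose payload was cut short, which is one way a sum of tcp.data lengths can end up below Content-Length.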