Python: Converting tcpdump output into a text2pcap-readable format
I've written a Python script to convert the textual output of tcpdump -i eth0 -neXXs0 into a format that text2pcap can read. This is my first Python program, and I'm looking for suggestions to improve its efficiency and readability, and to point out any mistakes in the code.
The tcpdump output format I'm working with looks like this:
20:11:32.001190 00:16:76:7f:2b:b1 > 00:11:5c:78:ca:c0, ethertype IPv4 (0x0800), length 72: 123.236.188.140.41756 > 94.59.34.210.45931: UDP, length 30
        0x0000:  0011 5c78 cac0 0016 767f 2bb1 0800 4500  ..\x....v.+...E.
        0x0010:  003a 0000 4000 4011 812d 7bec bc8c 5e3b  .:..@.@..-{...^;
        0x0020:  22d2 a31c b36b 0026 b9bd 2033 6890 ad33  "....k.&...3h..3
        0x0030:  e845 4b8d 2ba1 0685 0cb3 70dd 9b98 76d8  .EK.+.....p...v.
        0x0040:  8fc6 8293 bf33 325a                      .....32Z
Output format understandable by text2pcap:
20:11:32.001190
0000: 00 11 5c 78 ca c0 00 16 76 7f 2b b1 08 00 45 00 ..\x....v.+...E.
0010: 00 3a 00 00 40 00 40 11 81 2d 7b ec bc 8c 5e 3b .:..@.@..-{...^;
0020: 22 d2 a3 1c b3 6b 00 26 b9 bd 20 33 68 90 ad 33 "....k.&...3h..3
0030: e8 45 4b 8d 2b a1 06 85 0c b3 70 dd 9b 98 76 d8 .EK.+.....p...v.
0040: 8f c6 82 93 bf 33 32 5a .....32Z
Here is my code:
import re

# Identify time of the current packet.
time = re.compile('(..:..:..\.[\w]*) ')
# Get individual elements from the packet. ie. offset, hexdump, chars
all = re.compile('[ |\t]+0x([\w]+:) +(.+) +(.*)')
# Regex for two spaces
twoSpaces = re.compile('  +')
# Regex for single space
singleSpace = re.compile(' ')
# Single byte pattern.
singleBytePattern = re.compile(r'([\w][\w])')

# Open files.
f = open('pcap.txt', 'r')
outfile = open('ashu.txt', 'w')

for line in f:
    result = time.match(line)
    if result:
        # If current line contains time format dump only time
        print(result.group())
        outfile.write(result.group() + '\n')
    else:
        print(line)
        # Split line containing hex dump and tokenize into list elements.
        result = all.split(line)
        if result:
            i = 0
            for values in result:
                if i == 2:
                    # Strip off additional spaces in hex dump
                    # Useful when hex dump does not end in 16 bytes boundary.
                    val = twoSpaces.sub('', values)
                    # Tokenize individual elements separated by single space.
                    byteResult = singleSpace.split(val)
                    for twoByte in byteResult:
                        # Identify individual byte
                        singleByte = singleBytePattern.split(twoByte)
                        byteOffset = 0
                        for oneByte in singleByte:
                            if byteOffset == 1 or byteOffset == 3:
                                # Write out individual byte with a space char appended
                                print(oneByte, end=' ')
                                outfile.write(oneByte + ' ')
                            byteOffset += 1
                elif i == 3:
                    # Write of char format of hex dump
                    print(" " + values, end='')
                    outfile.write(' ' + values + ' ')
                elif i == 4:
                    outfile.write(values)
                else:
                    print(values, end=' ')
                    outfile.write(values + ' ')
                i += 1
        else:
            print("could not split")

f.close()
outfile.close()
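For comparison, the same transformation can be sketched more compactly. This is only an illustration under stated assumptions, not a drop-in replacement for the script above: the helper names convert_line, convert_dump and convert_file are made up for this example, hex lines are assumed to start with whitespace followed by 0x<offset>:, the hex column is assumed to be separated from the ASCII column by at least two spaces, and the ASCII column is dropped on the assumption that text2pcap only needs the offset and the bytes.

```python
import re

# A packet header line starts with a timestamp such as 20:11:32.001190.
TIMESTAMP = re.compile(r'(\d{2}:\d{2}:\d{2}\.\d+)\s')
# A hex line: leading whitespace, "0x", the offset, a colon, then the rest.
HEX_LINE = re.compile(r'\s+0x([0-9a-fA-F]+):\s+(.*)')


def convert_line(line):
    """Return the text2pcap-style form of one tcpdump line, or None to skip it."""
    header = TIMESTAMP.match(line)
    if header:
        return header.group(1)
    hex_line = HEX_LINE.match(line)
    if not hex_line:
        return None
    offset, rest = hex_line.groups()
    # Assumption: two or more spaces separate the hex column from the ASCII column.
    hex_column = rest.split('  ')[0]
    digits = ''.join(hex_column.split())
    if not re.fullmatch(r'(?:[0-9a-fA-F]{2})+', digits):
        return None
    # Regroup the hex digits into single bytes separated by spaces.
    byte_list = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return '{}: {}'.format(offset, ' '.join(byte_list))


def convert_dump(lines):
    """Yield converted lines, skipping anything that is not recognised."""
    for line in lines:
        converted = convert_line(line)
        if converted is not None:
            yield converted


def convert_file(src_path, dst_path):
    """Convert a whole file; 'with' closes both files even if an error occurs."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for converted in convert_dump(src):
            dst.write(converted + '\n')


sample = [
    '20:11:32.001190 00:16:76:7f:2b:b1 > 00:11:5c:78:ca:c0, ethertype IPv4 '
    '(0x0800), length 72: 123.236.188.140.41756 > 94.59.34.210.45931: UDP, length 30',
    '\t0x0000:  0011 5c78 cac0 0016 767f 2bb1 0800 4500  ..\\x....v.+...E.',
    '\t0x0040:  8fc6 8293 bf33 325a                      .....32Z',
]
for out in convert_dump(sample):
    print(out)
# 20:11:32.001190
# 0000: 00 11 5c 78 ca c0 00 16 76 7f 2b b1 08 00 45 00
# 0040: 8f c6 82 93 bf 33 32 5a
```

With the file names from the script above, the call would be convert_file('pcap.txt', 'ashu.txt'). Slicing the digit string two characters at a time replaces the nested split loops and the manual index counters.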
Comments (2)
Use the -w option of tcpdump to write to a pcap format file. Wireshark should be able to read it.
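If the capture is started from Python, that suggestion might look roughly like the sketch below. The interface name eth0 comes from the question; the file name capture.pcap, the packet count and the helper names build_capture_command and capture are assumptions made for this example. Capturing usually needs elevated privileges, so capture() is only defined here, not called.

```python
import subprocess


def build_capture_command(interface, pcap_path, packet_count=None):
    """Build a tcpdump command line that writes raw packets to a pcap file."""
    command = ['tcpdump', '-i', interface, '-w', pcap_path]
    if packet_count is not None:
        # -c makes tcpdump exit after receiving this many packets.
        command += ['-c', str(packet_count)]
    return command


def capture(interface, pcap_path, packet_count=None):
    """Run tcpdump; raises CalledProcessError on a non-zero exit status."""
    command = build_capture_command(interface, pcap_path, packet_count)
    subprocess.run(command, check=True)


print(' '.join(build_capture_command('eth0', 'capture.pcap', packet_count=100)))
# prints: tcpdump -i eth0 -w capture.pcap -c 100
```

Writing pcap directly skips the text round trip, so no conversion step is needed before opening the file in Wireshark.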
I made a PowerShell equivalent. text2pcap.exe accepts its output, but I mostly get "Inconsistent offset. Expecting 0, got 10. Ignoring rest of packet" warnings. Wireshark opens the result, but it doesn't look right. I'm going to check my tcpdump and text2pcap options to see if I can get it to look better.
Providing code below in case it helps someone.