Python: Converting tcpdump output to a text2pcap-readable format

Posted on 2024-09-27 04:27:17

I've written a Python script that converts the textual output of tcpdump -i eth0 -neXXs0 into a format text2pcap can read. This is my first Python program, and I'm looking for suggestions to improve its efficiency and readability, and to catch any mistakes in the code.

The tcpdump output format I'm working with looks like this:

20:11:32.001190 00:16:76:7f:2b:b1 > 00:11:5c:78:ca:c0, ethertype IPv4 (0x0800), length 72: 123.236.188.140.41756 > 94.59.34.210.45931: UDP, length 30
    
    0x0000:  0011 5c78 cac0 0016 767f 2bb1 0800 4500  ..\x....v.+...E.
    0x0010:  003a 0000 4000 4011 812d 7bec bc8c 5e3b  .:..@.@..-{...^;
    0x0020:  22d2 a31c b36b 0026 b9bd 2033 6890 ad33  "....k.&...3h..3
    0x0030:  e845 4b8d 2ba1 0685 0cb3 70dd 9b98 76d8  .EK.+.....p...v.
    0x0040:  8fc6 8293 bf33 325a                      .....32Z

The output format understood by text2pcap:

20:11:32.001190 

    0000: 00 11 5c 78 ca c0 00 16 76 7f 2b b1 08 00 45 00   ..\x....v.+...E. 
    0010: 00 3a 00 00 40 00 40 11 81 2d 7b ec bc 8c 5e 3b   .:..@.@..-{...^; 
    0020: 22 d2 a3 1c b3 6b 00 26 b9 bd 20 33 68 90 ad 33   "....k.&...3h..3
    0030: e8 45 4b 8d 2b a1 06 85 0c b3 70 dd 9b 98 76 d8   .EK.+.....p...v. 
    0040: 8f c6 82 93 bf 33 32 5a   .....32Z

Here is my code:

import re

# Identify time of the current packet.
# (Raw strings keep backslashes such as \. from being read as string escapes.)
time = re.compile(r'(..:..:..\.[\w]*) ')
# Get individual elements from the packet, i.e. offset, hexdump, chars
all = re.compile(r'[ |\t]+0x([\w]+:) +(.+)  +(.*)')
# Regex for two spaces
twoSpaces = re.compile('  +')
# Regex for single space
singleSpace = re.compile(' ')
# Single byte pattern.
singleBytePattern = re.compile(r'([\w][\w])')

# Open files.
f = open('pcap.txt', 'r')
outfile = open('ashu.txt', 'w')

for line in f:
    result = time.match(line)
    if result:
        # If current line contains time format dump only time
        print(result.group())
        outfile.write(result.group() + '\n')
    else:
        print(line)
        # Split line containing hex dump and tokenize into list elements.
        result = all.split(line)
        if result:
            i = 0
            for values in result:
                if i == 2:
                    # Strip off additional spaces in hex dump
                    # Useful when hex dump does not end in 16 bytes boundary.
                    val = twoSpaces.sub('', values)

                    # Tokenize individual elements separated by single space.
                    byteResult = singleSpace.split(val)
                    for twoByte in byteResult:
                        # Identify individual byte
                        singleByte = singleBytePattern.split(twoByte)
                        byteOffset = 0
                        for oneByte in singleByte:
                            if byteOffset == 1 or byteOffset == 3:
                                # Write out individual byte with a space char appended
                                print(oneByte, end=' ')
                                outfile.write(oneByte + ' ')
                            byteOffset += 1
                elif i == 3:
                    # Write out the character form of the hex dump
                    print("  " + values, end='')
                    outfile.write('  ' + values + ' ')
                elif i == 4:
                    outfile.write(values)
                else:
                    print(values, end=' ')
                    outfile.write(values + ' ')
                i += 1
        else:
            print("could not split")
f.close()
outfile.close()
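
For comparison, the per-line conversion described above can be sketched more compactly. This is only a minimal sketch, not a drop-in replacement: the names convert_line and convert are hypothetical, and it assumes the input looks exactly like the sample (an HH:MM:SS.ffffff timestamp at the start of each header line, and hex groups separated by single spaces, with two or more spaces before the character column).

```python
import re

# Assumption: header lines start with HH:MM:SS.ffffff followed by whitespace.
TIMESTAMP = re.compile(r'^(\d{2}:\d{2}:\d{2}\.\d+)\s')
# Assumption: dump lines look like "    0x0010:  003a 0000 ...  .:..@"
HEX_LINE = re.compile(r'^\s*0x([0-9a-fA-F]+):\s+(\S.*)$')


def convert_line(line):
    """Convert one tcpdump line; return None for lines that carry no data."""
    stamp = TIMESTAMP.match(line)
    if stamp:
        return stamp.group(1)              # keep only the timestamp
    dump = HEX_LINE.match(line.rstrip('\r\n'))
    if dump is None:
        return None                        # blank or unrecognised line
    offset, rest = dump.groups()
    # Hex groups are single-space separated, so the first run of two or
    # more spaces marks where the character column begins.
    parts = re.split(r' {2,}', rest, maxsplit=1)
    chars = parts[1] if len(parts) > 1 else ''
    digits = parts[0].replace(' ', '')
    # Regroup the hex digits into individual two-digit bytes.
    byte_list = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return f"{offset}: {' '.join(byte_list)}   {chars}".rstrip()


def convert(lines):
    """Yield converted lines, skipping anything convert_line rejects."""
    for line in lines:
        converted = convert_line(line)
        if converted is not None:
            yield converted
```

To use it on files, open both inside a with statement so they are closed automatically, and write each value yielded by convert(src) followed by a newline. Splitting at the first run of spaces (rather than a greedy match) keeps the character column intact even if it happens to contain spaces.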

好菇凉咱不稀罕他 2024-10-04 04:27:17

Use tcpdump's -w option to write a pcap file directly:

tcpdump -w filename.pcap

Wireshark should be able to read it.

乖乖 2024-10-04 04:27:17

I made a PowerShell equivalent. text2pcap.exe accepts its output, but I mostly get "Inconsistent offset. Expecting 0, got 10. Ignoring rest of packet" warnings. Wireshark opens the result, but it doesn't look right. I'm going to check my tcpdump operands and text2pcap operands to see if I can get it to look better.

The code is below in case it helps someone.

$text.split(10)|forEach{ if($_ -notmatch"0x"){$_} else { $num = [regex]::match($_,"(?<=0x)\d.*:").value ; $hex = [regex]::matches($_," \w.+").value.trim().replace(" ","") |%{$_ -split ("([a-z0-9]{2})")}; [string]$num,[string]$hex -join " "} }

2023-03-20 13:20:04.309607 IP 192.168.0.2.443 > 192.168.0.10.56321: Flags [.], ack 11801, win 498, length 0
0000:  45  00  00  28  3d  e9  40  00  ff  06  00  00  c0  a8  0c  57 E..(=.@........W
0010:  0a  fc  16  ba  01  bb  dc  01  38  29  25  31  51  97  cd  b6 ........8)% 1Q ...
0020:  50  10  01  f2  00  00  00  00 P.......
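
Since the warning is about offsets, one way to narrow things down is to sanity-check the converted text before handing it to text2pcap. The following Python sketch is a debugging aid only and does not reproduce text2pcap's own parser. The name check_offsets is hypothetical, and it assumes that each data line starts with a hex offset of at least three digits followed by a colon, that bytes are whitespace-separated two-digit hex values, and that an offset of 0 starts a new packet. A character column whose first token happens to look like a hex byte would be miscounted.

```python
import re

# Assumed line shape: "<hex offset>: <two-digit hex bytes...> <optional text>"
DATA_LINE = re.compile(r'^\s*([0-9a-fA-F]{3,}):\s+(.*)$')
HEX_BYTE = re.compile(r'^[0-9a-fA-F]{2}$')


def check_offsets(lines):
    """Return (line_number, expected, found) for every offset that does not
    follow from the bytes counted so far in the current packet."""
    problems = []
    expected = 0
    for number, line in enumerate(lines, start=1):
        match = DATA_LINE.match(line)
        if match is None:
            continue                      # timestamps, blank lines, etc.
        found = int(match.group(1), 16)
        if found == 0:
            expected = 0                  # offset 0 is taken as a new packet
        if found != expected:
            problems.append((number, expected, found))
        count = 0
        for token in match.group(2).split():
            if not HEX_BYTE.match(token):
                break                     # reached the character column
            count += 1
        expected = found + count          # where the next line should start
    return problems
```

An empty result means the offsets agree with the byte counts under these assumptions, in which case the cause of the warning probably lies elsewhere, for example in how the surrounding text is interpreted.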
