Regex to combine 2 lines into one complete call record

Posted on 2025-01-13 22:29:20

I am preparing a Python script that grabs call records from a .txt log file based on a phone number entered by the user.

I need help writing a regex that captures each call log so it can be "split" into a list.
Each log starts with one of the letters L, N, S, or E.
At the beginning of each log there is a timestamp that starts with a "T". Example: T 02/25 00:00

Records that start with N, S, or E have a second line that needs to be included. L records do not, but they do have a blank line below them, which could be included.

Essentially, what I need is every record that starts with L, N, S, or E together with the line below it.

See the call log sample below:

T                        02/25 00:00 
L 065 00 24329   12313   244.0.15.55 252.9.11.90 02/25 08:05 00:00:44 0000 0000 
                                                                
N 066 00 23442   T000185 262.1.00.09 02/25 08:05 00:00:02 A 16630
&       0000    0000                                      
S 067 00 00984   T000134             02/25 08:06 00:00:02 A 61445
&       0000    0000                                      
S 068 00 T000002 29536               02/25 08:05 00:00:36 
&       0000    0000   1234567890XXXXXX                   
E 069 00 T000002 T000185             02/25 08:06 00:00:00 
&       0000    0000   1234567890XXXXXX

For example:

L 065 00 24329   12313   244.0.15.55 252.9.11.90 02/25 08:05 00:00:44 0000 0000

would be one match and

N 066 00 23442   T000185 262.1.00.09 02/25 08:05 00:00:02 A 16630
&       0000    0000

would be another,

S 068 00 T000002 29536               02/25 08:05 00:00:36 
&       0000    0000   1234567890XXXXXX

would be another, and so on...

My regex knowledge is limited, but this is what I have come up with so far.

N(.*)\n(.*)

This selects the first and second line of each record that starts with "N", but it puts them in separate groups. Any direction is appreciated.
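
To make the behaviour described above concrete, here is a small sketch. The string `sample` is a shortened, made-up stand-in for the log, not the real file. With two capture groups, `re.findall` returns one tuple of groups per match, whereas the whole match (`group(0)`) already holds both lines as a single string.

```python
import re

# Shortened stand-in for the log (an assumption for illustration only).
sample = "N 066 00 23442\n&       0000    0000\nS 067 00 00984\n&       0000    0000\n"

pattern = r"N(.*)\n(.*)"

# Two capture groups: findall yields one tuple of group values per match.
groups = re.findall(pattern, sample)

# group(0) is the entire match, i.e. both lines as one string.
whole = [m.group(0) for m in re.finditer(pattern, sample)]

print(groups)
print(whole)
```

So the two lines are only "separate" because of the groups; the match itself spans both.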

假情假意假温柔 2025-01-20 22:29:20

To match any of those 4 characters, you can use a character class [LNSE] followed by a space and the rest of the line.

Then in the same capture group use an optional part to match a newline and the rest of the line if it does not start with any of those characters.

This allows for also matching only a single line.

^[LNSE] (.*(?:\n(?![LNSE] ).*)?)

^ Start of the line (with re.MULTILINE; without it, only the start of the string)
[LNSE] Match any of the listed characters followed by a space
( Capture group 1
- .* Match the rest of the line
- (?: Non-capture group
  - \n(?![LNSE] ).* Match a newline and the rest of the next line, using a negative lookahead to assert that the next line does not start with one of the listed characters followed by a space
- )? Close the non-capture group and make it optional
) Close capture group 1

See a regex demo and a Python demo.

If you want to match 0 or more following lines, you can change the ? to * for zero or more repetitions.
If you want to match the whole line including the leading characters, you can omit the capture group and just get the match.

For example, getting the capture group 1 values using re.findall while reading the whole file (this uses the * variant of the pattern):

import re

with open('file.log', 'r') as file:
    regex = r"^[LNSE] (.*(?:\n(?![LNSE] ).*)*)"
    print(re.findall(regex, file.read(), re.MULTILINE))
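
As a sketch of the "omit the capture group" option mentioned above, the following runs the pattern without a group against an in-memory string. The variable names `log_text` and `record_re` are illustrative, and the sample is copied from the question with trailing spaces trimmed and only the first three records kept.

```python
import re

# Part of the sample from the question; trailing spaces trimmed (illustrative).
log_text = """T                        02/25 00:00
L 065 00 24329   12313   244.0.15.55 252.9.11.90 02/25 08:05 00:00:44 0000 0000

N 066 00 23442   T000185 262.1.00.09 02/25 08:05 00:00:02 A 16630
&       0000    0000
S 067 00 00984   T000134             02/25 08:06 00:00:02 A 61445
&       0000    0000
"""

# No capture group: the match itself is the full record, leading letter included.
record_re = re.compile(r"^[LNSE] .*(?:\n(?![LNSE] ).*)?", re.MULTILINE)

records = [m.group(0) for m in record_re.finditer(log_text)]
for record in records:
    print(repr(record))
```

The T line is skipped because it does not start with one of the four letters, and the L record picks up the blank line that follows it.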

还给你自由 2025-01-20 22:29:20

Here is another way to do so without regex:

data = []
with open('file.log', 'r') as f:
    next(f)
    for line in f:
        if line[0] in ("L", "N", "S", "E"):
            data.append(line)
        else:
            data[-1] += line
print(data)

next(f) is used to skip the first line (T 02/25 00:00).

Assuming that all logs are two lines long, one can also use a list comprehension:

with open('file.log', 'r') as f:
    next(f)
    lines = f.readlines()
    data = [''.join(lines[i:i+2]) for i in range(0, len(lines), 2)]
print(data)

Each line is stored in a list lines (except the first line, which has been skipped). The list comprehension then joins elements of the list two by two.

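The same works if the log contents are stored in a string variable (keeping the line endings so that joining the pairs preserves the newlines):

lines = text.splitlines(keepends=True)[1:]  # [1:] skips the first line (the T timestamp)
data = [''.join(lines[i:i+2]) for i in range(0, len(lines), 2)]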
print(data)

Output:

[
    'L 065 00 24329   12313   244.0.15.55 252.9.11.90 02/25 08:05 00:00:44 0000 0000\n                                                                \n', 
    'N 066 00 23442   T000185 262.1.00.09 02/25 08:05 00:00:02 A 16630\n&       0000    0000                                      \n', 
    'S 067 00 00984   T000134             02/25 08:06 00:00:02 A 61445\n&       0000    0000                                      \n', 
    'S 068 00 T000002 29536               02/25 08:05 00:00:36 \n&       0000    0000   1234567890XXXXXX                   \n', 
    'E 069 00 T000002 T000185             02/25 08:06 00:00:00 \n&       0000    0000   1234567890XXXXXX'
]
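
Once the records are in a list like the one above, picking out the calls for a given number could look like the sketch below. The helper name `records_for_number` is hypothetical, and it assumes the number appears as a whole whitespace-separated field somewhere in the record; the question does not say which columns hold phone numbers, so adjust as needed.

```python
def records_for_number(records, number):
    """Return records in which `number` appears as a whole whitespace-separated field.

    Assumption: the number is a complete field; substrings are not matched.
    """
    return [record for record in records if number in record.split()]


# Two records in the shape of the output above (trailing spaces shortened).
data = [
    'N 066 00 23442   T000185 262.1.00.09 02/25 08:05 00:00:02 A 16630\n&       0000    0000\n',
    'S 068 00 T000002 29536               02/25 08:05 00:00:36 \n&       0000    0000   1234567890XXXXXX\n',
]

matches = records_for_number(data, "29536")
print(matches)
```

Matching on whole fields avoids false hits where the digits happen to occur inside a longer field; a plain substring test (`number in record`) would be the looser alternative.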
