Regex combining 2 lines of call data
I am preparing a Python script that grabs call records from a .txt log file based on a phone number entered by the user.
I need help writing a regex that captures each call log so it can be "split" into a list.
Each log starts with one of the letters L, N, S, or E.
At the beginning of each log there is a timestamp that starts with "T". Example: T 02/25 00:00
Records that start with N, S, or E have a second line that needs to be included. L records do not, but they are followed by an empty line, which could be included.
Essentially, what I need is every record that starts with L, N, S, or E and the line below it.
See the call log sample below:
T 02/25 00:00
L 065 00 24329 12313 244.0.15.55 252.9.11.90 02/25 08:05 00:00:44 0000 0000

N 066 00 23442 T000185 262.1.00.09 02/25 08:05 00:00:02 A 16630
& 0000 0000
S 067 00 00984 T000134 02/25 08:06 00:00:02 A 61445
& 0000 0000
S 068 00 T000002 29536 02/25 08:05 00:00:36
& 0000 0000 1234567890XXXXXX
E 069 00 T000002 T000185 02/25 08:06 00:00:00
& 0000 0000 1234567890XXXXXX
For example:
L 065 00 24329 12313 244.0.15.55 252.9.11.90 02/25 08:05 00:00:44 0000 0000
would be one match and
N 066 00 23442 T000185 262.1.00.09 02/25 08:05 00:00:02 A 16630 & 0000 0000
would be another,
S 068 00 T000002 29536 02/25 08:05 00:00:36 & 0000 0000 1234567890XXXXXX
would be another, and so on...
My regex knowledge is limited, but this is what I have come up with so far.
N(.*)\n(.*)
This selects the first and second line of each record that starts with "N", but it puts them in separate groups. Any direction is appreciated.
Comments (2)
To match any of those 4 characters, you can use a character class [LNSE] followed by a space and the rest of the line. Then, in the same capture group, use an optional part to match a newline and the rest of the next line, provided that line does not start with any of those characters.
This also allows matching a record that has only a single line.
Put together, the pattern is:
^[LNSE] (.*(?:\n(?![LNSE] ).*)?)
Piece by piece:
^ : start of the string (with re.MULTILINE, the start of each line)
[LNSE] followed by a space : match any one of the listed characters and a space
( : open capture group 1
.* : match the rest of the line
(?: : open a non-capturing group
\n(?![LNSE] ).* : match a newline and the rest of the next line, using a negative lookahead to make sure that line does not start with one of the listed characters and a space
)? : close the non-capturing group and make it optional
) : close capture group 1
If you want to match 0 or more following lines, you can change the ? to * for zero or more repetitions. If you want to match the whole record including the leading character, you can omit the capture group and just use the full match.
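Both variations can be sketched side by side. This is only an illustration: the records are shortened, and the third line of the N record is invented to show what changes when ? becomes *. Neither pattern has a capture group, so re.findall returns the whole match, leading letter included.

```python
import re

# Shortened records; the third line of the N record is invented for illustration.
text = (
    "N 066 00 23442 T000185\n"
    "& 0000 0000\n"
    "& extra continuation line\n"
    "S 067 00 00984 T000134\n"
    "& 0000 0000"
)

# Optional group (?): at most one following line is added to each record.
at_most_one = re.findall(r"^[LNSE] .*(?:\n(?![LNSE] ).*)?", text, re.MULTILINE)

# Repeated group (*): every following line is added until the next record starts.
any_number = re.findall(r"^[LNSE] .*(?:\n(?![LNSE] ).*)*", text, re.MULTILINE)

print(at_most_one)
print(any_number)
```

With ?, the invented third line is left out of the N record; with *, it is included.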
For example, getting the capture group 1 value using re.findall while reading the whole file:
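A minimal sketch of that idea, assuming the line breaks fall as the question describes (an empty line after each L record, and a second line starting with "&" after N, S, and E records). The SAMPLE string stands in for the file contents, and the file name call_log.txt and the helper extract_records are placeholders, not names from the original post.

```python
import re

# re.MULTILINE lets ^ match at the start of every line, not just the string.
RECORD = re.compile(r"^[LNSE] (.*(?:\n(?![LNSE] ).*)?)", re.MULTILINE)

# Stand-in for the file contents; the line breaks are an assumption.
SAMPLE = """\
T 02/25 00:00
L 065 00 24329 12313 244.0.15.55 252.9.11.90 02/25 08:05 00:00:44 0000 0000

N 066 00 23442 T000185 262.1.00.09 02/25 08:05 00:00:02 A 16630
& 0000 0000
S 067 00 00984 T000134 02/25 08:06 00:00:02 A 61445
& 0000 0000
S 068 00 T000002 29536 02/25 08:05 00:00:36
& 0000 0000 1234567890XXXXXX
E 069 00 T000002 T000185 02/25 08:06 00:00:00
& 0000 0000 1234567890XXXXXX
"""


def extract_records(text):
    """Return capture group 1 of every record, collapsed onto one line."""
    # findall returns only group 1, so the leading letter and space are dropped.
    return [" ".join(match.split()) for match in RECORD.findall(text)]


# With a real file (hypothetical name): extract_records(open("call_log.txt").read())
records = extract_records(SAMPLE)
for record in records:
    print(record)
```

The " ".join(match.split()) step is an addition of this sketch: it replaces the embedded newline with a space so each record ends up as a single list item on one line.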
Here is another way to do so without regex:
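One way this could look, as a sketch rather than a definitive version: io.StringIO stands in for the opened log file, the helper name read_records is made up, and the line breaks in the sample are assumed from the question's description.

```python
import io

# Stand-in for the log file; the line breaks are an assumption.
SAMPLE = (
    "T 02/25 00:00\n"
    "L 065 00 24329 12313 244.0.15.55 252.9.11.90 02/25 08:05 00:00:44 0000 0000\n"
    "\n"
    "N 066 00 23442 T000185 262.1.00.09 02/25 08:05 00:00:02 A 16630\n"
    "& 0000 0000\n"
    "S 067 00 00984 T000134 02/25 08:06 00:00:02 A 61445\n"
    "& 0000 0000\n"
)


def read_records(f):
    """Pair every record line with the line that follows it."""
    next(f)  # skip the timestamp line at the top
    records = []
    for first in f:
        second = next(f, "")  # blank for L records; "" if the file ends early
        records.append(f"{first.strip()} {second.strip()}".strip())
    return records


# With a real file: `with open(path) as f: records = read_records(f)`
records = read_records(io.StringIO(SAMPLE))
for record in records:
    print(record)
```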
next(f) is used to skip the first line (T 02/25 00:00). Assuming that all logs are two lines long, one can also use a list comprehension:
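A sketch of the list-comprehension version, again with io.StringIO standing in for the opened file and with the sample's line breaks assumed. It relies on every record taking up exactly two lines (for L records, the second one is the empty line).

```python
import io

# Stand-in for the log file; the line breaks are an assumption.
SAMPLE = (
    "T 02/25 00:00\n"
    "L 065 00 24329 12313 244.0.15.55 252.9.11.90 02/25 08:05 00:00:44 0000 0000\n"
    "\n"
    "N 066 00 23442 T000185 262.1.00.09 02/25 08:05 00:00:02 A 16630\n"
    "& 0000 0000\n"
    "S 067 00 00984 T000134 02/25 08:06 00:00:02 A 61445\n"
    "& 0000 0000\n"
)

f = io.StringIO(SAMPLE)  # with a real file: `with open(path) as f:`
next(f)  # skip the timestamp line
lines = f.read().splitlines()

# Join lines 0+1, 2+3, 4+5, ...; strip() drops the space left by an empty second line.
records = [" ".join(lines[i:i + 2]).strip() for i in range(0, len(lines), 2)]
print(records)
```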
Each line is stored in a list, lines (except the first line, which has been skipped). The list comprehension then joins the elements of the list two by two. The same works if the log file is stored in a variable:
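For instance, with the log held in a string (the variable name log and the shortened two-record sample are assumptions of this sketch):

```python
# The log as a string; the line breaks are an assumption.
log = (
    "T 02/25 00:00\n"
    "L 065 00 24329 12313 244.0.15.55 252.9.11.90 02/25 08:05 00:00:44 0000 0000\n"
    "\n"
    "N 066 00 23442 T000185 262.1.00.09 02/25 08:05 00:00:02 A 16630\n"
    "& 0000 0000\n"
)

lines = log.splitlines()[1:]  # [1:] skips the timestamp line
records = [" ".join(lines[i:i + 2]).strip() for i in range(0, len(lines), 2)]

for record in records:
    print(record)
# L 065 00 24329 12313 244.0.15.55 252.9.11.90 02/25 08:05 00:00:44 0000 0000
# N 066 00 23442 T000185 262.1.00.09 02/25 08:05 00:00:02 A 16630 & 0000 0000
```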