Parsing multi-line logs for service times in PHP/Python

Posted on 2024-10-03 19:11:13


What's the best way to parse a multi-line log file that requires contextual knowledge from previous lines, in PHP and/or Python?

Example:

 Date    Time    ID    Call

1/1/10 00:00:00 1234 Start
1/1/10 00:00:01 1234 ServiceCall A Starts
1/1/10 00:00:05 1234 ServiceCall B Starts
1/1/10 00:00:06 1234 ServiceCall A Finishes
1/1/10 00:00:09 1234 ServiceCall B Finishes
1/1/10 00:00:10 1234 Stop

Each log line has a unique ID binding it to a session, but consecutive lines are not guaranteed to be from the same session.

The ultimate goal is to find out how long each transaction took and how long each sub-transaction took.

I'd love to use a library if one already exists.


Comments (3)

在风中等你 2024-10-10 19:11:13


I can think of two different ways of doing this.

1) You can use a finite state machine to process the file line by line. When you hit a Start line, mark the time. When you hit a Stop line with the same ID, diff the times and report. (A sketch of this approach follows the list below.)

2) Use PHP's Perl-Compatible Regular Expressions with the m modifier to match all the text in each Start/Stop line set, then just look at the first and last lines of each match string returned. (The second sketch below shows the same idea in Python.)

In both cases, I would verify that the IDs match, to avoid mixing up lines from different sets.
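A minimal sketch of approach 1 in Python, assuming the question's "M/D/YY HH:MM:SS" timestamp format and whitespace-separated fields (the file name and helper names here are made up for illustration):

from datetime import datetime

def parse_ts(line):
    # timestamp = the first two whitespace-separated fields of a line
    date, time = line.split()[:2]
    return datetime.strptime(date + " " + time, "%m/%d/%y %H:%M:%S")

def session_durations(lines):
    # yield (session_id, duration) for every Start..Stop pair seen;
    # lines whose fourth field is neither Start nor Stop (the header,
    # ServiceCall entries) are simply ignored
    starts = {}  # session ID -> Start timestamp
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        session_id, event = fields[2], fields[3]
        if event == "Start":
            starts[session_id] = parse_ts(line)
        elif event == "Stop" and session_id in starts:
            yield session_id, parse_ts(line) - starts.pop(session_id)

for sid, duration in session_durations(open("service.log")):  # hypothetical file
    print(sid, duration)  # e.g. 1234 0:00:10

Because the starts dict is keyed by session ID, interleaved lines from other sessions don't disturb the timing.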
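And a sketch of approach 2; the answer describes PHP's PCRE, but the same multiline-mode trick works with Python's re module. The \2 backreference enforces the answer's point about matching IDs (the file name is again made up):

import re
from datetime import datetime

def to_dt(line):
    # timestamp = the first two whitespace-separated fields
    return datetime.strptime(" ".join(line.split()[:2]), "%m/%d/%y %H:%M:%S")

# MULTILINE is PHP's m modifier (^ anchors at line starts); DOTALL lets
# .*? span newlines; \2 requires the Stop line to carry the Start line's ID
session = re.compile(
    r"^(\S+\s+\S+\s+(\d+)\s+Start\b.*?^\S+\s+\S+\s+\2\s+Stop\b[^\n]*)",
    re.MULTILINE | re.DOTALL)

log_text = open("service.log").read()
for block, session_id in session.findall(log_text):
    lines = block.splitlines()
    print(session_id, to_dt(lines[-1]) - to_dt(lines[0]))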

遗失的美好 2024-10-10 19:11:13


My first thought would be to create objects each time my parser encountered the start pattern with a new key. I'm assuming, from your example, that 1234 is a key such that all log lines which must be correlated together can be mapped to the state of one "thing" (object).

So when you see the start pattern you begin tracking one of these, and every time you see a log entry that relates to it you call a method for the type of event (state change) that the subsequent line represents.

From your example these "log state" objects (for lack of a more apropos term) might contain a list or dictionary (or other container) for each ServiceCall (which I would expect would be another class of objects).

So the overall design would be a parser/dispatcher that reads the log; if a log item relates to some existing object (key), the item is dispatched to that object, which can then create its own (ServiceCall or other) objects, dispatch events to them, raise exceptions, or invoke callbacks or calls out to other functions as needed.

Presumably you also will need to have some collection or final disposition handler which could be called by your log objects when the Stop events are dispatched to them.

I'd guess you'd also want to support some sort of status-reporting method, so that the application can enumerate all live (uncollected) objects in response to signals or commands on some other channel (perhaps from a non-blocking check performed by the parser/dispatcher).
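A minimal sketch of this design in Python, assuming the question's line format; the class and function names (Session, dispatch_log) are made up, and it only records timings rather than invoking callbacks:

from datetime import datetime

class Session:
    # one "log state" object per session ID, tracking its ServiceCalls
    def __init__(self, session_id, started_at):
        self.session_id = session_id
        self.started_at = started_at
        self.stopped_at = None
        self.service_calls = {}  # name -> [start, finish]

    def dispatch(self, ts, event):
        # route one log event (state change) to this session
        words = event.split()
        if words[0] == "Stop":
            self.stopped_at = ts  # a final-disposition handler could run here
        elif words[0] == "ServiceCall":
            name, action = words[1], words[2]
            if action == "Starts":
                self.service_calls[name] = [ts, None]
            elif action == "Finishes":
                self.service_calls[name][1] = ts

def dispatch_log(lines):
    # parser/dispatcher: create a Session on Start, route later lines to it
    sessions = {}
    for line in lines:
        fields = line.split(None, 3)  # date, time, id, rest
        if len(fields) < 4:
            continue
        try:
            ts = datetime.strptime(" ".join(fields[:2]), "%m/%d/%y %H:%M:%S")
        except ValueError:
            continue  # header or malformed line
        sid, event = fields[2], fields[3]
        if event == "Start":
            sessions[sid] = Session(sid, ts)
        elif sid in sessions:
            sessions[sid].dispatch(ts, event)
    return sessions

From the returned dict, the application can enumerate live sessions (those with stopped_at still None) or compute stopped_at - started_at and per-ServiceCall durations for finished ones.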

冰魂雪魄 2024-10-10 19:11:13


Here is a variation on a log parser I wrote a while ago, tailored to your log format. (The general approach tracks pretty closely with Jim Dennis's description, although I used a defaultdict of lists to accumulate all the entries for any given session.)

from pyparsing import Suppress, Word, nums, restOfLine
from datetime import datetime
from collections import defaultdict

def convertToDateTime(tokens):
    # tokens arrive as six integers: month, day, year, hour, minute, second
    month, day, year, hh, mm, ss = tokens
    return datetime(year + 2000, month, day, hh, mm, ss)

# define building blocks for parsing and processing log file entries
SLASH, COLON = map(Suppress, "/:")
integer = Word(nums).setParseAction(lambda t: int(t[0]))
date = integer + (SLASH + integer) * 2
time = integer + (COLON + integer) * 2
timestamp = date + time
timestamp.setParseAction(convertToDateTime)

# define format of a single line in the log file
logEntry = timestamp("timestamp") + integer("sessionid") + restOfLine("descr")

# sample data from the question; in practice, read these from the log file
log = [
    "1/1/10 00:00:00 1234 Start",
    "1/1/10 00:00:01 1234 ServiceCall A Starts",
    "1/1/10 00:00:05 1234 ServiceCall B Starts",
    "1/1/10 00:00:06 1234 ServiceCall A Finishes",
    "1/1/10 00:00:09 1234 ServiceCall B Finishes",
    "1/1/10 00:00:10 1234 Stop",
]

# summarize calls into a single data structure, one entry list per session
calls = defaultdict(list)
for logline in log:
    entry = logEntry.parseString(logline)
    calls[entry.sessionid].append(entry)

# first pass to find start/end time for each call
for sessionid in sorted(calls):
    calldata = calls[sessionid]
    print(sessionid, calldata[-1].timestamp - calldata[0].timestamp)

For your data, this prints out:

1234 0:00:10

You can process each session's list of entries with a similar approach to tease apart the sub-transactions.
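For instance, a small follow-up pass over the same calls dict, assuming descr holds text like " ServiceCall A Starts", could pair each sub-transaction's Starts/Finishes entries:

# pair each ServiceCall's Starts/Finishes within a session to time
# the sub-transactions
for sessionid in sorted(calls):
    pending = {}  # ServiceCall name -> start timestamp
    for entry in calls[sessionid]:
        words = entry.descr.split()
        if len(words) == 3 and words[0] == "ServiceCall":
            name, action = words[1], words[2]
            if action == "Starts":
                pending[name] = entry.timestamp
            elif action == "Finishes" and name in pending:
                print(sessionid, name, entry.timestamp - pending.pop(name))

For the sample data this should print 1234 A 0:00:05 and 1234 B 0:00:04.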
