Parsing multi-line logs for service times in PHP/Python

Posted on 2024-10-03 19:11:13


What's the best way to parse a multi-line log file that requires contextual knowledge from previous lines, in PHP and/or Python?

Example:

 Date    Time    ID    Call

1/1/10 00:00:00 1234 Start
1/1/10 00:00:01 1234 ServiceCall A Starts
1/1/10 00:00:05 1234 ServiceCall B Starts
1/1/10 00:00:06 1234 ServiceCall A Finishes
1/1/10 00:00:09 1234 ServiceCall B Finishes
1/1/10 00:00:10 1234 Stop

Each log line has a unique ID binding it to a session, but consecutive lines are not guaranteed to be from the same session.

The ultimate goal is to find out how long each transaction took and how long each sub-transaction took.

I'd love to use a library if one already exists.


Comments (3)

在风中等你 2024-10-10 19:11:13


I can think of two different ways of doing this.

1) You can use a finite state machine to process the file line by line. When you hit a Start line, mark the time. When you hit a Stop line with the same ID, diff the times and report. (A sketch of this approach follows the list below.)

2) Use PHP's Perl-Compatible Regular Expressions with the m modifier to match all the text in each Start/Stop line set, then just look at the first and last lines of each match string returned. (The second sketch below shows the same idea in Python.)

In both cases, I would verify that the IDs match, to avoid mixing up lines from different sets.
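A minimal sketch of approach 1 in Python, assuming the question's "M/D/YY HH:MM:SS" timestamp format and whitespace-separated fields (the file name and helper names here are made up for illustration):

from datetime import datetime

def parse_ts(line):
    # timestamp = the first two whitespace-separated fields of a line
    date, time = line.split()[:2]
    return datetime.strptime(date + " " + time, "%m/%d/%y %H:%M:%S")

def session_durations(lines):
    # yield (session_id, duration) for every Start..Stop pair seen;
    # lines whose fourth field is neither Start nor Stop (the header,
    # ServiceCall entries) are simply ignored
    starts = {}  # session ID -> Start timestamp
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        session_id, event = fields[2], fields[3]
        if event == "Start":
            starts[session_id] = parse_ts(line)
        elif event == "Stop" and session_id in starts:
            yield session_id, parse_ts(line) - starts.pop(session_id)

for sid, duration in session_durations(open("service.log")):  # hypothetical file
    print(sid, duration)  # e.g. 1234 0:00:10

Because the starts dict is keyed by session ID, interleaved lines from other sessions don't disturb the timing.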
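And a sketch of approach 2; the answer describes PHP's PCRE, but the same multiline-mode trick works with Python's re module. The \2 backreference enforces the answer's point about matching IDs (the file name is again made up):

import re
from datetime import datetime

def to_dt(line):
    # timestamp = the first two whitespace-separated fields
    return datetime.strptime(" ".join(line.split()[:2]), "%m/%d/%y %H:%M:%S")

# MULTILINE is PHP's m modifier (^ anchors at line starts); DOTALL lets
# .*? span newlines; \2 requires the Stop line to carry the Start line's ID
session = re.compile(
    r"^(\S+\s+\S+\s+(\d+)\s+Start\b.*?^\S+\s+\S+\s+\2\s+Stop\b[^\n]*)",
    re.MULTILINE | re.DOTALL)

log_text = open("service.log").read()
for block, session_id in session.findall(log_text):
    lines = block.splitlines()
    print(session_id, to_dt(lines[-1]) - to_dt(lines[0]))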

遗失的美好 2024-10-10 19:11:13


My first thought would be to create objects each time my parser encountered the start pattern with a new key. I'm assuming, from your example, that 1234 is a key such that all log lines which must be correlated together can be mapped to the state of one "thing" (object).

So when you see the start pattern you begin tracking one of these, and every time you see a log entry that relates to it you call a method for the type of event (state change) that the subsequent line represents.

From your example these "log state" objects (for lack of a more apropos term) might contain a list or dictionary (or other container) for each ServiceCall (which I would expect would be another class of objects).

So the overall design would be a parser/dispatcher that reads the log; if a log item relates to some existing object (key), the item is dispatched to that object, which can then create its own (ServiceCall or other) objects, dispatch events to them, raise exceptions, or invoke callbacks or calls out to other functions as needed.

Presumably you also will need to have some collection or final disposition handler which could be called by your log objects when the Stop events are dispatched to them.

I'd guess you'd also want to support some sort of status-reporting method, so that the application can enumerate all live (uncollected) objects in response to signals or commands on some other channel (perhaps from a non-blocking check performed by the parser/dispatcher).
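A minimal sketch of this design in Python, assuming the question's line format; the class and function names (Session, dispatch_log) are made up, and it only records timings rather than invoking callbacks:

from datetime import datetime

class Session:
    # one "log state" object per session ID, tracking its ServiceCalls
    def __init__(self, session_id, started_at):
        self.session_id = session_id
        self.started_at = started_at
        self.stopped_at = None
        self.service_calls = {}  # name -> [start, finish]

    def dispatch(self, ts, event):
        # route one log event (state change) to this session
        words = event.split()
        if words[0] == "Stop":
            self.stopped_at = ts  # a final-disposition handler could run here
        elif words[0] == "ServiceCall":
            name, action = words[1], words[2]
            if action == "Starts":
                self.service_calls[name] = [ts, None]
            elif action == "Finishes":
                self.service_calls[name][1] = ts

def dispatch_log(lines):
    # parser/dispatcher: create a Session on Start, route later lines to it
    sessions = {}
    for line in lines:
        fields = line.split(None, 3)  # date, time, id, rest
        if len(fields) < 4:
            continue
        try:
            ts = datetime.strptime(" ".join(fields[:2]), "%m/%d/%y %H:%M:%S")
        except ValueError:
            continue  # header or malformed line
        sid, event = fields[2], fields[3]
        if event == "Start":
            sessions[sid] = Session(sid, ts)
        elif sid in sessions:
            sessions[sid].dispatch(ts, event)
    return sessions

From the returned dict, the application can enumerate live sessions (those with stopped_at still None) or compute stopped_at - started_at and per-ServiceCall durations for finished ones.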

冰魂雪魄 2024-10-10 19:11:13


Here is a variation on a log parser I wrote a while ago, tailored to your log format. (The general approach tracks pretty closely with Jim Dennis's description, although I used a defaultdict of lists to accumulate all the entries for any given session.)

from pyparsing import Suppress, Word, nums, restOfLine
from datetime import datetime
from collections import defaultdict

def convertToDateTime(tokens):
    # tokens arrive as six integers: month, day, year, hour, minute, second
    month, day, year, hh, mm, ss = tokens
    return datetime(year + 2000, month, day, hh, mm, ss)

# define building blocks for parsing and processing log file entries
SLASH, COLON = map(Suppress, "/:")
integer = Word(nums).setParseAction(lambda t: int(t[0]))
date = integer + (SLASH + integer) * 2
time = integer + (COLON + integer) * 2
timestamp = date + time
timestamp.setParseAction(convertToDateTime)

# define format of a single line in the log file
logEntry = timestamp("timestamp") + integer("sessionid") + restOfLine("descr")

# sample data from the question; in practice, read these from the log file
log = [
    "1/1/10 00:00:00 1234 Start",
    "1/1/10 00:00:01 1234 ServiceCall A Starts",
    "1/1/10 00:00:05 1234 ServiceCall B Starts",
    "1/1/10 00:00:06 1234 ServiceCall A Finishes",
    "1/1/10 00:00:09 1234 ServiceCall B Finishes",
    "1/1/10 00:00:10 1234 Stop",
]

# summarize calls into a single data structure, one entry list per session
calls = defaultdict(list)
for logline in log:
    entry = logEntry.parseString(logline)
    calls[entry.sessionid].append(entry)

# first pass to find start/end time for each call
for sessionid in sorted(calls):
    calldata = calls[sessionid]
    print(sessionid, calldata[-1].timestamp - calldata[0].timestamp)

For your data, this prints out:

1234 0:00:10

You can process each session's list of entries with a similar approach to tease apart the sub-transactions.
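For instance, a small follow-up pass over the same calls dict, assuming descr holds text like " ServiceCall A Starts", could pair each sub-transaction's Starts/Finishes entries:

# pair each ServiceCall's Starts/Finishes within a session to time
# the sub-transactions
for sessionid in sorted(calls):
    pending = {}  # ServiceCall name -> start timestamp
    for entry in calls[sessionid]:
        words = entry.descr.split()
        if len(words) == 3 and words[0] == "ServiceCall":
            name, action = words[1], words[2]
            if action == "Starts":
                pending[name] = entry.timestamp
            elif action == "Finishes" and name in pending:
                print(sessionid, name, entry.timestamp - pending.pop(name))

For the sample data this should print 1234 A 0:00:05 and 1234 B 0:00:04.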
