Python: how to find profiles in a log file

Posted on 2025-02-13 12:49:20

I am new to Python.

I want to find profiles in a log file that meet the following criteria:

The user logged in, changed password, and logged off within the same second.
Those actions (log in, change password, log off) happened one after another, with no other entries in between.

The .txt file looks like this:

Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged off| -
Mon, 22 Aug 2016 13:15:42 +0200|178.57.66.225|iukj| - |user logged in| -
Mon, 22 Aug 2016 13:15:40 +0200|178.57.66.215|klij| - |user logged in| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|klij| - |user changed password| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|klij| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user logged off| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user logged in| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user changed password| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user changed profile| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user logged off| -
Mon, 22 Aug 2016 13:20:42 +0200|178.57.67.225|yytr| - |user logged in| -

asdf is a typical profile name from the log file.

Here is what I have done so far:

import collections
import time

with open('logfiles.txt') as infile:
    counts = collections.Counter(l.strip() for l in infile)
for line, count in counts.most_common():
    print(line, count)
    
time.sleep(10)

I know the logic is to compare the hours, minutes, and seconds, and to print the profile if they are duplicates.
But I am confused about how to get the time from the file.

Any help is very much appreciated.

EDIT:

The output would be:
asdf
klij
plnb
zzad

得不到的就毁灭 2025-02-20 12:49:20

I think this is more complicated than you might have imagined. Your sample data is very straightforward, but the description (requirements) implies that the log might have interspersed lines that you need to account for. So I think it's a case of working through the log file sequentially, recording certain actions (log in, log off) and keeping a note of what was observed on any previous line. This seems to work with your data:

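from datetime import datetime as DT, timedelta as TD

FMT = '%a, %d %b %Y %H:%M:%S %z'  # e.g. 'Mon, 22 Aug 2016 13:15:39 +0200'
td = TD(seconds=1)  # maximum allowed gap between log in and log off
prev = None  # (timestamp, profile) of the pending 'user logged in' line

with open('logfile.txt') as logfile:
    for line in logfile:
        # Skip lines that do not have enough '|'-separated fields.
        if len(tokens := line.split('|')) > 4:
            dt, _, profile, _, action, *_ = tokens
            if prev is None or prev[1] != profile:
                # Different profile: remember it only if this line is a log in.
                prev = (dt, profile) if action == 'user logged in' else None
            else:
                if action == 'user logged off':
                    # Same profile logged off: report it if the gap is short enough.
                    if DT.strptime(dt, FMT) - DT.strptime(prev[0], FMT) <= td:
                        print(profile)
                    prev = None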

Output:

asdf
plnb
qweq
zzad
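
The code above only tracks log in and log off, so it also reports qweq, which has a "user changed profile" entry in between. As a rough sketch of the stricter reading of the question (exactly log in, change password, log off on three consecutive lines, same profile, identical timestamp), a sliding window over the last three entries could be used. The names `PATTERN`, `parse_line`, and `find_profiles` are made up for this illustration, and it assumes that "same second" can be tested by comparing the timestamp strings, which only holds if every line uses the same format and UTC offset.

```python
from collections import deque

# Required action sequence, taken from the criteria stated in the question.
PATTERN = ("user logged in", "user changed password", "user logged off")


def parse_line(line):
    """Return (timestamp, profile, action), or None for a malformed line."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 5:
        return None
    return fields[0], fields[2], fields[4]


def find_profiles(lines):
    """Yield each profile whose three consecutive entries match PATTERN
    and share one timestamp string (assumption: same format on every line)."""
    window = deque(maxlen=len(PATTERN))  # holds the last three parsed entries
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # assumption: blank or malformed lines are ignored
        window.append(entry)
        if len(window) < len(PATTERN):
            continue
        timestamps = {e[0] for e in window}
        profiles = {e[1] for e in window}
        actions = tuple(e[2] for e in window)
        if len(timestamps) == 1 and len(profiles) == 1 and actions == PATTERN:
            yield window[0][1]


# A few lines copied from the question's sample log.
SAMPLE = [
    "Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged in| -",
    "Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user changed password| -",
    "Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged off| -",
    "Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user logged in| -",
    "Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user changed password| -",
    "Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user changed profile| -",
    "Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user logged off| -",
]

print(list(find_profiles(SAMPLE)))  # prints ['asdf']
```

An open file object can be passed to `find_profiles` in place of the list. On the full sample from the question this sketch yields asdf, plnb, and zzad; klij is left out because its log in is stamped 13:15:40 while the other two entries are stamped 13:15:49, so it does not satisfy the "same second" criterion as written.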

安人多梦 2025-02-20 12:49:20

To parse the time, I would use a regular expression that matches the time expression on each line.

Something like this would work.

EDIT: I omitted the lines that don't match the expected format.

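import re

# Example line from the log; in practice `line` comes from iterating over the file.
line = 'Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged in| -'
match = re.search(r'(\d+):(\d+):(\d+)', line)
timestamp = match.group() if match else None  # '13:15:39' for the line above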

As far as the profile name is concerned, I would use a split function on the most common lines like @Matthias suggested and your code would look something like this:

import collections
import time

with open('logfiles.txt') as infile:
    counts = collections.Counter(l.strip() for l in infile)
for line, count in counts.most_common():
    # Split the line at each '|' into a list of segments;
    # the third segment (index 2) is the profile name.
    list_of_segments = line.split('|')
    if len(list_of_segments) == 6:
        print(list_of_segments[2])

time.sleep(10)
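
As a possible alternative to the regex for the time field, each line can be split once and the whole timestamp parsed into a `datetime`. The sketch below assumes every timestamp follows the RFC 2822 style seen in the sample (`Mon, 22 Aug 2016 13:15:39 +0200`) and that a well-formed line has exactly six `|`-separated fields; `parse_entry` is a hypothetical helper name, not something from the original post.

```python
from email.utils import parsedate_to_datetime


def parse_entry(line):
    """Split one log line into (datetime, profile, action); None if malformed."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 6:
        return None
    # The standard library's email.utils parses RFC 2822 style dates,
    # including the UTC offset, into a timezone-aware datetime.
    return parsedate_to_datetime(fields[0]), fields[2], fields[4]


line = "Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged in| -"
when, profile, action = parse_entry(line)
print(when.time(), profile, action)  # prints: 13:15:39 asdf user logged in
```

Because the result is a real `datetime`, two entries can be compared with `==` or subtracted to get a `timedelta`, which is safer than comparing `HH:MM:SS` strings when the log spans more than one day.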
