Analyzing 2 log files with Python

Posted on 2024-09-13 16:05:25 | 1291 words | 6 views | 0 comments

I have 2 large logfiles. I want to see if a device is in a but not b, and vice versa (excluding lines where the device is common to both). The files look like this example:

04/09/2010,13:11:52,Authen OK,user1,Default Group,00-24-2B-A1-08-88,29,10.1.1.1,(Default),,,,,,13,EAP-TLS,,device1,
04/19/2010,15:35:24,Authen OK,user2,Default Group,00-24-2B-A1-05-EA,29,10.1.1.2,(Default),,,,,,13,EAP-TLS,,device2,
04/09/2010,13:11:52,Authen OK,user3,Default Group,00-24-2B-A1-08-88,29,10.1.1.3,(Default),,,,,,13,EAP-TLS,,device3,
04/19/2010,15:35:24,Authen OK,user4,Default Group,00-24-2B-A1-05-EA,29,10.1.1.4,(Default),,,,,,13,EAP-TLS,,device4,

To reiterate, I need the device (field [-2]) and IP (field [7]) for each device that is in logfile a but not b, or in b but not a.

Here's what I've done so far, but it seems a little clunky and is very slow (each file has about 400K lines). I'm cross-referencing twice. Can anyone suggest efficiencies, please? Perhaps I am using the wrong logic?

chst={}
chbs={}
for i,line in enumerate(open('chst.txt').readlines()):
    line=line.split(',')
    chst[line[-2]+','+str(i)]=','.join(line)

for i,line in enumerate(open('chbs.txt').readlines()):
    line=line.split(',')
    chbs[line[-2]+','+str(i)]='.'.join(line)

print "these lines are in CHST but not in CHBS"
for a in chst:
    if a.split(',')[0] not in str(chbs.values()):
        line=chst[a].split(',')
        print line[-2], line[7]

print "\nthese lines are in CHBS but not in CHST"

for a in chbs:
    if a.split(',')[0] not in str(chst.values()):
        line=chbs[a].split(',')
        print line[-2], line[7]

Comments (2)

两个我 2024-09-20 16:05:26

You are looking for a symmetric difference:

chst = { ( line.split( "," )[ -2 ], line.split( "," )[ 7 ] ) for line in open( ... ) }
chbs = { ( line.split( "," )[ -2 ], line.split( "," )[ 7 ] ) for line in open( ... ) }

diff = chst ^ chbs

If you need the asymmetric differences, use -:

chst - chbs # tuples in chst but not in chbs
chbs - chst # tuples in chbs but not in chst

If you need the actual line instead of a ( device, IP ) tuple, you can use dictionaries instead of sets:

chst = { ( line.split( "," )[ -2 ], line.split( "," )[ 7 ] ): line for line in open( ... ) }
chbs = { ( line.split( "," )[ -2 ], line.split( "," )[ 7 ] ): line for line in open( ... ) }

diff = chst.items( ) ^ chbs.items( )

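This works because dict.items( ) returns a view on the items, which has set-like properties. Note that this is called dict.viewitems( ) in Python 2.7. Be aware that the items are ( key, line ) pairs, so a ( device, IP ) key present in both files still shows up in the result whenever its two lines differ, for example in the timestamp; compare the key views instead if only the keys should decide.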
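As a rough, runnable sketch of the set-based approach, the snippet below parses lines with the csv module and compares the resulting sets. The helper name device_ip_pairs and the sample lines are made up for illustration; with the real files you would pass open file objects instead of lists.

```python
import csv

def device_ip_pairs(lines):
    """Return the set of (device, ip) tuples found in an iterable of log lines."""
    pairs = set()
    for row in csv.reader(lines):
        if len(row) < 9:  # skip blank or truncated lines
            continue
        pairs.add((row[-2], row[7]))  # device is second to last, IP is field 7
    return pairs

# Made-up sample data in the question's format. For real files, something like
# device_ip_pairs(open("chst.txt", newline="")) should work the same way.
chst_lines = [
    "04/09/2010,13:11:52,Authen OK,user1,Default Group,00-24-2B-A1-08-88,29,10.1.1.1,(Default),,,,,,13,EAP-TLS,,device1,\n",
    "04/19/2010,15:35:24,Authen OK,user2,Default Group,00-24-2B-A1-05-EA,29,10.1.1.2,(Default),,,,,,13,EAP-TLS,,device2,\n",
]
chbs_lines = [
    "04/19/2010,15:35:24,Authen OK,user2,Default Group,00-24-2B-A1-05-EA,29,10.1.1.2,(Default),,,,,,13,EAP-TLS,,device2,\n",
    "04/09/2010,13:11:52,Authen OK,user3,Default Group,00-24-2B-A1-08-88,29,10.1.1.3,(Default),,,,,,13,EAP-TLS,,device3,\n",
]

chst = device_ip_pairs(chst_lines)
chbs = device_ip_pairs(chbs_lines)

only_in_chst = chst - chbs   # pairs seen in chst but not in chbs
only_in_chbs = chbs - chst   # pairs seen in chbs but not in chst
in_one_only = chst ^ chbs    # symmetric difference: union of the two above
```

Each file is read once and every membership test is a hash lookup, so the work grows roughly linearly with the number of lines rather than with the product of the two file sizes.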

寒尘 2024-09-20 16:05:26

There's a bug in line 9 where you are doing ='.'.join(line) instead of =','.join(line) i.e. a dot in the quotes instead of a comma. Or maybe the lines in chbs should be split on dots instead of commas later.

At the moment, if there are three lines for device7 in chbs and none in chst, the script will tell you three times, but your description of the problem implies that you don't need to know how many times it appears. Do you really want that, or is a single report OK for multiple occurrences? In that case you could simplify it by just using the device name as the dictionary key and checking whether the other dictionary has that key.
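One possible way to sketch that simplification, assuming a single report per device is enough and that keeping the first IP seen for each device is acceptable. The names make_line and first_ip_by_device are hypothetical, and the sample data is invented.

```python
def make_line(ip, device):
    """Build a made-up log line in the question's format (illustration only)."""
    return ("04/09/2010,13:11:52,Authen OK,user1,Default Group,"
            "00-24-2B-A1-08-88,29,{},(Default),,,,,,13,EAP-TLS,,{},\n".format(ip, device))

def first_ip_by_device(lines):
    """Map each device name to the first IP it appears with."""
    ips = {}
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) < 9:  # skip blank or truncated lines
            continue
        ips.setdefault(fields[-2], fields[7])  # keep only the first occurrence
    return ips

chst = first_ip_by_device([make_line("10.1.1.1", "device1"),
                           make_line("10.1.1.2", "device2")])
chbs = first_ip_by_device([make_line("10.1.1.9", "device2"),
                           make_line("10.1.1.7", "device7"),
                           make_line("10.1.1.7", "device7"),
                           make_line("10.1.1.7", "device7")])

# Key views behave like sets in Python 3, so the difference is one expression.
only_in_chst = sorted((dev, chst[dev]) for dev in chst.keys() - chbs.keys())
only_in_chbs = sorted((dev, chbs[dev]) for dev in chbs.keys() - chst.keys())

for dev, ip in only_in_chst + only_in_chbs:
    print(dev, ip)
```

Because the key is the device name alone, device2 counts as common even though its IP differs between the two files, and device7 is reported once despite appearing three times.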

Also, at the moment you're recording the line numbers but not really using them. If you do need to know how many times a device appears, why not report that instead of having to count them? In that case, when adding a device key to the dictionary, first check if it's already there and, if so, increment a counter (perhaps in another dictionary also keyed by the device name).
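If the occurrence counts are wanted, collections.Counter from the standard library does the "check and increment" bookkeeping for you. This is only a sketch with invented sample data; make_line and count_devices are hypothetical names.

```python
from collections import Counter

def make_line(ip, device):
    """Build a made-up log line in the question's format (illustration only)."""
    return ("04/09/2010,13:11:52,Authen OK,user1,Default Group,"
            "00-24-2B-A1-08-88,29,{},(Default),,,,,,13,EAP-TLS,,{},\n".format(ip, device))

def count_devices(lines):
    """Count how many lines mention each device name."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) >= 9:  # ignore blank or truncated lines
            counts[fields[-2]] += 1
    return counts

chst_counts = count_devices([make_line("10.1.1.2", "device2")])
chbs_counts = count_devices([make_line("10.1.1.2", "device2"),
                             make_line("10.1.1.7", "device7"),
                             make_line("10.1.1.7", "device7"),
                             make_line("10.1.1.7", "device7")])

# Devices that appear in chbs but never in chst, with their line counts.
missing_from_chst = {dev: n for dev, n in chbs_counts.items()
                     if dev not in chst_counts}

for dev, n in sorted(missing_from_chst.items()):
    print(dev, "appears", n, "time(s) in chbs only")
```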
