Analyzing 2 log files with Python

Posted 2024-09-13 16:05:25


I have 2 large logfiles. I want to see if a device is in a but not b, and vice versa (excluding lines where the device is common to both). The files look like this example:

04/09/2010,13:11:52,Authen OK,user1,Default Group,00-24-2B-A1-08-88,29,10.1.1.1,(Default),,,,,,13,EAP-TLS,,device1,
04/19/2010,15:35:24,Authen OK,user2,Default Group,00-24-2B-A1-05-EA,29,10.1.1.2,(Default),,,,,,13,EAP-TLS,,device2,
04/09/2010,13:11:52,Authen OK,user3,Default Group,00-24-2B-A1-08-88,29,10.1.1.3,(Default),,,,,,13,EAP-TLS,,device3,
04/19/2010,15:35:24,Authen OK,user4,Default Group,00-24-2B-A1-05-EA,29,10.1.1.4,(Default),,,,,,13,EAP-TLS,,device4,

To reiterate, I need the device (field [-2]) and IP (field [7]) for each device that is in logfile a but not b, or in b but not a.

Here's what I've done so far, but it seems a little clunky and is very slow (each file has about 400K lines). I'm cross-referencing twice. Can anyone suggest efficiencies, please? Perhaps I am using the wrong logic?

chst={}
chbs={}
for i,line in enumerate(open('chst.txt').readlines()):
    line=line.split(',')
    chst[line[-2]+','+str(i)]=','.join(line)

for i,line in enumerate(open('chbs.txt').readlines()):
    line=line.split(',')
    chbs[line[-2]+','+str(i)]=','.join(line)

print "these lines are in CHST but not in CHBS"
for a in chst:
    if a.split(',')[0] not in str(chbs.values()):
        line=chst[a].split(',')
        print line[-2], line[7]

print "\nthese lines are in CHBS but not in CHST"

for a in chbs:
    if a.split(',')[0] not in str(chst.values()):
        line=chbs[a].split(',')
        print line[-2], line[7]


两个我 2024-09-20 16:05:26

You are looking for a symmetric difference:

chst = { ( line.split( "," )[ -2 ], line.split( "," )[ 7 ] ) for line in open( ... ) }
chbs = { ( line.split( "," )[ -2 ], line.split( "," )[ 7 ] ) for line in open( ... ) }

diff = chst ^ chbs
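
As a runnable sketch of the set approach, the snippet below splits each line only once and skips blank or truncated lines. The helper name `device_ip_pairs` and the in-memory sample data are illustrative assumptions, not part of the answer above; with real files you would pass `open('chst.txt')` and `open('chbs.txt')` instead of the `io.StringIO` objects.

```python
import io


def device_ip_pairs(lines):
    """Return the set of (device, ip) tuples found in an iterable of log lines.

    Hypothetical helper: assumes the comma-separated layout from the question,
    with the device in field [-2] and the IP in field [7].
    """
    pairs = set()
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) < 9:  # skip blank or truncated lines
            continue
        pairs.add((fields[-2], fields[7]))
    return pairs


# Made-up sample data standing in for the two log files.
chst_file = io.StringIO(
    "04/09/2010,13:11:52,Authen OK,user1,Default Group,00-24-2B-A1-08-88,29,10.1.1.1,(Default),,,,,,13,EAP-TLS,,device1,\n"
    "04/19/2010,15:35:24,Authen OK,user2,Default Group,00-24-2B-A1-05-EA,29,10.1.1.2,(Default),,,,,,13,EAP-TLS,,device2,\n"
)
chbs_file = io.StringIO(
    "04/19/2010,15:35:24,Authen OK,user2,Default Group,00-24-2B-A1-05-EA,29,10.1.1.2,(Default),,,,,,13,EAP-TLS,,device2,\n"
    "04/09/2010,13:11:52,Authen OK,user3,Default Group,00-24-2B-A1-08-88,29,10.1.1.3,(Default),,,,,,13,EAP-TLS,,device3,\n"
)

chst = device_ip_pairs(chst_file)
chbs = device_ip_pairs(chbs_file)

# Tuples present in exactly one of the two files.
diff = chst ^ chbs
for device, ip in sorted(diff):
    print(device, ip)
```

Each file is read once and set membership tests are constant time on average, so the work grows roughly linearly with the number of lines, rather than searching a string of all values for every key as the code in the question does.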

If you need the asymmetric differences, use -:

chst - chbs # tuples in chst but not in chbs
chbs - chst # tuples in chbs but not in chst

If you need the actual line instead of a tuple ( device, IP ), you can use dictionaries instead of sets:

chst = { ( line.split( "," )[ -2 ], line.split( "," )[ 7 ] ): line for line in open( ... ) }
chbs = { ( line.split( "," )[ -2 ], line.split( "," )[ 7 ] ): line for line in open( ... ) }

diff = chst.items( ) ^ chbs.items( )

This works because dict.items( ) returns a view on the items, which has set-like properties. Note that this is called dict.viewitems( ) in Python 2.7.
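
One caveat worth illustrating: set operations on item views compare whole (key, line) pairs, so a device that appears in both files with different timestamps would still land in the result. A minimal sketch of an alternative follows, keyed on the device name alone and using key views. The function names `index_by_device` and `only_in` and the sample lines are assumptions made for illustration; on Python 2.7 the equivalent of `.keys()` here would be `.viewkeys()`.

```python
def index_by_device(lines):
    """Map device name (field [-2]) to the last full line seen for it."""
    index = {}
    for line in lines:
        stripped = line.rstrip("\n")
        fields = stripped.split(",")
        if len(fields) < 9:  # skip blank or truncated lines
            continue
        index[fields[-2]] = stripped
    return index


def only_in(first, second):
    """Return sorted (device, ip) pairs for devices in `first` but not in `second`."""
    missing = first.keys() - second.keys()  # key views support set operations
    return sorted((dev, first[dev].split(",")[7]) for dev in missing)


# Made-up sample lines; device2 is in both files but with different timestamps.
chst_lines = [
    "04/09/2010,13:11:52,Authen OK,user1,Default Group,00-24-2B-A1-08-88,29,10.1.1.1,(Default),,,,,,13,EAP-TLS,,device1,\n",
    "04/19/2010,15:35:24,Authen OK,user2,Default Group,00-24-2B-A1-05-EA,29,10.1.1.2,(Default),,,,,,13,EAP-TLS,,device2,\n",
]
chbs_lines = [
    "04/20/2010,09:01:10,Authen OK,user2,Default Group,00-24-2B-A1-05-EA,29,10.1.1.2,(Default),,,,,,13,EAP-TLS,,device2,\n",
    "04/09/2010,13:11:52,Authen OK,user3,Default Group,00-24-2B-A1-08-88,29,10.1.1.3,(Default),,,,,,13,EAP-TLS,,device3,\n",
]

chst = index_by_device(chst_lines)
chbs = index_by_device(chbs_lines)

print("in CHST but not in CHBS:", only_in(chst, chbs))
print("in CHBS but not in CHST:", only_in(chbs, chst))
```

Keying on the device alone matches the question's wording ("device is in a but not b"); if the same device with a different IP should count as a separate entry, the (device, IP) tuple key from the answer is the better fit.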
