Extracting a specific pattern from a log

Posted 2024-10-02 01:04:12

I need to extract requests that look like this from a log file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<vehicleRegistration>
.... XML in between ....
.... XML in between ....
.... XML in between ....
.... XML in between ....
... at nth line there is line like this <vehicle id="2312313"></vehicle>
.... XML in between ....
.... XML in between ....
</vehicleRegistration>

The important issue is that a vehicleRegistration block can be 5 lines long and sometimes 17; its length varies. This is where my current grep fails, because -A prints a fixed number of lines. I used:

grep -A 13 "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" vehicle.log

Another issue is that a request is sometimes sent two or more times because the service might be unavailable for some reason, so the file may contain the same request several times.

I also need to rule out duplicate requests. A request is a duplicate if its vehicle id is repeated; the id sits on the nth line (not the last line), <vehicle id="2312313"></vehicle>.

How would you solve this? Suggestions, code, pseudo-code, anything is welcome.

EDIT:

The log file is not an XML file. It is just a file in which a small percentage of the content is XML requests, so I can't parse the whole file as XML.

EDIT II:

I extracted only the vehicle registration parts using @eugene y's one-line command perl -nle 'm{<vehicleRegistration>} .. m{</vehicleRegistration>} and print' logfile. How can I get rid of the duplicates, that is, the nodes that have the same vehicle id? I want to keep only one copy of each.

浅浅 2024-10-09 01:04:13

Use XPath to recover XML element nodes. There are lots of frameworks for various modern scripting languages.

With Perl, you might do something like:

#!/usr/bin/perl

use strict;
use warnings;
use XML::XPath;

my $file = 'vehicleRegistration.xml';
my $xp = XML::XPath->new(filename => $file);

print "Vehicle id: ".$xp->find('//vehicle/@id')."\n";

If you need to, parse your log file to extract the XML document portion, and then run the XPath expression on it to recover the element and data you want.
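
As a rough illustration of that second step, assuming one `<vehicleRegistration>` fragment has already been cut out of the log as a string, Python's standard `xml.etree.ElementTree` can run a simple XPath-style query on it. The sample fragment (including the `owner` element) and the helper name `vehicle_ids` are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for one request extracted from the log.
FRAGMENT = """<vehicleRegistration>
  <owner>example</owner>
  <vehicle id="2312313"></vehicle>
</vehicleRegistration>"""


def vehicle_ids(xml_text):
    """Return the id attribute of every <vehicle> element in the fragment."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset; ".//vehicle" matches at any depth.
    return [node.get("id") for node in root.findall(".//vehicle")]


print("Vehicle id:", ", ".join(vehicle_ids(FRAGMENT)))
```

This only works on a well-formed fragment, so the surrounding non-XML log lines have to be stripped first.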

故事灯 2024-10-09 01:04:13

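Use XPath (and, depending on what you want to do with the results, possibly XSLT).

There are command-line utilities for this purpose (the original answer linked to one as an example).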

囚你心 2024-10-09 01:04:12

I'd use XML::Simple (or other XML parser) to extract the data. Data::Dumper can be used to inspect data structures.

Update: you can extract the vehicleRegistration elements like this:

open my $fh, '<', 'logfile' or die $!;     
my $xml = ""; 

while (<$fh>) {
    if ( m{<vehicleRegistration>} .. m{</vehicleRegistration>}) {
        $xml .= $_; 
    }   
}

Or with a perl one-liner:

perl -nle 'm{<vehicleRegistration>} .. m{</vehicleRegistration>} and print' logfile
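
For the follow-up in the question (keeping one copy per vehicle id), the same range idea can be extended: collect each block, pull the id out of its `<vehicle id="...">` line, and skip ids that have already been seen. The Python sketch below is one possible way to do it, not part of the original answer. It assumes the opening and closing tags each appear on their own log line and that keeping the first occurrence is acceptable. The function name `unique_registrations` and the sample log (including its INFO/WARN lines) are invented for illustration.

```python
import re

START = "<vehicleRegistration>"
END = "</vehicleRegistration>"
VEHICLE_ID = re.compile(r'<vehicle\s+id="([^"]+)"')


def unique_registrations(lines):
    """Yield each vehicleRegistration block once per vehicle id (first one wins)."""
    seen = set()
    block = None  # None means we are currently outside a block
    for line in lines:
        if block is None and START in line:
            block = []
        if block is not None:
            block.append(line)
            if END in line:
                text = "".join(block)
                block = None
                match = VEHICLE_ID.search(text)
                if match:
                    if match.group(1) in seen:
                        continue  # duplicate vehicle id: drop this block
                    seen.add(match.group(1))
                # Blocks without a vehicle id are passed through unchanged.
                yield text


# Invented sample log: the first request is retried once, then a new one follows.
SAMPLE_LOG = """\
INFO request received
<vehicleRegistration>
<vehicle id="2312313"></vehicle>
</vehicleRegistration>
WARN service unavailable, retrying
<vehicleRegistration>
<vehicle id="2312313"></vehicle>
</vehicleRegistration>
<vehicleRegistration>
<vehicle id="42"></vehicle>
</vehicleRegistration>
"""

blocks = list(unique_registrations(SAMPLE_LOG.splitlines(keepends=True)))
print("".join(blocks), end="")
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly, so a large log does not have to be loaded into memory at once.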

触ぅ动初心 2024-10-09 01:04:12

Use the awk or gawk command on Unix to extract the registration:

#!/usr/bin/awk -f

/^<vehicleRegistration>/ { printit="true" }     # set the print flag on
printit == "true" { print }                     # print while the flag is set
/^<\/vehicleRegistration>/ { printit="false" }  # turn the print flag off

Note that the ^ anchors assume each tag starts at the beginning of a line.
