Extracting a specific pattern from a log

Posted on 2024-10-02 01:04:12

I need to extract requests that look like this from a log file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<vehicleRegistration>
.... XML in between ....
.... XML in between ....
.... XML in between ....
.... XML in between ....
... at the nth line there is a line like this: <vehicle id="2312313"></vehicle>
.... XML in between ....
.... XML in between ....
</vehicleRegistration>

The main issue is that a vehicleRegistration block is sometimes 5 lines long and sometimes 17; its length varies. That is where my current grep fails. I used:

grep -A 13 "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" vehicle.log
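grep -A 13 always prints a fixed number of lines, which is why it breaks on blocks of other lengths. A minimal sketch of a length-independent alternative, assuming the opening and closing tags each sit on their own line as in the sample: a sed address range prints from each line matching <vehicleRegistration> through the next line matching </vehicleRegistration>, however long the block is. The function name extract_registrations is hypothetical, and the XML declaration line is not part of the output.

```shell
# extract_registrations FILE  (hypothetical helper name)
# Prints every <vehicleRegistration> ... </vehicleRegistration> block in FILE.
# The sed address range starts at a line containing the opening tag and ends
# at the next line containing the closing tag, so block length does not matter.
extract_registrations() {
    sed -n '/<vehicleRegistration>/,/<\/vehicleRegistration>/p' "$1"
}
```

For example: extract_registrations vehicle.log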

Another issue is that a request is sometimes sent two or more times, because the service may be unavailable for some reason, so the file can contain multiple copies of the same request.

I also need to filter out duplicate requests. A request is a duplicate if the vehicle id on its nth line (not the last line), <vehicle id="2312313"></vehicle>, repeats an id that has already appeared.
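To see which requests are duplicated before removing anything, a short pipeline can count the ids. This is a sketch assuming each id appears in the form <vehicle id="..."> exactly as in the sample line and that each request contains one such line; it relies on grep -o, which GNU and BSD grep support but POSIX does not require. The function name duplicate_vehicle_ids is hypothetical.

```shell
# duplicate_vehicle_ids FILE  (hypothetical helper name)
# Prints each vehicle id that occurs on more than one <vehicle id="..."> tag.
duplicate_vehicle_ids() {
    # Keep only the <vehicle id="..." fragment of each matching line,
    # strip it down to the bare id, then report ids that occur more than once.
    grep -o '<vehicle id="[^"]*"' "$1" |
        sed -e 's/^<vehicle id="//' -e 's/"$//' |
        sort |
        uniq -d
}
```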

How would you solve this? Suggestions, code, pseudo-code, anything is welcome.

EDIT:

The log file is not an XML file; it is just a file in which a small percentage of the content is XML requests, so I can't parse it as XML.

EDIT II:

I extracted only the vehicle registration part using @eugene y's one-liner, perl -nle 'm{<vehicleRegistration>} .. m{</vehicleRegistration>} and print' logfile. How can I get rid of the duplicates, i.e. the nodes that have the same vehicle id? I want to keep only one copy of each.
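One way to drop the duplicates is to buffer each block and print it only when its vehicle id has not been seen before. The awk sketch below can run either on the raw log or on the output of the one-liner above; it keeps the first copy of each id, assumes the opening and closing tags are on their own lines and that the id appears as <vehicle id="..."> inside the block, and prints blocks without a recognizable id unchanged. The function name dedupe_registrations is hypothetical.

```shell
# dedupe_registrations FILE  (hypothetical helper name)
# Prints each <vehicleRegistration> ... </vehicleRegistration> block in FILE,
# keeping only the first block seen for each vehicle id.
dedupe_registrations() {
    awk '
        # Opening tag: start buffering a new block.
        /<vehicleRegistration>/ { inblock = 1; buf = ""; id = "" }

        # Inside a block: buffer the line and remember the first vehicle id.
        inblock {
            buf = buf $0 "\n"
            # <vehicle id=" is 13 characters long; the id follows it.
            if (id == "" && match($0, /<vehicle id="[^"]*"/))
                id = substr($0, RSTART + 13, RLENGTH - 14)
        }

        # Closing tag: emit the block unless its id was already printed.
        inblock && /<\/vehicleRegistration>/ {
            inblock = 0
            if (id == "" || !(id in seen))
                printf "%s", buf
            if (id != "")
                seen[id] = 1
        }
    ' "$1"
}
```

For example: dedupe_registrations logfile > unique_requests.txt (the output file name is only illustrative).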

Comments (4)

浅浅 2024-10-09 01:04:13

Use XPath to retrieve XML element nodes. There are XPath frameworks for many modern scripting languages.

With Perl, you might do something like:

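#!/usr/bin/perl

use strict;
use warnings;
use XML::XPath;

# XML::XPath needs a well-formed document, e.g. one request extracted from the log
my $file = 'vehicleRegistration.xml';
my $xp = XML::XPath->new(filename => $file);

# findvalue() returns the value of the matching id attribute as a string
print "Vehicle id: " . $xp->findvalue('//vehicle/@id') . "\n";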

If you need to, first parse your log file to extract the XML document portion, and then run the XPath expression on it to retrieve the elements and data you want.
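Because XML::XPath->new(filename => ...) in the script above reads a whole file, one way to do the extraction step is to write each request to its own file first. This is a sketch assuming the tags sit on separate lines; the function name split_requests and the request_N.xml naming scheme are hypothetical.

```shell
# split_requests FILE OUTDIR  (hypothetical helper name)
# Writes each <vehicleRegistration> ... </vehicleRegistration> block in FILE to
# its own file, OUTDIR/request_1.xml, OUTDIR/request_2.xml, and so on, so that
# each file holds exactly one root element.
split_requests() {
    awk -v dir="$2" '
        # Opening tag: choose the output file for the next request.
        /<vehicleRegistration>/ { n++; out = dir "/request_" n ".xml" }
        # While a request is open, copy its lines into that file.
        out != "" { print > out }
        # Closing tag: close the file, since awk limits how many can be open.
        /<\/vehicleRegistration>/ && out != "" { close(out); out = "" }
    ' "$1"
}
```

Each resulting file can then be given to the Perl script by setting $file to its path.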

故事灯 2024-10-09 01:04:13

Use XPath (and, depending on what you want to do with the result, possibly XSLT).

There are command-line utilities for this; here is one, for example.

囚你心 2024-10-09 01:04:12

I'd use XML::Simple (or another XML parser) to extract the data. Data::Dumper can be used to inspect the resulting data structures.

Update: you can extract the vehicleRegistration elements like this:

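use strict;
use warnings;

open my $fh, '<', 'logfile' or die "Cannot open logfile: $!";
my $xml = "";

# The flip-flop operator (..) stays true from a line matching the opening tag
# through the next line matching the closing tag, inclusive.
while (<$fh>) {
    if ( m{<vehicleRegistration>} .. m{</vehicleRegistration>} ) {
        $xml .= $_;
    }
}
close $fh;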

Or with a Perl one-liner:

perl -nle 'm{<vehicleRegistration>} .. m{</vehicleRegistration>} and print' logfile
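One caveat with the $xml string built in the loop above: when the log holds more than one request, it contains several top-level vehicleRegistration elements, and a document with more than one root element is not well-formed XML, so an XML parser will reject it. A minimal shell sketch of a workaround wraps the extracted blocks in a single extra root element; the element name requests and the function name wrap_registrations are hypothetical.

```shell
# wrap_registrations FILE  (hypothetical helper name)
# Prints all <vehicleRegistration> blocks from FILE inside one <requests>
# root element, so the result is a single document with one root
# (assuming each extracted block is itself well-formed).
wrap_registrations() {
    echo '<requests>'
    sed -n '/<vehicleRegistration>/,/<\/vehicleRegistration>/p' "$1"
    echo '</requests>'
}
```

In the Perl loop, the equivalent is to put the same opening and closing root tags around $xml before handing it to the parser.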

触ぅ动初心 2024-10-09 01:04:12

Use the awk or gawk command on Unix to extract the registrations:

#!/usr/bin/awk -f

/^<vehicleRegistration>/   { printit = 1 }  # opening tag: turn the print flag on
printit                    { print }        # print every line while the flag is set
/^<\/vehicleRegistration>/ { printit = 0 }  # closing tag was printed above; turn the flag off