Unix grep query

Posted on 2024-12-08 02:54:49

[2011-09-23 18:46:51:697 GMT+00:00][17B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=1
[2011-09-24 19:46:53:697 GMT+00:00][47B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=12
[2011-09-25 20:46:51:697 GMT+00:00][57B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=23
[2011-09-25 20:46:51:697 GMT+00:00][57B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] DEBUG mouseclicked by userid=566
[2011-09-25 20:56:56:697 GMT+00:00][77B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=44
[2011-09-26 22:48:55:697 GMT+00:00][87B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=55

In the file above, I want to know how many times #mouseclicked# occurred in the date range 24-Sep-11 to 25-Sep-11 (both dates inclusive).

For the sample above, the command should return 3. (Note: the plain mouseclicked entry is not counted, because it does not match #mouseclicked#.)

How can I use the grep command for this?

Comments (3)

高速公鹿 2024-12-15 02:54:49

grep alone won't solve the general problem. It can't recognize lines that are within a certain range of dates. (Well, it probably can if you use a sufficiently complex regular expression, but the regexp will be quite different for each range of dates you're interested in.)

But for your specific question, this will work:

egrep -c '^\[2011-09-(24|25).*#mouseclicked#' filename

egrep (equivalent to grep -E) supports extended regular expressions, including the | alternation operator. The -c option tells it to print the number of matching lines rather than the lines themselves.
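To make the behaviour of that pattern concrete, here is a minimal Python sketch that applies the same regular expression to the six sample lines from the question and counts matching lines the way -c does. The names SAMPLE_LOG and count_matching_lines are made up for this illustration; they are not part of grep or of the original answer.

```python
import re

# The six log lines from the question, copied verbatim.
SAMPLE_LOG = (
    "[2011-09-23 18:46:51:697 GMT+00:00][17B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=1",
    "[2011-09-24 19:46:53:697 GMT+00:00][47B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=12",
    "[2011-09-25 20:46:51:697 GMT+00:00][57B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=23",
    "[2011-09-25 20:46:51:697 GMT+00:00][57B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] DEBUG mouseclicked by userid=566",
    "[2011-09-25 20:56:56:697 GMT+00:00][77B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=44",
    "[2011-09-26 22:48:55:697 GMT+00:00][87B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=55",
)

# Same pattern as the egrep command: the line starts with "[2011-09-24" or
# "[2011-09-25" and contains "#mouseclicked#" somewhere after that.
PATTERN = re.compile(r"^\[2011-09-(24|25).*#mouseclicked#")


def count_matching_lines(lines, pattern=PATTERN):
    """Count the lines that the pattern matches, like grep -c."""
    return sum(1 for line in lines if pattern.search(line))


if __name__ == "__main__":
    print(count_matching_lines(SAMPLE_LOG))  # prints 3
```

The DEBUG line is skipped because it contains mouseclicked without the surrounding # characters, which is what the question asks for.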

But as you can imagine, if you want lines from 1pm on September 30 to 11am on October 2, the regular expression is going to be a lot more complex, and it will take some significant effort to construct it.

If I were going to be doing this a lot, I'd write a separate tool that extracts lines from a specified range of dates (or dates and times), taking advantage of the particular date format used in this file (YYYY-MM-DD HH:MM:SS, ISO-8601, is an excellent choice). Personally, I'd write such a tool in Perl. I could then run the tool on the file and pipe the output through grep.

EDIT:

In response to the comment, grep doesn't understand date ranges, just character sequences. You can write a complex regular expression that would match everything in the range 1-oct-2010 to 1-dec-2011. Here's my attempt (not tested):

egrep -c '^\[(2010-1.*|2011-(0.|10|11)|2011-12-01).*#mouseclicked#' filename

This covers several separate subranges: October through December of 2010; January through September, then October, then November of 2011; and finally December 1, 2011.
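Since that expression is marked as untested, a quick way to gain some confidence is to try its date part against a few boundary dates. The sketch below does this with Python's re module, on the assumption that re treats this particular pattern the same way egrep does (it uses only alternation, grouping, `.` and `*`). The helper in_range and the chosen boundary dates are made up for the check.

```python
import re

# Date part of the egrep pattern above; the trailing ".*#mouseclicked#"
# is left out so that only the date logic is exercised.
RANGE = re.compile(r"^\[(2010-1.*|2011-(0.|10|11)|2011-12-01)")


def in_range(date):
    """Return True if a log line starting with "[<date> " would match."""
    line = "[" + date + " 00:00:00:000 GMT+00:00]"
    return RANGE.search(line) is not None


# Dates on and just inside the edges of 1 Oct 2010 .. 1 Dec 2011.
INSIDE = ["2010-10-01", "2010-12-31", "2011-01-01",
          "2011-09-30", "2011-11-30", "2011-12-01"]
# Dates just outside those edges.
OUTSIDE = ["2009-12-31", "2010-09-30", "2011-12-02", "2012-01-01"]

if __name__ == "__main__":
    for date in INSIDE + OUTSIDE:
        print(date, in_range(date))
```

Checks like this only cover the dates you think to try, which is one more reason to prefer a tool that compares timestamps rather than text patterns.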

And, as I said above, for any other range of dates (or, worse, dates and times), you'll need to construct an entirely new complicated regular expression that matches subranges of the desired time span, based on their textual representation, not on their meanings as dates.

That's why I wouldn't consider this kind of approach if I wanted to do this more than once or twice.

Do you know a scripting language like Perl or Python? If so, it wouldn't be too difficult to write a script that will actually parse the timestamps and select lines that are within the desired range.

In fact, I wouldn't be at all surprised if such a tool already exists (I just don't know where to find it).
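As a rough idea of what such a script could look like in Python, the sketch below parses the leading timestamp of each line with datetime.strptime and keeps the lines that fall between two datetime bounds. It assumes every line starts with "[YYYY-MM-DD HH:MM:SS" as in the sample, and it ignores the milliseconds and the GMT offset. The names parse_timestamp and lines_in_range are invented for this example.

```python
from datetime import datetime

# The six log lines from the question, copied verbatim.
SAMPLE_LOG = [
    "[2011-09-23 18:46:51:697 GMT+00:00][17B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=1",
    "[2011-09-24 19:46:53:697 GMT+00:00][47B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=12",
    "[2011-09-25 20:46:51:697 GMT+00:00][57B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=23",
    "[2011-09-25 20:46:51:697 GMT+00:00][57B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] DEBUG mouseclicked by userid=566",
    "[2011-09-25 20:56:56:697 GMT+00:00][77B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=44",
    "[2011-09-26 22:48:55:697 GMT+00:00][87B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedOut #mouseclicked# userid=55",
]


def parse_timestamp(line):
    """Return the datetime at the start of the line, or None if absent.

    Only "YYYY-MM-DD HH:MM:SS" (19 characters after the "[") is parsed;
    the milliseconds and the time zone suffix are ignored.
    """
    if not line.startswith("["):
        return None
    try:
        return datetime.strptime(line[1:20], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None


def lines_in_range(lines, start, end):
    """Yield the lines whose timestamp lies in [start, end], inclusive."""
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is not None and start <= stamp <= end:
            yield line


if __name__ == "__main__":
    start = datetime(2011, 9, 24, 0, 0, 0)
    end = datetime(2011, 9, 25, 23, 59, 59)
    selected = lines_in_range(SAMPLE_LOG, start, end)
    print(sum(1 for line in selected if "#mouseclicked#" in line))  # prints 3
```

Because the bounds are real datetime values, a range such as 1pm on September 30 to 11am on October 2 needs no new pattern, only different start and end arguments.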

EDIT 2:

Here's a Perl script I threw together:

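#!/usr/bin/perl
# Print the log lines whose leading [timestamp] falls between start and end.

use strict;
use warnings;

die "Usage: $0 start end [file...]\n" if @ARGV < 2;
my $start = shift;
my $end   = shift;

# Reduce both bounds to their digits, e.g. 2011-09-24 becomes 20110924.
$start =~ s/\D//g;
$end   =~ s/\D//g;
# Pad the end bound with 9s so every timestamp on the end date still sorts at or below it.
$end .= '99999999999999999999999999999';

# Report the bounds on STDERR so they never mix with the selected lines on STDOUT.
print STDERR "start=\"$start\", end=\"$end\"\n";

while (<>) {
    # Capture the first bracketed field, which holds the timestamp.
    if (/^\[([^]]+)\]/) {
        my $timestamp = $1;
        $timestamp =~ s/\D//g;
        # Compare the digit sequences as strings, not as numbers.
        if ($timestamp ge $start and $timestamp le $end) {
            print;
        }
    }
}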

It treats the specified start and end times, as well as the timestamps in the file, as digit sequences and does a stringwise (not numeric) comparison on them. It ignores the timezone information. It could be made a lot more sophisticated with one of the time and date modules from CPAN.

For your original question, you'd run:

this-perl-script 2011-09-24 2011-09-25 input-file | grep -c '#mouseclicked#'
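For readers more comfortable with Python, the digit-string comparison that the Perl script relies on can be sketched as follows. This is an illustrative port, not the author's code: the function names digits and in_range are invented here, and the padding of 29 nines simply mirrors the Perl script.

```python
import re


def digits(text):
    """Strip everything except digits, e.g. "2011-09-24" -> "20110924"."""
    return re.sub(r"\D", "", text)


def in_range(line, start, end):
    """Return True if the line's leading [timestamp] lies in [start, end].

    The comparison is lexicographic on digit strings, as in the Perl
    script, so it only works because the timestamp fields are zero-padded
    and ordered from most to least significant.
    """
    match = re.match(r"\[([^\]]+)\]", line)
    if match is None:
        return False
    stamp = digits(match.group(1))
    low = digits(start)
    # Without the padding, "20110925" would sort before any timestamp on
    # 2011-09-25 and the end date itself would be excluded.
    high = digits(end) + "9" * 29
    return low <= stamp <= high


if __name__ == "__main__":
    # One line from the question's sample log.
    line = "[2011-09-25 20:56:56:697 GMT+00:00][77B020C421B4BCC2CEBAD9C1B77CA413.http-8080-6][com.abc.actions.RegisterAction] INFO loggedin #mouseclicked# userid=44"
    print(in_range(line, "2011-09-24", "2011-09-25"))  # prints True
```
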
不乱于心 2024-12-15 02:54:49
cat filename | grep '^\[2011-09-2[45]' | grep '#mouseclicked#' | wc -l

Or, more simply:

grep '^\[2011-09-2[45]' filename | grep -c '#mouseclicked#'
握住我的手 2024-12-15 02:54:49

I would try something like
grep 'pattern' filename | wc -l

grep will filter the lines that contain your string, while wc -l will count the number of lines that grep outputs.
