Shell: script to group strings by a substring

Posted on 2024-11-09 19:25:50

I have a program (sorry, changing it is not an option) that outputs log files of upwards of 500k lines.

I am trying to group together lines in the log file based on a substring within the lines (and then sort these groups).

For example, I have lines similar to the one below:

SELECT something WHERE TIM BETWEEN '*' AND '*' AND something;

What I'm looking to group on is the TIM BETWEEN '*' AND '*' part, where the * values match between lines, for example:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;

should be grouped like this in the output:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;

Each group should also be sorted on the whole string, so that lines whose "somethings" are similar end up next to each other.

I have been trying to put together a shell script that reads the log file and outputs what I want, but haven't had any success!

Edit: I should also mention that 'something' can be multiple words, for example:

SELECT blah1, blah2 or SELECT blah1, blah2, blah3
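Because 'something' can be several words, any approach based on fixed field numbers is fragile. One way around it is decorate-sort-undecorate, sketched below under the assumption that every relevant line contains TIM BETWEEN '<from>' AND '<to>' with single-quoted values that contain no tabs. awk pulls the two quoted values out with match() and puts them in front of the line as a key, sort orders by that key and then by the whole original line, and cut strips the key again. The function name group_by_tim is hypothetical, and lines without the clause get an empty key, so they sort first.

```shell
#!/bin/sh
# group_by_tim: hypothetical helper name, for illustration only.
# Reads log lines on stdin and prints them grouped by the two quoted values in
# "TIM BETWEEN '<from>' AND '<to>'", each group sorted by the whole line.
# Assumes the quoted values contain no tab or single-quote characters.
group_by_tim() {
    tab=$(printf '\t')
    # Decorate: prefix each line with "<from> <to>" plus a tab.
    # The quote character is passed in as q so the awk program needs no
    # embedded single quotes. Lines without the clause get an empty key.
    # Then sort on the key (field 1), using the original line (field 2 to
    # end) as the tiebreaker, and finally cut the key column off again.
    awk -v q="'" '
    BEGIN {
        re = "TIM BETWEEN " q "[^" q "]*" q " AND " q "[^" q "]*" q
    }
    {
        key = ""
        if (match($0, re)) {
            # Splitting the matched clause on quotes leaves the dates in parts[2] and parts[4]
            split(substr($0, RSTART, RLENGTH), parts, q)
            key = parts[2] " " parts[4]
        }
        print key "\t" $0
    }' | LC_ALL=C sort -t "$tab" -k1,1 -k2 | cut -f2-
}

# Usage: group_by_tim < logfile > grouped.log
```

LC_ALL=C makes the order within each group plain byte order; drop it if you prefer your locale's collation. Since the key is extracted by pattern rather than by position, extra words in 'something' do not matter.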


Answers (2)

静若繁花 2024-11-16 19:25:50

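You should probably be able to use sort:

sort -o outputfile -k6,6 -k8,8 -k1 inputfile

Here -k6,6 gives the first date column, -k8,8 gives the last date column, and -k1 (the whole line) breaks ties so that each group is sorted on the full string. This is the modern form of the obsolete +POS -POS syntax (for example, +5 -6 is -k6,6), which current versions of sort may reject. It assumes "something" is a single word; if it can be several words, the date columns move and fixed field numbers no longer work.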

(PS! Not tested)
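Sort field numbers are easy to get off by one, and they shift whenever 'something' spans several words. A quick way to check them is to number each whitespace-separated field of a sample line with awk; show_fields below is a hypothetical helper name for that, not part of any tool.

```shell
#!/bin/sh
# show_fields: hypothetical helper. Prints every whitespace-separated field
# of each input line with its 1-based number, i.e. the number sort -k expects.
show_fields() {
    awk '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

# Example:
# echo "SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;" | show_fields
```

For the single-word sample line the dates land in fields 6 and 8; with SELECT blah1, blah2 they move to fields 7 and 9.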

风吹雪碎 2024-11-16 19:25:50

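You'll have to pre-filter your data and turn it into something sort can work with: replace BETWEEN and the first AND on each line with a | delimiter so the two dates become fields 2 and 3, sort on those fields, then put the words back with sed.

awk '{sub(/BETWEEN/, "|"); sub(/AND/, "|"); print}' logFile \
| sort -t"|" -k2,2 -k3,3 \
| sed 's/|/BETWEEN/;s/|/AND/'

(The print is required: an awk action without it outputs nothing. This assumes no BETWEEN or AND, even inside another word, appears earlier in the line than the TIM clause, and that the lines contain no | of their own.)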

output

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;

I hope this helps.
