Shell: script to group strings by a substring
I have a program (sorry, changing it is not an option) that outputs log files of more than 500k lines.
I am trying to group the lines in the log file by a substring within each line, and then sort those groups.
For example, I have lines similar to the one below:
SELECT something WHERE TIM BETWEEN '*' AND '*' AND something;
What I want to group on is the TIM BETWEEN '*' AND '*' part,
where the * values match between lines. For example, these lines:
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
would be grouped like this in the output:
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
Each group should also be sorted on the whole string, so that lines whose "something" parts are similar end up next to each other.
I have been trying to put together a shell script that reads the log file and produces this output, but haven't had any success.
Edit: I should also mention that "something" can be multiple words, for example:
SELECT blah1, blah2 or SELECT blah1, blah2, blah3
Comments (2)
You should probably be able to use sort with positional keys,
where +1 -2 gives the "something" column, +4 -5 gives the first date column and +6 -7 gives the last date column.
(PS: not tested.)
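The +POS -POS key syntax is obsolete; current sort implementations use -k with 1-based field numbers. The sketch below shows what a field-based sort could look like in that form. It is not the answerer's original command, which is not preserved here. Counting whitespace-separated fields in the sample line puts the two dates in fields 6 and 8, so the sketch uses those numbers rather than a literal translation of the positions quoted above. It also assumes the SELECT list is a single word; with a multi-word "something" (as in the question's edit) the field numbers shift and this approach no longer applies. The function name sort_by_fixed_fields is hypothetical.

```shell
#!/bin/sh
# Sketch: group by the two dates using fixed field positions, then order
# each group by the whole line.
# Assumption: every line has a single-word SELECT list, so the dates are
# always whitespace-separated fields 6 and 8.
sort_by_fixed_fields() {
    # -k6,6 : first date; -k8,8 : last date; -k1 : whole line as tie-breaker.
    # LC_ALL=C gives plain byte-wise ordering, independent of the locale.
    LC_ALL=C sort -k6,6 -k8,8 -k1 "$@"
}

# Usage (app.log is a placeholder file name):
#   sort_by_fixed_fields app.log > grouped.log
```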
You'll have to pre-filter your data and turn it into something you can use sort with.
I hope this helps.
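One way to read "pre-filter, then sort" is a decorate-sort-undecorate pipeline: copy the TIM BETWEEN '...' AND '...' substring to the front of each line, sort, then strip the copy off again. The sketch below is an illustration of that idea, not the answerer's original script, which is not preserved here. It assumes the log lines contain no tab characters, since a tab is used as the temporary separator. The function name group_by_tim is hypothetical.

```shell
#!/bin/sh
# Sketch: group lines by their "TIM BETWEEN '...' AND '...'" substring and
# sort each group by the whole line.
# Assumption: input lines contain no tab characters.
group_by_tim() {
    tab=$(printf '\t')
    # Decorate: emit "<key><TAB><original line>". In the replacement, \1 is
    # the captured key and & is the matched text, so the line is kept whole.
    # Lines without the pattern pass through unchanged.
    sed "s/.*\(TIM BETWEEN '[^']*' AND '[^']*'\)/\1${tab}&/" "$@" |
        # Sort: the key comes first, so equal keys are adjacent and ties are
        # ordered by the rest of the line, i.e. the original text.
        LC_ALL=C sort |
        # Undecorate: drop the key column, leaving the original line.
        cut -f2-
}

# Usage (app.log is a placeholder file name):
#   group_by_tim app.log > grouped.log
```

Because the key is located by pattern rather than by field number, this still works when the SELECT list contains several words.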