How to extract multiple parameters from a string using sed or awk

Posted on 2024-12-03 19:07:20

I have a log file which looks like this:

2010/01/12/ 12:00 some un related alapha 129495 and the interesting value 45pts
2010/01/12/ 15:00 some un related alapha 129495 and no interesting value
2010/01/13/ 09:00 some un related alapha 345678 and the interesting value 60pts

I'd like to plot the date-time string vs. the interesting value using gnuplot. To do that, I'm trying to parse the above log file into a CSV file that looks like this (not all lines in the log have a plottable value):

2010/01/12/ 12:00, 45
2010/01/13/ 09:00, 60

How can I do this with sed or awk?

I can extract the initial characters with something like:

cat partial.log | sed -e 's/^\(.\{17\}\).*/\1/'

but how can I extract the end values?

I've been trying to do this to no avail!

Thanks

謌踐踏愛綪 2024-12-10 19:07:21

Bash

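#!/bin/bash

# Split each line into the date ($a), the time ($b) and the rest ($line);
# print only lines whose rest ends in a number followed by "pts".
while read -r a b line
do
  [[ $line =~ ([0-9]+)pts$ ]] && echo "$a $b, ${BASH_REMATCH[1]}"
done < file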

给我一枪 2024-12-10 19:07:21

Try:

awk 'NF==12{sub(/pts/,"",$12); printf "%s %s, %s\n", $1, $2, $12}' file

Input:

2010/01/12/ 12:00 some un related alapha 129495 and the interesting value 45pts
2010/01/12/ 15:00 some un related alapha 129495 and no interesting value
2010/01/13/ 09:00 some un related alapha 345678 and the interesting value 60pts

Output:

2010/01/12/ 12:00, 45
2010/01/13/ 09:00, 60

Updated for your new requirements:

Command:

awk 'NF==12{gsub(/\//,"-",$1); sub(/pts/,"",$12); printf "%s%s %s\n", $1, $2, $12}' file

Output:

2010-01-12-12:00 45
2010-01-13-09:00 60

HTH Chris

一曲琵琶半遮面シ 2024-12-10 19:07:21

It is indeed possible. A regex such as this one, for instance:

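sed -En 's!^([0-9]{4}/[0-9]{2}/[0-9]{2}/ [0-9]{2}:[0-9]{2}).* ([0-9]+)pts$!\1, \2!p' file

(-E turns on extended regular expressions, without which ( ), {4} and + are matched literally; the space before ([0-9]+) stops the greedy .* from consuming all but the last digit of the value.)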
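For a sed that lacks an extended-regex option, the same substitution can be written in POSIX basic regular expressions, where groups are \( \), intervals are \{n\} and "one or more" is \{1,\}. A minimal sketch; the helper name extract_pts is hypothetical and the demo input is the three sample lines from the question:

```shell
#!/bin/sh
# extract_pts (hypothetical name): reads log lines from stdin or from the files given.
extract_pts() {
  # POSIX BRE version of the substitution; no -E/-r needed.
  # The space before the number keeps the greedy ".*" from swallowing its leading digits.
  sed -n 's!^\([0-9]\{4\}/[0-9]\{2\}/[0-9]\{2\}/ [0-9]\{2\}:[0-9]\{2\}\).* \([0-9]\{1,\}\)pts$!\1, \2!p' "$@"
}

# Demo on the sample lines from the question; only lines ending in <number>pts are printed.
printf '%s\n' \
  '2010/01/12/ 12:00 some un related alapha 129495 and the interesting value 45pts' \
  '2010/01/12/ 15:00 some un related alapha 129495 and no interesting value' \
  '2010/01/13/ 09:00 some un related alapha 345678 and the interesting value 60pts' |
  extract_pts
```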

念﹏祤嫣 2024-12-10 19:07:21

awk '/pts/{ gsub(/pts/,"",$12);print $1,$2", "$12}' yourFile

Output:

2010/01/12/ 12:00, 45
2010/01/13/ 09:00, 60

[Update: based on your new requirement]

How can I modify the above to look like:
2010-01-12-12:00 45
2010-01-13-09:00 60

awk '/pts/{ gsub(/pts/,"",$12); a=$1$2OFS$12; gsub(/\//,"-",a); print a}' yourFile

The command above will give you:

2010-01-12-12:00 45
2010-01-13-09:00 60

红衣飘飘貌似仙 2024-12-10 19:07:21

sed can be made more readable:

nn='[0-9]+'
n6='[0-9]{6}'
n4='[0-9]{4}'
n2='[0-9]{2}'
rx="^($n4/$n2/$n2/ $n2:$n2) .+ $n6 .+ ($nn)pts$"

sed -nre "s|$rx|\1 \2|p" file

Output:

2010/01/12/ 12:00 45
2010/01/13/ 09:00 60

我一向站在原地 2024-12-10 19:07:21

I'd do that in two pipeline stages, first awk then sed:

awk '$NF ~ /[[:digit:]]+pts/ { print $1, $2", "$NF }' file |
  sed 's/pts$//'

By using $NF instead of a fixed number, you work with the final field, regardless of what the unrelated text looks like and how many fields it occupies.
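The same $NF idea also fits in a single awk process, stripping the suffix with sub() instead of a separate sed stage. A minimal sketch, assuming the value is an unsigned integer such as 45pts; the helper name extract_last_pts is hypothetical:

```shell
#!/bin/sh
# extract_last_pts (hypothetical name): one awk pass that filters on the last
# field ($NF) and strips the "pts" suffix in place, so no sed stage is needed.
# Assumes the value is an unsigned integer directly followed by "pts".
extract_last_pts() {
  awk '$NF ~ /^[0-9]+pts$/ { sub(/pts$/, "", $NF); print $1, $2 ", " $NF }' "$@"
}

# Demo on the sample lines from the question; the number of unrelated fields does not matter.
printf '%s\n' \
  '2010/01/12/ 12:00 some un related alapha 129495 and the interesting value 45pts' \
  '2010/01/12/ 15:00 some un related alapha 129495 and no interesting value' \
  '2010/01/13/ 09:00 some un related alapha 345678 and the interesting value 60pts' |
  extract_last_pts
```

Anchoring the pattern with ^ and $ means a last field like x45pts is skipped; drop the anchors if such fields should count.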

紅太極 2024-12-10 19:07:20

Although this is a really old question with many answers, you can also do it without external tools like sed or awk (and hence platform-independently). You can "simply" do it with gnuplot (even with the version available at the time of the OP's question: gnuplot 4.4.0, March 2010).

However, from your example data and description it is not clear whether the value of interest

1. is strictly in the 12th column,
2. is always in the last column, or
3. could be in any column but is always trailed by pts.

For all 3 cases there are gnuplot-only (hence platform-independent) solutions.
The assumption is that the column separator is a space.

ad 1. The simplest solution: with u 1:12, gnuplot simply ignores non-numerical or missing column values, and a value like 45pts is interpreted as 45.

ad 2. and 3. If you extract the last column as a string and try to convert a non-numerical value into a floating-point number via real(), gnuplot will fail and stop. Hence, you have to test with your own function isNumber() whether the column value at least starts with a number and can therefore be converted by real(). If the string is not a number, you could set the value to 1/0 or NaN. However, in earlier gnuplot versions this interrupts the line of a lines(points) plot.
In newer gnuplot versions (>=4.6.0) you could set the value to NaN and avoid the interruptions via set datafile missing NaN, which, however, is not available in gnuplot 4.4.
Furthermore, in gnuplot 4.4 NaN is simply set to 0.0 (GPVAL_NAN = 0.0).
You can work around this with the "trick" that is also used below: for a line without a valid value, the previous valid point (t0, y0) is simply plotted again instead of an invalid one.

Data: SO7353702.dat

2010/01/12/ 12:00 some un related alapha 129495 and the interesting value 45pts
2010/01/12/ 15:00 some un related alapha 129495 and no interesting value
2010/01/13/ 09:00 some un related alapha 345678 and the interesting value 60pts
2010/01/15/ 09:00 some un related alapha 345678 62pts and nothing
2010/01/17/ 09:00 some un related alapha 345678 and nothing
2010/01/18/ 09:00 some un related alapha 345678 and the interesting value 70.5pts
2010/01/19/ 09:00 some un related alapha 345678 and the interesting value extra extra 64pts
2010/01/20/ 09:00 some un related alapha 345678 and the interesting value 0.66e2pts

Script: (works for gnuplot>=4.4.0, March 2010)

### extract numbers without external tools
reset
FILE = "SO7353702.dat"

set xdata time
set timefmt "%Y/%m/%d/ %H:%M"
set format x "%b %d"
isNumber(s) = strstrt('+-.',s[1:1])>0 && strstrt('0123456789',s[2:2])>0 \
              || strstrt('0123456789',s[1:1])>0

# Version 1:
plot FILE u 1:12 w lp pt 7 ti "value in the 12th column"
pause -1

# Version 2:
set datafile separator "\t"
getLastValue(col) = (s=word(strcol(col),words(strcol(col))), \
                     isNumber(s) ? (t0=t1, real(s)) :  (y0))
plot t0=NaN FILE u (t1=timecolumn(1), y0=getLastValue(1), t0) : (y0) w lp pt 7 \
        ti "value in the last column"
pause -1

# Version 3:
getPts(s) = (c=strstrt(s,"pts"), c>0 ? (r=s[1:c-1], p=word(r,words(r)), isNumber(p) ? \
            (t0=t1, real(p)) : y0) : y0)
plot t0=NaN FILE u (t1=timecolumn(1),y0=getPts(strcol(1)),t0):(y0) w lp pt 7 \
            ti "value anywhere with trailing 'pts'"
### end of script

Result: [plots for Version 1, Version 2 and Version 3; images not included]
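For comparison only, outside this answer's gnuplot-only approach: if an external tool is acceptable after all, case 3 (a number in any column, trailed by pts) could be sketched as a single awk pass over the SO7353702.dat sample. The helper name extract_any_pts and the number pattern (optional sign, digits and dots, optional exponent) are assumptions; values are printed as written, so 0.66e2 stays 0.66e2, which gnuplot still reads as 66.

```shell
#!/bin/sh
# extract_any_pts (hypothetical name): scans fields 3..NF (after the date and
# time) and prints the first field that is a number immediately followed by "pts".
# Assumed number pattern: optional sign, digits and dots, optional exponent.
extract_any_pts() {
  awk '{
    for (i = 3; i <= NF; i++) {
      if ($i ~ /^[-+]?[0-9.]+([eE][-+]?[0-9]+)?pts$/) {
        v = $i
        sub(/pts$/, "", v)    # keep the number as written, e.g. 0.66e2
        print $1, $2 ", " v
        next                  # report only the first match per line
      }
    }
  }' "$@"
}

# Demo on the eight sample lines of SO7353702.dat.
printf '%s\n' \
  '2010/01/12/ 12:00 some un related alapha 129495 and the interesting value 45pts' \
  '2010/01/12/ 15:00 some un related alapha 129495 and no interesting value' \
  '2010/01/13/ 09:00 some un related alapha 345678 and the interesting value 60pts' \
  '2010/01/15/ 09:00 some un related alapha 345678 62pts and nothing' \
  '2010/01/17/ 09:00 some un related alapha 345678 and nothing' \
  '2010/01/18/ 09:00 some un related alapha 345678 and the interesting value 70.5pts' \
  '2010/01/19/ 09:00 some un related alapha 345678 and the interesting value extra extra 64pts' \
  '2010/01/20/ 09:00 some un related alapha 345678 and the interesting value 0.66e2pts' |
  extract_any_pts
```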
