How to extract multiple parameters from a string using sed or awk
I have a log file which looks like this:
2010/01/12/ 12:00 some un related alapha 129495 and the interesting value 45pts
2010/01/12/ 15:00 some un related alapha 129495 and no interesting value
2010/01/13/ 09:00 some un related alapha 345678 and the interesting value 60pts
I'd like to plot the date-time string vs. the interesting value using gnuplot. In order to do that, I'm trying to parse the above log file into a CSV file which looks like this (not all lines in the log have a plottable value):
2010/01/12/ 12:00, 45
2010/01/13/ 14:00, 60
How can I do this with sed or awk?
I can extract the initial characters with something like:
cat partial.log | sed -e 's/^\(.\{17\}\).*/\1/'
But how can I extract the end values?
I've been trying to do this to no avail!
Thanks
Bash
try:
Input:
Output:
Updated for your new requirements:
Command:
Output:
HTH Chris
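A minimal awk sketch of this kind of extraction, not necessarily the command this answer used. It assumes the interesting value is always the last field and always ends in pts; the function name log_to_csv is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads log lines on stdin, writes "date time, value" lines.
# Assumption: a plottable value is the last field and looks like 45pts.
log_to_csv() {
  awk '$NF ~ /^[0-9]+pts$/ {        # keep only lines whose last field is digits plus pts
         sub(/pts$/, "", $NF)       # strip the unit, leaving the number
         print $1 " " $2 ", " $NF   # date, time, then the value
       }'
}

# Sample lines from the question
log_to_csv <<'EOF'
2010/01/12/ 12:00 some un related alapha 129495 and the interesting value 45pts
2010/01/12/ 15:00 some un related alapha 129495 and no interesting value
2010/01/13/ 09:00 some un related alapha 345678 and the interesting value 60pts
EOF
```

With the three sample lines this prints 2010/01/12/ 12:00, 45 and 2010/01/13/ 09:00, 60; the line without a value is skipped. For a real file you would run log_to_csv < partial.log > values.csv (values.csv is just an example name).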
It is indeed possible. A regex such as this one, for instance:
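As a sketch of what such a regex could look like (an assumption, not necessarily the one this answer had in mind): it extends the 17-character capture from the question with a second group for the digits in front of a trailing pts, and uses -n with the p flag so that lines without a value are dropped. The function name log_to_csv_sed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sed-only extraction, reading log lines on stdin.
# Assumptions: the timestamp is exactly the first 17 characters,
# and a plottable value is digits followed by pts at the end of the line.
log_to_csv_sed() {
  # \1 = first 17 characters, \2 = digits just before the trailing pts;
  # -n plus the p flag prints only the lines where the substitution matched
  sed -n 's/^\(.\{17\}\).* \([0-9][0-9]*\)pts$/\1, \2/p'
}

# Sample line from the question
printf '%s\n' '2010/01/12/ 12:00 some un related alapha 129495 and the interesting value 45pts' | log_to_csv_sed
```

This sticks to basic regular expressions, so it should behave the same in GNU and BSD sed.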
Output:
[Update: based on your new requirement]
The command above will give you:
sed can be made more readable:
Output:
I'd do that in two pipeline stages, first awk then sed:
By using $NF instead of a fixed number, you work with the final field, regardless of what the unrelated text looks like and how many fields it occupies.
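A minimal sketch of such a two-stage pipeline, assuming the interesting value always ends in pts (the function name log_to_csv_pipeline is made up, and this is not necessarily the exact command from this answer):

```shell
#!/bin/sh
# Hypothetical helper: stage 1 (awk) selects lines and fields, stage 2 (sed) drops the unit.
log_to_csv_pipeline() {
  # $NF is the last field, however many words precede it;
  # only lines whose last field ends in pts are passed on
  awk '$NF ~ /pts$/ { print $1, $2 ",", $NF }' |
    sed 's/pts$//'   # remove the trailing unit so only the number remains
}

# Sample line from the question
printf '%s\n' '2010/01/13/ 09:00 some un related alapha 345678 and the interesting value 60pts' | log_to_csv_pipeline
```

The awk filter matters here: on the line without a value, $NF is the word value, so without the /pts$/ test that line would end up in the CSV too.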
Although this is a really old question with many answers, you can do it without external tools like sed or awk (hence platform-independent). You can "simply" do it with gnuplot (even with the version available at the time of OP's question: gnuplot 4.4.0, March 2010).
However, from your example data and description it is not clear whether the value of interest
pts
For all 3 cases there are gnuplot-only (hence platform-independent) solutions.
The assumption is that the column separator is a space.
ad 1. The simplest solution: with u 1:12, gnuplot will simply ignore non-numerical column values, and a value like 45pts will be interpreted as 45.
ad 2. and 3. If you extract the last column as a string, gnuplot will fail and stop if you try to convert a non-numerical value into a floating point number via real(). Hence, you have to test via your own function isNumber() whether the column value at least starts with a number and hence can be converted by real(). If the string is not a number, you could set the value to 1/0 or NaN. However, in earlier gnuplot versions the line of a lines(points) plot will then be interrupted.
In newer gnuplot versions (>=4.6.0) you could set the value to NaN and avoid interruptions via set datafile missing NaN, which, however, is not available in gnuplot 4.4. Furthermore, in gnuplot 4.4 NaN is simply set to 0.0 (GPVAL_NAN = 0.0). You can work around this with a "trick", which is also used below.
Data:
SO7353702.dat
Script: (works for gnuplot>=4.4.0, March 2010)
Result:
Version 1:
Version 2:
Version 3: