awk and extracting specific fields multiple times

Posted on 2024-10-10 01:17:30

I have many files with variables in them, like

{$var1} some text {$var2} some other text

I'd like to give them to awk so that awk extracts them and gives a result like this:

file_name.htm - 8 : {$title}
file_name.htm - 10 : {$css_style}
file_name.htm - 33 : {$img_carte_image_02_over}

This is a piece of cake with this awk script:

#!/usr/bin/gawk -f
BEGIN { }
match($0, /({.*\$.+})/, tab) {
  for (x=1; tab[x]; x++) {
    print FILENAME" - "FNR" : "substr($0, tab[x, "start"], tab[x, "length"])
  }
}
END { }

I'm calling it like this:

find website/ | grep -E '(html|htm)$' | xargs ./myh.sh | more

Everything works fine except when multiple variables are on the same line. In that case I get:

file_name.htm - 59 : {$var1}<br/>{$var2}

whereas I want:

file_name.htm - 59 : {$var1}
file_name.htm - 59 : {$var2}

Any idea what I could or should do? Of course, if you have another solution (with sed or whatever), that's fine with me!

Thanks a lot!

Comments (2)

物价感观 2024-10-17 01:17:30

Try this one:

awk '{
    line = $0
    while (match(line, /({[^$]*\$[^}]+})/)) {
        print FILENAME, "-", FNR, ":", substr(line, RSTART, RLENGTH)
        # RSTART + RLENGTH is the first character after the match
        line = substr(line, RSTART + RLENGTH)
    }
}'

The loop ends when match() returns 0, that is, when line no longer contains any "{foo$bar}" string; I use substr() to drop the part of the line that has already been scanned for matches.
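To see the loop at work on a line holding two variables, here is a minimal sketch. The file `sample.htm`, its contents, and the function name `extract_vars` are made up for illustration. The sketch writes the braces and the dollar sign as bracket expressions (`[{]`, `[$]`, `[}]`) so the pattern does not depend on how a given awk treats a bare `{`, and it stops the prefix at `}` as well as `$`. After each match it continues at `RSTART + RLENGTH`, the first character after the match, so adjacent variables such as `{$a}{$b}` are not cut.

```shell
#!/bin/sh
# Hypothetical sample input: one line without variables, one line with two.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' 'plain text' '{$var1}<br/>{$var2}' > sample.htm

# extract_vars FILE...: print "file - line : {$name}" for every variable found.
extract_vars() {
    awk '{
        line = $0
        # Keep matching until the rest of the line holds no more variables.
        while (match(line, /[{][^$}]*[$][^}]+[}]/)) {
            print FILENAME, "-", FNR, ":", substr(line, RSTART, RLENGTH)
            # Drop everything up to and including the match just printed.
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}

extract_vars sample.htm
# Prints:
# sample.htm - 2 : {$var1}
# sample.htm - 2 : {$var2}
```

Because the function passes its arguments straight to awk, it should drop into the `find ... | xargs` pipeline from the question in place of the original script.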

给不了的爱 2024-10-17 01:17:30

Try using a non-greedy regex in the match (http://www.exampledepot.com/egs/java.util.regex/Greedy.html). Probably won't work, but just an idea.
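awk's regular expressions are POSIX extended regular expressions, which have no non-greedy quantifiers, so a `.*?` style pattern is not available in match(). A negated bracket expression such as `[^}]*` gives the same effect here, because it cannot run past the closing brace. As a different route from awk, the sketch below uses that pattern with `grep -o`, which prints each match on its own line. It assumes a grep that supports `-o` (GNU and BSD grep do; it is not required by POSIX), and the file `sample.htm` and the function name `list_vars` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input with one and then two variables per line.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' '<title>{$title}</title>' '{$var1}<br/>{$var2}' > sample.htm

# list_vars FILE...: print "file - line : {$name}" for every variable found.
list_vars() {
    # -H: always print the file name, -n: print the line number,
    # -o: print only the matched text, one match per output line.
    # [^}]* stops at the first closing brace, which stands in for a
    # non-greedy match. Note that it also accepts an empty name, "{$}".
    grep -Hno '{\$[^}]*}' "$@" |
        # Turn grep's "file:line:match" into the format from the question.
        # This breaks on file names that contain a colon.
        sed 's/^\([^:]*\):\([0-9]*\):/\1 - \2 : /'
}

list_vars sample.htm
# Prints:
# sample.htm - 1 : {$title}
# sample.htm - 2 : {$var1}
# sample.htm - 2 : {$var2}
```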
