
How to write a find-all function (using regex) in awk or sed

Posted on 2024-09-19 08:30:17


I have a bash function that runs Python and prints every match of a regex found in stdin:

function find-all() {
    python -c "import re
import sys
print '\n'.join(re.findall('$1', sys.stdin.read()))"
}

When I run find-all 'href="([^"]*)"' < index.html, it should print the first capture group of each match (the value of every href attribute in index.html).

How can I write this in sed or awk?


葬﹪忆之殇 2024-09-26 08:30:17


I suggest you use grep -o.

-o, --only-matching
       Show only the part of a matching line that matches PATTERN.

E.g.:

$ cat > foo
test test test
test
bar
baz test
$ grep -o test foo
test
test
test
test
test

Update

If you are extracting href attributes from HTML files, use a command like:

$ grep -o -E 'href="([^"]*)"' /usr/share/vlc/http/index.html
href="style.css"
href="iehacks.css"
href="old/"

You could extract the values by using cut and sed like this:

$ grep -o -E 'href="([^"]*)"' /usr/share/vlc/http/index.html| cut -f2 -d'=' | sed -e 's/"//g'
style.css
iehacks.css
old/

For reliability, though, you'd be better off using an HTML/XML parser.
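
One caveat with the cut step above: it splits on every `=`, so a value containing a query string such as `page.php?id=1` would be cut short. A minimal sketch that strips the surrounding `href="` and closing quote with sed instead; the function name `extract_hrefs` and the sample input are made up for illustration:

```shell
# Hypothetical helper: print the value of every double-quoted href attribute on stdin.
extract_hrefs() {
    # grep -o emits each match on its own line; sed then removes the
    # leading href=" and the trailing quote, leaving only the value.
    grep -o -E 'href="[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}

# Sample input invented for this sketch.
printf '%s\n' '<a href="style.css">a</a> <a href="page.php?id=1">b</a>' | extract_hrefs
```

This should print style.css and page.php?id=1 on separate lines. It still only handles double-quoted attributes, so the parser advice stands.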


热情消退 2024-09-26 08:30:17


Here's a gawk implementation (not tested with other awks), saved as find_all.sh:

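#!/bin/sh
# Requires gawk: the three-argument match() is a gawk extension.
gawk -v "patt=$1" '
    # a and i are declared as extra parameters so they stay local
    function find_all(str, patt,    a, i) {
        # a[0] holds the whole match; a[1], a[2], ... hold the capture groups
        while (match(str, patt, a) > 0) {
            for (i=0; i in a; i++) print a[i]
            # continue scanning after the current match
            str = substr(str, RSTART+RLENGTH)
        }
    }
    $0 ~ patt {find_all($0, patt)}
' -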

Then:

echo 'asdf href="href1" asdf asdf href="href2" asdfasdf
asdfasdfasdfasdf href="href3" asdfasdfasdf' | 
find_all.sh 'href="([^"]+)"'

outputs:

href="href1"
href1
href="href2"
href2
href="href3"
href3

Change i=0 to i=1 if you only want to print the captured groups. With i=0 you'll get output even if you have no parentheses in your pattern.
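
The three-argument match() used above is a gawk extension. For the specific href case, a portable sketch is possible with the two-argument match() that POSIX awk provides, trimming the fixed prefix and the closing quote with substr(). The function name `find_hrefs` is hypothetical, and the offsets 6 and 7 are tied to the literal text `href="` in this one pattern, so this is not a general replacement for capture groups:

```shell
# Hypothetical helper: print every double-quoted href value found on stdin.
find_hrefs() {
    awk '{
        str = $0
        # match() sets RSTART and RLENGTH for the leftmost match in str
        while (match(str, /href="[^"]*"/)) {
            # skip the 6 characters of href=" and drop the closing quote
            print substr(str, RSTART + 6, RLENGTH - 7)
            # continue scanning after this match
            str = substr(str, RSTART + RLENGTH)
        }
    }'
}

echo 'asdf href="href1" asdf asdf href="href2" asdfasdf
asdfasdfasdfasdf href="href3" asdfasdfasdf' | find_hrefs
```

With the same sample input as above, this should print href1, href2 and href3, one per line.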
