Extracting regex capture-group matches from a file

Posted on 2024-11-26 02:57:04

I want to do what the title says from the Linux command line (a few lines of bash script would also do). The command I tried is:

sed 's/href="([^"])"/$1/g' page.html > list.lst

But it obviously failed.

To be precise, here is my input:

<link rel="stylesheet" type="text/css" href="style/css/colors.css" />
<link rel="stylesheet" type="text/css" href="style/css/global.css" />
<link rel="stylesheet" type="text/css" href="style/css/icons.css" />

The output I want is a comma-separated or space-separated list of all matches in the input file:

style/css/colors.css,style/css/global.css,style/css/icons.css

I think I got the right expression: href="([^"]*)"

But I have no clue how to apply it. sed does a search/replace, which is not exactly what I want: I only need to keep the matches and throw the rest away, not replace them.

因为看清所以看轻 2024-12-03 02:57:04

grep href page.html | sed 's/^.*href="\([^"]*\)".*$/\1/' | xargs | sed 's/ /,/g'

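This keeps only the lines that contain href and extracts one href value per line. Because the leading .* is greedy, that is the last href on the line when a line holds several. The values are then joined with spaces by xargs and the spaces are turned into commas, so paths containing spaces would be split. Also, refer to this post about parsing HTML with regular expressions.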
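If a line can hold more than one href, a variant built on grep -o may be easier to reason about. The following is only a sketch: extract_hrefs is a hypothetical helper name, and it assumes a grep that supports -o (GNU and BSD grep do, although the option is not in POSIX) and attribute values written with double quotes.

```shell
# Hypothetical helper: print every href value in the file given as $1,
# joined by commas on a single line.
# Step 1: grep -o prints each match on its own line, so several hrefs
#         on the same input line are all kept.
# Step 2: sed strips the leading href=" and the trailing quote.
# Step 3: paste -s joins all lines into one, using a comma as delimiter.
extract_hrefs() {
    grep -o 'href="[^"]*"' "$1" |
        sed 's/^href="//; s/"$//' |
        paste -sd, -
}
```

Called as extract_hrefs page.html > list.lst, this should produce the comma-separated list asked for in the question. Since the joining is done by paste rather than by replacing spaces, paths that contain spaces stay intact.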
