Extracting regex capture-group matches from a file
I want to do what the title says on the Linux command line (a bash script will also do). The command I tried is:
sed 's/href="([^"])"/$1/g' page.html > list.lst
But obviously it failed.
To be precise, here is my input:
<link rel="stylesheet" type="text/css" href="style/css/colors.css" />
<link rel="stylesheet" type="text/css" href="style/css/global.css" />
<link rel="stylesheet" type="text/css" href="style/css/icons.css" />
The output I want is a comma-separated or space-separated list of all matches in the input file:
style/css/colors.css,style/css/global.css,style/css/icons.css
I think I have the right expression: href="([^"]*)"
But I have no clue how to do this. sed does a search/replace, which is not exactly what I want (on the contrary, I only need to keep the matches and throw the rest away, not replace them).
Comments (1)
This will extract all the lines that contain href and will get only the first href on each line. Also, refer to this post about parsing HTML with regular expressions.
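The command this comment describes is not preserved here, so the following is only a minimal sketch of one way to get the requested list, not necessarily what the commenter posted. The function names extract_hrefs and extract_all_hrefs are hypothetical. The first variant assumes at most one href per line, as in the sample input; the second assumes a grep that supports -o (GNU and BSD grep do, but it is not required by POSIX).

```shell
#!/bin/sh

# Hypothetical helper: print the href value found on each line of file $1,
# joined into a single comma-separated line.
extract_hrefs() {
    # -n turns off automatic printing; the trailing p prints only the lines
    # where the substitution succeeded, so non-matching lines are dropped.
    # In sed's basic regex syntax the capture group is \( \) and the
    # backreference is \1 (not $1). The leading .* is greedy, so if a line
    # held several hrefs this would keep the last one.
    sed -n 's/.*href="\([^"]*\)".*/\1/p' "$1" | paste -sd, -
}

# Hypothetical variant for lines that may hold several hrefs.
extract_all_hrefs() {
    # grep -o prints every match on its own line; sed then strips the
    # leading href=" and the trailing quote; paste joins the lines with commas.
    grep -o 'href="[^"]*"' "$1" | sed 's/^href="//; s/"$//' | paste -sd, -
}
```

For example, extract_hrefs page.html > list.lst would write the list to the file named in the question. Replacing the comma in paste -sd, - with a quoted space (paste -sd' ' -) should give the space-separated form instead.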