Search with a regex, but replace only part of the string
I'm trying to replace any occurrence of a cwe.mitre.org.*.html (regex) URL and remove the .html extension, without changing any other type of URL.
Example:
https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html
Expectation:
https://cwe.mitre.org/data/definitions/377
http://google.com/404.html
Is there a way to do this in sed or another tool?
I've tried sed -Ei 's/cwe.mitre.org.*.html/<REPLACEMENT>/g' file.txt, but that won't work. Is there a way for the <REPLACEMENT> to be a regular expression? The sed manual doesn't seem to suggest that.
EDIT: I was wrong about the sed manual. It does mention it, see "5.7 Back-references and Subexpressions" section of https://www.gnu.org/software/sed/manual/sed.html.
google sed capture groups.
Use a capture group with a backreference.
EXPLANATION
The \1 backreference stands for the part of the string captured by the parenthesized piece of the pattern. When you want a piece of a match to stay in the result, use a backreference.
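The command this answer refers to is not preserved above. As a minimal sketch of the capture-group idea it describes, assuming the two sample URLs from the question as input (the variable names input and result are illustrative only), something like the following should work:

```shell
# Hypothetical sample input, copied from the question's example.
input='https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html'

# -E enables extended regexps. The parentheses capture everything from
# cwe.mitre.org up to (but not including) the last .html on the line,
# and \1 writes that captured text back, so only the extension is dropped.
result=$(printf '%s\n' "$input" | sed -E 's/(cwe\.mitre\.org.*)\.html/\1/g')

printf '%s\n' "$result"
```

Because .* is greedy, this strips only the last .html after cwe.mitre.org on a line. To edit a file in place with GNU sed, the same expression can be used with the -Ei options from the question.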
GNU AWK solution: let file.txt content be
then
gives output
Explanation: if the provided regex is found in a line, replace .html followed by end of line ($) with an empty string. Every line, changed or not, is printed. (Tested in GNU Awk 5.0.1.)
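The awk command itself is missing from the text above. A sketch that follows the explanation step by step, assuming the question's sample URLs as input (the variable names are illustrative, not taken from the answer), might look like this:

```shell
# Hypothetical sample input, copied from the question's example.
input='https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html'

# First rule: on lines matching the regex, sub() replaces a trailing
# .html (anchored with $) by the empty string, modifying $0.
# Second rule: { print } has no pattern, so every line is printed,
# changed or not.
result=$(printf '%s\n' "$input" |
  awk '/cwe\.mitre\.org.*\.html/ { sub(/\.html$/, "") } { print }')

printf '%s\n' "$result"
```

Because the substitution is anchored at the end of the line, this variant assumes the URL is the last thing on its line.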
Another possibility is to prefix the s command with a regexp address.
This isn't unequivocally better than the accepted answer (it would get confused by foo.html text http://cwe.mitre.org/bar.html, for example, though the other answers may also be assuming there's only one relevant URL on a line). I mention it as a supplement to that one, however, since it usefully illustrates that sed commands can be prefixed by 'addresses', which can include regexps. This script deletes .html on any line which includes cwe.mitre.org. This feature is often forgotten, and is only occasionally useful, but when it's appropriate, it can avoid an otherwise complicated regexp in the 'pattern' slot of s, and back-references.
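The script for this answer is also missing from the text. A minimal sketch of an address-prefixed substitution, assuming the question's sample input plus the problem line mentioned above (variable names are illustrative), could be:

```shell
# Hypothetical sample input, copied from the question's example.
input='https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html'

# The /cwe\.mitre\.org/ address selects the lines the s command runs on;
# the s command then only has to delete the first .html it finds there.
result=$(printf '%s\n' "$input" | sed '/cwe\.mitre\.org/s/\.html//')
printf '%s\n' "$result"

# The caveat from the answer: the first .html on a matching line is
# removed, even when it is not part of the cwe.mitre.org URL.
confused=$(printf '%s\n' 'foo.html text http://cwe.mitre.org/bar.html' |
  sed '/cwe\.mitre\.org/s/\.html//')
printf '%s\n' "$confused"
```

The second call shows why the address form is only a supplement: the address decides which lines are edited, not which part of the line.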