Search with a regex, but replace only part of the string

Posted 2025-01-18 16:46:33

I'm trying to find every URL that matches the regex cwe.mitre.org.*.html and remove its .html extension, without changing any other kind of URL.

Example:

https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html

Expectation:

https://cwe.mitre.org/data/definitions/377
http://google.com/404.html

Is there a way to do this in sed or another tool?

I've tried sed -Ei 's/cwe.mitre.org.*.html/<REPLACEMENT>/g' file.txt, but that won't work. Is there a way for the <REPLACEMENT> to be a regular expression? The sed manual doesn't seem to suggest that?

EDIT: I was wrong about the sed manual. It does mention it, see "5.7 Back-references and Subexpressions" section of https://www.gnu.org/software/sed/manual/sed.html.

叹梦 2025-01-25 16:46:33

$ sed 's/\(cwe\.mitre\.org.*\)\.html/\1/' file
https://cwe.mitre.org/data/definitions/377
http://google.com/404.html

Google "sed capture groups" for background.
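
One caveat with the command above: `.*` is greedy, so it runs to the last `.html` on the line. The following is a minimal sketch, assuming URLs never contain whitespace, that keeps each match inside a single URL and adds the `g` flag so several URLs on one line are handled. The sample input line is made up for illustration.

```shell
# Sample input (hypothetical): two CWE URLs and one unrelated URL on one line.
input='see https://cwe.mitre.org/data/definitions/377.html and http://google.com/404.html or https://cwe.mitre.org/data/definitions/79.html'

# [^[:space:]]* cannot cross whitespace, so each match stays inside one URL.
# \(...\) captures everything before .html; \1 puts it back without the suffix.
result=$(printf '%s\n' "$input" | sed 's/\(cwe\.mitre\.org[^[:space:]]*\)\.html/\1/g')

printf '%s\n' "$result"
```

Only the two cwe.mitre.org URLs lose their extension; the google.com URL between them is left alone.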

猫烠⑼条掵仅有一顆心 2025-01-25 16:46:33

Use

sed -Ei 's/(cwe\.mitre\.org.*)\.html/\1/' file

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    cwe                      'cwe'
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    mitre                    'mitre'
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    org                      'org'
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \.                       '.'
--------------------------------------------------------------------------------
  html                     'html'

\1 is a backreference to the part of the string captured by the parenthesized group in the pattern. Use a backreference when you want part of the match to remain in the result.
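
Because `-i` rewrites the file in place, it can be worth keeping a backup while testing. The following is a minimal sketch, assuming a sed that accepts `-i` with an attached suffix (GNU and BSD sed both do); the temporary directory and the `.bak` suffix are choices made for this illustration, not part of the answer above.

```shell
# Work in a throwaway directory so no real file is touched.
tmpdir=$(mktemp -d)
printf '%s\n' 'https://cwe.mitre.org/data/definitions/377.html' \
              'http://google.com/404.html' > "$tmpdir/file.txt"

# -E enables extended regex, so ( ) group without backslashes.
# -i.bak rewrites file.txt in place and keeps the original as file.txt.bak.
sed -E -i.bak 's/(cwe\.mitre\.org.*)\.html/\1/' "$tmpdir/file.txt"

edited=$(cat "$tmpdir/file.txt")
backup=$(cat "$tmpdir/file.txt.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

# Clean up the throwaway directory.
rm -rf "$tmpdir"
```

The edited file has the extension removed from the cwe.mitre.org line only, and the backup still holds the original two lines.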

冷…雨湿花 2025-01-25 16:46:33

A GNU AWK solution. Let the content of file.txt be

https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html

then

awk '/cwe\.mitre\.org.*\.html/{sub(/\.html$/,"")}{print}' file.txt

gives output

https://cwe.mitre.org/data/definitions/377
http://google.com/404.html

Explanation: if a line matches the given regex, replace the .html at the end of the line ($) with an empty string. Every line, changed or not, is printed.

(tested in GNU Awk 5.0.1)
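
The command above assumes the URL ends the line. If URLs can appear mid-line, one possible variation is to test each whitespace-separated field instead of the whole line. This is a sketch under that assumption, with a made-up input line; note that when a field changes, awk rebuilds the line with fields joined by single spaces.

```shell
# Hypothetical input: the first line mixes an unrelated .html token with a CWE URL.
input='foo.html text https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html'

# Loop over the fields and strip a trailing .html only from fields
# that contain cwe.mitre.org; every line is printed, changed or not.
result=$(printf '%s\n' "$input" | awk '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /cwe\.mitre\.org/) sub(/\.html$/, "", $i)
  print
}')

printf '%s\n' "$result"
```

Here foo.html and the google.com URL keep their extension, and only the cwe.mitre.org field is shortened.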

就像说晚安 2025-01-25 16:46:33

Another possibility is

% sed '/cwe\.mitre\.org/s/\.html//' try.txt 
https://cwe.mitre.org/data/definitions/377
Nothing
hello.html
http://google.com/404.html

This isn't unequivocally better than the accepted answer: it would get confused by foo.html text http://cwe.mitre.org/bar.html, for example (though the other answers may also be assuming there's only one relevant URL on a line). I mention it as a supplement to that one, however, since it usefully illustrates that sed commands can be prefixed by 'addresses', which can include regexps. This script deletes the first .html on any line that includes cwe.mitre.org.

This feature is often forgotten, and is only occasionally useful, but when it's appropriate, it can avoid an otherwise complicated regexp in the s ‘pattern’ slot, and back-references.
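
To make both the address behaviour and the caveat concrete, here is a small sketch with made-up input that includes the problematic line mentioned above.

```shell
# Hypothetical input: a CWE URL, an unrelated URL, and the mixed line from the caveat.
input='https://cwe.mitre.org/data/definitions/377.html
http://google.com/404.html
foo.html text http://cwe.mitre.org/bar.html'

# /cwe\.mitre\.org/ is the address: the s command runs only on matching lines.
# Without the g flag, s removes just the first .html on each selected line.
result=$(printf '%s\n' "$input" | sed '/cwe\.mitre\.org/s/\.html//')

printf '%s\n' "$result"
```

The first line is fixed and the second is skipped by the address, but on the third line the substitution hits foo.html rather than the CWE URL, which is the confusion described above.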
