Remove strings containing a pattern from an HTML file using Unix commands

Posted on 2024-09-11 09:33:19

I have a messy html that looks like this:

<div id=":0.page.0" class="page-element" style="width: 1620px;">
 <div>
  <img src="viewer_files/viewer_004.png" class="page-image" style="width: 800px; height: 1131px; display: none;">
  <img src="viewer_files/viewer_005.png" class="page-image" style="width: 1600px;">
 </div>
</div>// this repeats 100+ times with different 'src' attributes

This is actually all on one line (I have split it across several lines for readability). I am trying to remove all <img> tags that have display:none; set in their inline CSS. Is it possible to do this with sed/awk or some other Unix command? I think it would have been easy if this were a well-indented HTML document.

优雅的叶子 2024-09-18 09:33:55

sed has several commands, but most people only learn the substitute command: "s".
Another useful command, "d", deletes every line that matches the given address pattern.

sed -e "/<img[^>]*display: none;[^>]*>/d" File

Be careful: it deletes the entire line, so on the single-line input described in the question it would remove everything.
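To make the whole-line behaviour concrete, here is a minimal sketch that runs the d command on a one-line input. The sample string and the variable names are hypothetical stand-ins modeled on the markup in the question, not the asker's real file.

```shell
# Hypothetical one-line sample modeled on the question's markup.
sample='<div><img src="a.png" style="display: none;"><img src="b.png" style="width: 1600px;"></div>'

# d drops every line that matches the address, not just the matched text,
# so the visible <img> and the surrounding <div> disappear as well.
deleted=$(printf '%s\n' "$sample" | sed -e '/<img[^>]*display: none;[^>]*>/d')

printf 'after d: [%s]\n' "$deleted"
# prints: after d: []
```

If the file really is one long line, d is therefore only useful after the tags have been split onto separate lines.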

简单爱 2024-09-18 09:33:53

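That would do it, as long as each <img> tag is on its own line. Because .* is greedy, on a single-line file the match runs from the first <img to the last > on the line.

sed -e "s@<img.*display: none;.*>@@g" FILENAME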
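The following sketch compares that .* pattern with a variant that uses [^>]* so a match cannot cross a tag boundary. The one-line sample is a hypothetical stand-in for the file in the question, and the variable names are only for illustration.

```shell
# Hypothetical one-line sample modeled on the question's markup.
sample='<div><img src="a.png" style="display: none;"><img src="b.png" style="width: 1600px;"></div>'

# Greedy: .* runs from the first <img to the last > on the line.
greedy=$(printf '%s\n' "$sample" | sed -e 's@<img.*display: none;.*>@@g')

# Bounded: [^>]* stops at the end of the tag that contains display: none;
bounded=$(printf '%s\n' "$sample" | sed -e 's@<img[^>]*display: none;[^>]*>@@g')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
# prints:
# greedy:  <div>
# bounded: <div><img src="b.png" style="width: 1600px;"></div>
```

On this input the greedy form also swallows the visible image and the closing </div>, while the bounded form removes only the hidden tag.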

忆沫 2024-09-18 09:33:49

sed -e "s/<img[^>]*display: none;[^>]*>//g" filein

A quick explanation of the sed expression:

s stands for substitute
/ is the delimiter

With s, the first field is the pattern to search for and the second field is its replacement (empty here, so each match is deleted). The last field holds options.
g means global: replace every match on the line, not just the first.
[^>]* matches any run of characters other than >, so a match cannot extend past the end of a single tag.

To edit the file in place (GNU sed): sed -i -e "..."
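As a small sketch of the in-place form, the snippet below writes a hypothetical sample to a temporary file and edits it with a backup suffix. The file name page.html and the .bak suffix are assumptions for illustration; attaching the suffix directly to -i is accepted by both GNU and BSD sed, whereas a bare -i is GNU-specific.

```shell
# Work in a throwaway directory so no real file is touched.
tmpdir=$(mktemp -d)
page="$tmpdir/page.html"   # hypothetical file name

# Hypothetical one-line sample modeled on the question's markup.
printf '%s\n' '<div><img src="a.png" style="display: none;"><img src="b.png" style="width: 1600px;"></div>' > "$page"

# Edit in place, keeping the original as page.html.bak.
sed -i.bak -e 's/<img[^>]*display: none;[^>]*>//g' "$page"

result=$(cat "$page")
if [ -f "$page.bak" ]; then backup=kept; else backup=missing; fi
printf '%s\n' "$result"
# prints: <div><img src="b.png" style="width: 1600px;"></div>

rm -rf "$tmpdir"
```

Keeping a backup is a cheap safeguard when running a regex over markup that has not been inspected tag by tag.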

爱你是孤单的心事 2024-09-18 09:33:46

sed 's/<img[^>]*display: none;[^>]*>//g' file
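The question writes the property as display:none; while its sample markup has display: none; with a space. A hedged variation that tolerates both spellings is sketched below; the sample string is hypothetical, and the pattern is an assumption about what the real file may contain.

```shell
# Hypothetical sample with both spellings of the property plus one visible image.
sample='<img src="a.png" style="display:none;"><img src="b.png" style="display: none"><img src="c.png" style="width: 1600px;">'

# [[:space:]]* allows zero or more whitespace characters after the colon;
# [^>]* on both sides keeps each match inside a single tag.
kept=$(printf '%s\n' "$sample" | sed -e 's/<img[^>]*display:[[:space:]]*none[^>]*>//g')

printf '%s\n' "$kept"
# prints: <img src="c.png" style="width: 1600px;">
```

This still assumes that no attribute value contains a literal > character, which a regex-based approach cannot reliably detect.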

你的笑 2024-09-18 09:33:41

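I would use Twig or XMLStarlet for this kind of processing. They are much more reliable than sed/awk/grep. That said, since your pattern is regular and repetitive, sed/awk/grep will work too.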

安静被遗忘 2024-09-18 09:33:36

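HTML and regular expressions are a notoriously poor match, so you probably want something HTML-aware. I would probably go for something like TagSoup, but there are undoubtedly other options that are more shell-friendly, or that suit whatever scripting language you prefer.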

About the author: 且行且努力
