How do I remove all attributes of p elements in an HTML file using Perl?
I'd like to remove all attributes of <p>
in an HTML file by using this simple Perl command line:
$ perl -pe 's/<p[^>]*>/<p>/' input.html
However, it won't substitute e.g. <p class="hello">
that spans multiple lines such as
<p
class="hello">
Thus, I attempted to first remove the end of line by doing
# command-1
$ perl -pe 's/\n/ /' input.html > input-tmp.html
# command-2
$ perl -pe 's/<p[^>]*>/<p>/g' input-tmp.html > input-final.html
Questions:
- Is there an option in (Perl) regex to try the match across multiple lines?
- Can I combine the two commands above (command-1 and command-2) into one? Basically, the first command needs to complete execution before the second one starts.
Comments (4)
-p is short for a loop that reads one line of input into $_, runs your code, and then prints $_. Because $_ only contains one line at a time, the pattern can't possibly match something that spans more than one line. You can fool Perl into thinking the whole file is one line using -0777. Command line options are documented in perlrun.
If you write a short script, and put it in its own file, you can easily invoke it using a simple command line.
Improving the following script is left as an exercise:
perl -pe 'undef $/; s/<p[^>]*>/<p>/g'