How can I use a regular expression to extract script tags from some text?
I don't know Regex very well, and I'm trying to get all of the script tags from some extracted page text. I've tried the following pattern:
<script.*?>.*?</script>
But this doesn't seem to return any script tag that has code within it. I.e., from the following:
<script type="text/javascript" src="Scripts/Scipt1.js"></script>
<script type="text/javascript" src="Scripts/Scipt2.js"></script>
<script type="text/javascript">
function SomeMethod()
{
}
</script>
I'll only get the following results:
<script type="text/javascript" src="Scripts/Scipt1.js"></script>
<script type="text/javascript" src="Scripts/Scipt2.js"></script>
How can I return all 3? (NB. I do want to maintain the outer script tags in the results).
Comments (3)
The dot (.) does not, by default, match newlines, so you will only get single-line results. Use RegexOptions.Singleline to fix this. It changes the meaning of the dot to match any character, including the newline, so you get multi-line matches too. Don't get confused by the name. Also don't confuse it with RegexOptions.Multiline, which is completely different: it only changes ^ and $ to match at the start and end of each line (read the IntelliSense tooltips to find out).
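The effect of that option can be checked on the sample from the question. The following is only a minimal sketch, and it uses Python rather than .NET: Python's re.DOTALL flag is assumed here as the counterpart of RegexOptions.Singleline, and the names PAGE_TEXT, PATTERN and find_script_tags are made up for illustration.

```python
import re

# Sample page text from the question (file names kept exactly as written there).
PAGE_TEXT = """<script type="text/javascript" src="Scripts/Scipt1.js"></script>
<script type="text/javascript" src="Scripts/Scipt2.js"></script>
<script type="text/javascript">
function SomeMethod()
{
}
</script>"""

# The pattern from the question, unchanged.
PATTERN = r"<script.*?>.*?</script>"


def find_script_tags(text, dot_matches_newline):
    """Return every full match of PATTERN in text (hypothetical helper).

    When dot_matches_newline is True, re.DOTALL is applied so that "."
    also matches line breaks, which is what Singleline does in .NET.
    """
    flags = re.DOTALL if dot_matches_newline else 0
    return re.findall(PATTERN, text, flags)


single_line_only = find_script_tags(PAGE_TEXT, dot_matches_newline=False)
all_tags = find_script_tags(PAGE_TEXT, dot_matches_newline=True)

print(len(single_line_only))  # 2: the inline script spans several lines and is missed
print(len(all_tags))          # 3: the multi-line script tag is now matched too
```

Because the pattern has no capture groups, each result is the whole matched text, so the outer script tags are kept in the results.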
You should use the HTML Agility Pack.
For example:
Depending on the quality of your HTML.
Edit: Pre Xml.Linq version:
Note: both of those are untested.