HtmlAgilityPack: cleaning inner text from HTML
I have this HTML, and I'm trying to get its InnerText without any tags in it:
<h1>my h1 content</h1>
<div class="thisclass">
<p> some text</p>
<p> some text</p>
<div style="some_style">
some text
<script type="text/javascript">
<!-- some script -->
</script>
<script type='text/javascript' src='some_script.js'></script>
</div>
<p> some text<em>some text</em>some text.<em> <br /><br /></em><strong><em>some text</em></strong></p>
<p> </p>
</div>
What I'm trying to do is get the text as the user would see it inside the element with class thisclass.
I want to strip any script tags, along with all other tags, and just get plain text.
This is what I am using:
Dim Tags As HtmlNodeCollection = root.SelectNodes("//div[@class='thisclass'] | //h1")
Does anyone have any ideas?
Thanks.
Comments (1)
Try this (warning: C# code ahead):
This gave me the following output:
Hope this helps.
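The code and output referred to in this answer are not reproduced here. What follows is a minimal sketch of one way to do this with HtmlAgilityPack, and not necessarily the answerer's approach. The class name `VisibleText` and the method `Extract` are hypothetical names chosen for illustration. The sketch assumes that "text as the user would see it" means dropping `script` and `style` elements and HTML comments, decoding entities, and collapsing runs of whitespace.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class VisibleText
{
    // Hypothetical helper (not from the original answer): returns the text of
    // the nodes matched by the XPath expression, one line per matched node.
    public static string Extract(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect <script>/<style> elements and comment nodes. ToList()
        // snapshots the sequence so nodes can be removed while iterating.
        var unwanted = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script"
                     || n.Name == "style"
                     || n.NodeType == HtmlNodeType.Comment)
            .ToList();
        foreach (var node in unwanted)
            node.Remove();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var matches = doc.DocumentNode.SelectNodes(xpath);
        if (matches == null)
            return string.Empty;

        // InnerText concatenates the text of all descendants; DeEntitize
        // turns entities such as &amp; back into characters.
        var parts = matches
            .Select(n => HtmlEntity.DeEntitize(n.InnerText))
            .Select(t => Regex.Replace(t, @"\s+", " ").Trim())
            .Where(t => t.Length > 0);

        return string.Join(Environment.NewLine, parts);
    }
}
```

With the HTML from the question held in a string `html`, this could be called as `VisibleText.Extract(html, "//div[@class='thisclass'] | //h1")`. Collapsing whitespace is a simplification: tags such as `<br />` do not produce line breaks in the result, so a different rule would be needed if the layout matters.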