Strip MS Word tags using Html Agility Pack
I have a DB with some text fields pasted from MS Word, and I'm having trouble stripping just the <div>, <font>, and <span> tags while keeping their inner text.
I've tried using HAP (Html Agility Pack), but I'm not heading in the right direction.
    Public Function StripHtml(ByVal html As String, ByVal allowHarmlessTags As Boolean) As String
        Dim htmlDoc As New HtmlDocument()
        htmlDoc.LoadHtml(html)
        Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div|//font|//span")
        For Each node In invalidNodes
            node.ParentNode.RemoveChild(node, False)
        Next
        Return htmlDoc.DocumentNode.WriteTo()
    End Function
This code simply selects the desired elements and removes them, but it does not keep their inner text.
Thanks in advance
Comments (1)
Well... I think I found a solution:
I was almost there... :P
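For anyone landing here, a minimal sketch of what such a fix could look like. It assumes the `RemoveChild(oldChild, keepGrandChildren)` overload of HAP's `HtmlNode`: passing `True` instead of `False` reattaches the removed node's children to its parent, so the inner text survives. The module name `WordMarkupCleaner` is hypothetical, and the `Nothing` check is an addition, since `SelectNodes` returns `Nothing` when no node matches. How nested matches (for example a <span> inside a <div>) are handled may depend on the HAP version, so treat this as a starting point rather than a definitive implementation.

```vbnet
Imports HtmlAgilityPack

' Hypothetical module name, used only for this illustration.
Public Module WordMarkupCleaner

    ' Removes <div>, <font> and <span> elements but keeps their child nodes.
    ' allowHarmlessTags is kept from the original signature; this sketch does not use it.
    Public Function StripHtml(ByVal html As String, ByVal allowHarmlessTags As Boolean) As String
        Dim htmlDoc As New HtmlDocument()
        htmlDoc.LoadHtml(html)

        Dim invalidNodes As HtmlNodeCollection =
            htmlDoc.DocumentNode.SelectNodes("//div|//font|//span")

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If invalidNodes IsNot Nothing Then
            For Each node As HtmlNode In invalidNodes
                ' Second argument is keepGrandChildren: True moves the node's
                ' children up to its parent before the node itself is removed.
                node.ParentNode.RemoveChild(node, True)
            Next
        End If

        Return htmlDoc.DocumentNode.WriteTo()
    End Function

End Module
```

Calling `StripHtml(fieldValue, False)` on a pasted field should then return the same markup with the three wrapper tags gone and their text left in place.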