Jsoup - How to clean HTML by escaping, not removing, the unwanted HTML?
Is there a way of getting jsoup to clean a string with HTML in it by escaping the unwanted HTML rather than removing it completely? My example:
String dirty = "This is <b>REALLY</b> dirty code from <a href=\"www.rubbish.url.zzzz\">haxors-r-us</a>";
String clean = Jsoup.clean(dirty, new Whitelist().addTags("a").addAttributes("a", "href", "name", "rel", "target"));
This gives a "clean" string of:
This is REALLY dirty code from <a href="www.rubbish.url.zzzz">haxors-r-us</a>
What I want is for the "clean" string to be:
This is &lt;b&gt;REALLY&lt;/b&gt; dirty code from <a href="www.rubbish.url.zzzz">haxors-r-us</a>
Comments (1)
Assuming you are parsing Strings rather than full HTML documents (as per your question), this method will work:
You could turn the "b" tag into a parameter so that you can pass in a list of the tags you wish to escape.
The associated passing JUnit test:
Note that I added a line break "\n" before your "a" tag in my test's "expected" String because jsoup pretty-prints its output.
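One way the approach described in this answer could look is sketched below; it is an illustration under stated assumptions, not necessarily the answerer's exact code. It assumes jsoup 1.14.1 or later, where the `Whitelist` class used in the question is named `Safelist`. The class name `EscapeUnwantedTags` and the method `escapeTags` are hypothetical. The idea is to replace each unwanted element with a text node holding its outer HTML, so that jsoup escapes the markup on output instead of dropping it, and then to run the usual clean step.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;
import org.jsoup.safety.Safelist;

// Hypothetical helper class, named for illustration only.
final class EscapeUnwantedTags {

    // Turns every element matching one of tagsToEscape into literal text,
    // then cleans the result so that only <a> elements survive as markup.
    static String escapeTags(String html, String... tagsToEscape) {
        Document doc = Jsoup.parseBodyFragment(html);
        for (String tag : tagsToEscape) {
            for (Element element : doc.select(tag)) {
                // The element's markup becomes plain text; jsoup escapes
                // '<' and '>' when it serializes a text node.
                element.replaceWith(new TextNode(element.outerHtml()));
            }
        }
        // Same allow list as in the question (Whitelist on older jsoup versions).
        Safelist allowed = new Safelist()
                .addTags("a")
                .addAttributes("a", "href", "name", "rel", "target");
        return Jsoup.clean(doc.body().html(), allowed);
    }

    public static void main(String[] args) {
        String dirty = "This is <b>REALLY</b> dirty code from "
                + "<a href=\"www.rubbish.url.zzzz\">haxors-r-us</a>";
        System.out.println(escapeTags(dirty, "b"));
    }
}
```

The printed string should contain `&lt;b&gt;REALLY&lt;/b&gt;` followed by the untouched `a` element. The whitespace around the `a` element can differ between jsoup versions, which is why the note above adds a "\n" to the expected string. Only the tags passed in are escaped; any other tag outside the allow list is still removed by the clean step.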