Is there a good JavaScript-based HTML parsing library available?
My goal is to take HTML entered by an end user, remove certain unsafe tags like <script>, and add it to the document. Does anybody know of a good JavaScript library to sanitize HTML?
I searched around and found a few online, including John Resig's HTML parser, Erik Arvidsson's simple html parser, and Google's Caja Sanitizer, but I haven't been able to find much information about whether people have had good experiences using these libraries, and I'm worried that they aren't really robust enough to handle arbitrary HTML. Would I be better off just sending the HTML to my Java server for sanitization?
Comments (2)
You can parse HTML with jQuery, but I'm pretty sure any blacklist-based (i.e. "filtering out") approach to sanitizing is going to fail. You probably need a whitelist-based ("filtering in") approach, and ultimately you don't want to be relying on JavaScript for security anyway. In any case, for reference, you can use jQuery for DOM parsing like this:
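Setting the jQuery parsing aside, the "filtering in" idea itself can be sketched in plain JavaScript. This is only a minimal illustration, not a vetted sanitizer: it escapes everything first and then restores a small set of attribute-free tags. The names `ALLOWED_TAGS`, `escapeHtml`, and `sanitizeAllowlist` are made up for this example, and the tag list is an assumption you would adjust.

```javascript
// Hypothetical allowlist: only these tags, with no attributes, survive.
const ALLOWED_TAGS = ["b", "i", "em", "strong", "p", "br", "ul", "ol", "li"];

// Escape every character that has special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// "Filter in": escape the whole input, then un-escape only exact matches
// of allowed opening/closing tags. Anything else stays inert text.
function sanitizeAllowlist(input) {
  const escaped = escapeHtml(String(input));
  const tagPattern = new RegExp(
    "&lt;(/?)(" + ALLOWED_TAGS.join("|") + ")&gt;",
    "gi"
  );
  return escaped.replace(
    tagPattern,
    (match, slash, name) => "<" + slash + name.toLowerCase() + ">"
  );
}
```

Because a tag is restored only when it matches the pattern exactly, something like `<b onclick=...>` or `<img onerror=...>` stays escaped. The sketch does not balance tags, so an unclosed `<b>` can still affect layout, and as noted above it should not be the only line of defense.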
Yes.
Filtering "unsafe" input must be done server-side. There is no other way to do it. It's not possible to do filtering client-side because the "client-side" could be a web browser or it could just as easily be a bot with a script.
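As a rough sketch of what that means in practice, the server applies its own filtering to every submission, whatever the client claims to have done. The example below uses JavaScript only to match the rest of the thread; the same idea applies on a Java server. `escapeForHtml` and `saveComment` are hypothetical names, and the `comments` array stands in for a real database.

```javascript
// Strictest possible filter: treat the whole submission as plain text.
function escapeForHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical server-side handler: runs for browsers and bots alike,
// so it never assumes the client already cleaned the input.
function saveComment(comments, rawBody) {
  const safe = escapeForHtml(String(rawBody));
  comments.push(safe);
  return safe;
}
```

A bot that posts directly to the endpoint goes through the same `saveComment` path, which is the point: any client-side check is a convenience, not a safeguard.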