How can I make HTML safe for a web browser with Python?
How can I make the HTML from an email safe to display in a web browser with Python?
No external references should be followed when the email is displayed. In other words, all displayed content should come from the email itself and nothing from the internet.
Emails other than spam should be displayed as closely as possible to how the writer intended.
I would like to avoid coding this myself.
Solutions requiring the latest browser (Firefox) version are also acceptable.
Comments (3)
html5lib contains an HTML+CSS sanitizer. It allows too much currently, but it shouldn't be too hard to modify it to match the use case.
Found it from here.
I'm not quite clear on what exactly you mean by "safe". It's a pretty big topic... but, for what it's worth:
In my opinion, the stripping parser from the ActiveState Cookbook is one of the easiest solutions. You can pretty much copy/paste the class and start using it.
Have a look at the comments as well. The last one states that it doesn't work anymore, but I have this running in an application somewhere and it works fine. I don't have access to that box from work, so I'll have to look it up over the weekend.
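To give an idea of what a stripping parser does, here is a minimal sketch built on the standard library's `html.parser`. It is not the ActiveState recipe itself. The names `StrippingParser` and `strip_html` and the tag whitelist are assumptions chosen for illustration. It keeps a small set of harmless tags, drops every attribute, and discards the contents of `<script>` and `<style>` entirely.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration; extend it to suit your emails.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong",
                "ul", "ol", "li", "blockquote", "pre"}
VOID_TAGS = {"br"}                 # tags that must not get a closing tag
DROP_CONTENT = {"script", "style"}  # drop these elements including their text


class StrippingParser(HTMLParser):
    """Re-emit only whitelisted tags, without attributes, plus escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            # Attributes (href, src, style, onclick, ...) are all dropped.
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and not self.skip_depth:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives unescaped, so escape it again before output.
            self.out.append(escape(data))


def strip_html(html):
    parser = StrippingParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Because no attribute survives, nothing in the output can point at an external resource, at the cost of losing most of the original styling.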
Use the HTMLParser module, or install BeautifulSoup, and use one of them to parse the HTML and disable or remove the tags. This leaves whatever link text was there, but it will not be highlighted and it will not be clickable, since you are displaying it with a web browser component.
You could make it clearer what was done by replacing each <A></A> with a <SPAN></SPAN> and changing the text decoration to show where the link used to be. Maybe a different shade of blue than normal and a dashed underline to indicate brokenness. That way you are a little closer to displaying it as intended without actually misleading people into clicking on something that is not clickable. You could even add a hover in JavaScript or pure CSS that pops up a tooltip explaining that links have been disabled for security reasons.
Similar things could be done with <IMG></IMG> tags, including replacing them with a blank rectangle to ensure that the page layout stays close to the original.
I've done stuff like this with Beautiful Soup, but HTMLParser is included with Python. Older Python distributions had an htmllib module, which is now deprecated. Since the HTML in an email message might not be fully correct, use Beautiful Soup 3.0.7a, which is better at making sense of broken HTML.
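The link and image replacement described above could look roughly like the following sketch. It uses `html.parser` from the Python 3 standard library rather than Beautiful Soup so that it runs without extra installs. The class name `LinkNeutralizer`, the function `neutralize`, the CSS class names, colors, the default placeholder size, and the pass-through tag list are all assumptions made for illustration.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical choices for illustration only.
PASS_THROUGH = {"p", "br", "div", "b", "i", "em", "strong", "ul", "ol", "li"}
VOID_TAGS = {"br", "img"}
DROP_CONTENT = {"script", "style"}
LINK_SPAN = ('<span class="disabled-link" '
             'style="color:#6a8fbf;text-decoration:underline dashed" '
             'title="Link disabled for security reasons">')
IMG_SPAN = ('<span class="blocked-image" style="display:inline-block;'
            'width:%dpx;height:%dpx;border:1px dashed #999"></span>')


def _px(value, default):
    """Return value as an int if it is a short plain number, else default."""
    if value and re.fullmatch(r"[0-9]{1,4}", value):
        return int(value)
    return default


class LinkNeutralizer(HTMLParser):
    """Turn <a> into a styled <span> and <img> into an empty placeholder box."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag == "a":
            self.out.append(LINK_SPAN)  # href is discarded
        elif tag == "img":
            a = dict(attrs)
            # Keep the declared size so the layout stays close to the original.
            self.out.append(IMG_SPAN % (_px(a.get("width"), 100),
                                        _px(a.get("height"), 100)))
        elif tag in PASS_THROUGH:
            self.out.append("<%s>" % tag)  # attributes dropped

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth or tag in VOID_TAGS:
            return
        elif tag == "a":
            self.out.append("</span>")
        elif tag in PASS_THROUGH:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))


def neutralize(html):
    parser = LinkNeutralizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

The tooltip here relies on the plain `title` attribute, so no JavaScript is needed. The standard library parser is less forgiving of badly broken markup than Beautiful Soup, so real-world email may still call for the latter.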