Sanitize (not remove) HTML input for tags that would break the document, but not other tags
Let's assume we have a user form that generates HTML input; the following could be an example of what gets POSTed to PHP.
<p>Hello</p>
<p><strong>World</strong></p>
Later, this markup is displayed by injecting it into the HTML output, inside some DIV.
What I'd like to prevent is input like the following:
</div>
<p>Hello</p>
<p><strong>World</strong></p>
<div>
Or even something like:
</div>
<script> someScript(); </script>
<iframe src="http://www.example.com">......
<p>Hello</p>
<p><strong>World</strong></p>
<div>
How can I use PHP to ensure that this input will not break the document, include bad iframes, or run scripts? The most important part is that I still want that information. I'm not throwing it out, but it needs to be included as harmless text of some sort.
Using alternative markup is not an option; it needs to be HTML.
Comments (2)
What you need is HTML Purifier.
Not only does it output HTML according to standards, it also cleans XSS vulnerabilities out of the posted code.
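As a rough illustration of the suggestion above, here is a minimal sketch of how HTML Purifier might be configured for this case. It assumes the library was installed with Composer (package `ezyang/htmlpurifier`) and that the autoloader is already loaded. The function name `sanitizeUserHtml` and the tag whitelist are made up for the example. `HTML.Allowed` and `Core.EscapeInvalidTags` are HTML Purifier configuration directives; to my understanding the latter writes disallowed tags back as escaped text instead of dropping them, which seems close to what the question asks for, but check the configuration docs for your version. The fallback branch is my own addition so the function still returns something safe when the library is missing.

```php
<?php
// Assumption: vendor/autoload.php has been required elsewhere, so the
// HTMLPurifier classes can be autoloaded.

function sanitizeUserHtml(string $dirty): string
{
    if (!class_exists('HTMLPurifier')) {
        // Library not available: escape everything rather than trust the input.
        return htmlspecialchars($dirty, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }

    $config = HTMLPurifier_Config::createDefault();
    // Illustrative whitelist; any tag not listed here counts as invalid.
    $config->set('HTML.Allowed', 'p,strong,em,br');
    // Keep invalid tags as escaped text instead of silently removing them.
    $config->set('Core.EscapeInvalidTags', true);

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirty);
}

$posted = '</div><script> someScript(); </script><p>Hello</p><div>';
echo sanitizeUserHtml($posted), PHP_EOL;
```

Whichever branch runs, the result should contain no live `<script>`, `<iframe>` or `</div>` markup.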
Edit 1: You should also check out the comparison; it's interesting :)
Edit 2: You can also look at htmlspecialchars and htmlentities,
but IMO HTML Purifier is far better and much more customizable when it comes to more complex cases like yours.
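For comparison, a small sketch of what the `htmlspecialchars` route does. It makes every tag harmless, including the `<p>` and `<strong>` tags the question wants to keep, which is why it only fits cases where no HTML should survive. The function name `escapeForDisplay` is just an illustrative choice.

```php
<?php
// Escape the whole submission so the browser shows it as text instead of parsing it.
function escapeForDisplay(string $input): string
{
    return htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$dirty = '</div><script> someScript(); </script><p>Hello</p>';
echo escapeForDisplay($dirty), PHP_EOL;
// prints: &lt;/div&gt;&lt;script&gt; someScript(); &lt;/script&gt;&lt;p&gt;Hello&lt;/p&gt;
```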
If you want to keep the broken tags but render them harmless, I'd suggest saving the input twice. Save the unmodified POST data into one database column and the purified version into another. Normally display the purified version, and show the dangerous version only when you need to.
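The two-column idea could look roughly like this. It is only a sketch: the table and column names (`posts`, `body_raw`, `body_clean`) and the function names are hypothetical, an in-memory SQLite database stands in for the real one, and the purifier is passed in as a callable so any sanitizer can be plugged in. The usage part uses `htmlspecialchars` as a stand-in sanitizer.

```php
<?php
// Hypothetical schema: one column for the raw submission, one for the sanitized copy.
function createPostsTable(PDO $db): void
{
    $db->exec('CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        body_raw TEXT NOT NULL,
        body_clean TEXT NOT NULL
    )');
}

// Store both versions in one insert; $purify is whatever sanitizer you use.
function savePost(PDO $db, string $rawHtml, callable $purify): int
{
    $stmt = $db->prepare('INSERT INTO posts (body_raw, body_clean) VALUES (:raw, :clean)');
    $stmt->execute([':raw' => $rawHtml, ':clean' => $purify($rawHtml)]);
    return (int) $db->lastInsertId();
}

// Return the sanitized body by default, the raw one only on explicit request.
function loadPost(PDO $db, int $id, bool $wantRaw = false): ?string
{
    $column = $wantRaw ? 'body_raw' : 'body_clean'; // fixed names, never user input
    $stmt = $db->prepare("SELECT $column FROM posts WHERE id = :id");
    $stmt->execute([':id' => $id]);
    $value = $stmt->fetchColumn();
    return $value === false ? null : (string) $value;
}

// Usage with a stand-in sanitizer that simply escapes everything.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createPostsTable($db);
$escape = fn(string $html): string => htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
$id = savePost($db, '<script> someScript(); </script>', $escape);
echo loadPost($db, $id), PHP_EOL;
```

If the raw column is ever shown, for example in a moderation view, it still needs escaping at output time.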
Somewhere on the HTML Purifier support forums there's an example of how to change
<a href="dangerous.url.or.javascript">text</a>
to
<span>text (dangerous.url.or.javascript)</span>
This may be the sort of thing you're looking for when you say you want to keep the information, not throw it out. HTML Purifier is highly customisable, and the author, Ambush Commander, is very helpful both on the HTML Purifier forum and here at StackOverflow.
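I don't have the forum example at hand, so the following is not HTML Purifier's way of doing it. It is a small sketch of the same link-to-span transformation using PHP's built-in DOM extension. The function name `defuseLinks` is made up, and the sketch rewrites every link, not only dangerous ones. It also assumes the input is a reasonably well-formed fragment; libxml will silently repair anything that is not.

```php
<?php
// Replace every <a href="..."> with <span>text (url)</span>, so the link
// target survives as inert, visible text.
function defuseLinks(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    // The XML encoding hint makes libxml read the fragment as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list into an array before modifying the tree.
    $links = iterator_to_array($doc->getElementsByTagName('a'));
    foreach ($links as $a) {
        $label = $a->textContent . ' (' . $a->getAttribute('href') . ')';
        $span = $doc->createElement('span');
        $span->appendChild($doc->createTextNode($label)); // text nodes are escaped on output
        $a->parentNode->replaceChild($span, $a);
    }

    // loadHTML wraps the fragment in <html><body>; serialize only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo defuseLinks('<p><a href="dangerous.url.or.javascript">text</a></p>'), PHP_EOL;
```

Because the URL ends up in a text node, any markup hidden inside it is escaped when the fragment is serialized.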