php - Sanitizing user input with preg_replace_callback and ord()?

Posted 2024-12-11 12:46:43

I have a forum-style text box, and I would like to sanitize the user input to stop potential XSS and code injection. I have seen htmlentities used, but others have said that the &, #, % and : characters need to be encoded as well, and the more I look, the more potentially dangerous characters pop up. Whitelisting is problematic because there are many valid text characters beyond [a-zA-Z0-9]. I have come up with the code below. Will it stop attacks and be secure? Is there any reason not to use it, or is there a better way?

function replaceHTML ($match) {
    return "&#" . ord ($match[0]) . ";";
}

$clean = preg_replace_callback ( "/[^ a-zA-Z0-9]/", "replaceHTML", $userInput );
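A self-contained version of the code above that can be run from the command line, assuming PHP 7 or later (for the `\u{...}` string escape); the wrapper name `encodeNonAlnum` is hypothetical. It also makes one caveat visible: without the `u` modifier the pattern matches single bytes, so a UTF-8 character such as "é" (bytes 0xC3 0xA9) becomes two numeric references, `&#195;&#169;`, which a browser renders as "Ã©" because numeric references are read as Unicode code points, not bytes.

```php
<?php
// The replacement callback from the question: turn one matched byte into
// a decimal numeric character reference such as "&#60;".
function replaceHTML(array $match): string
{
    return "&#" . ord($match[0]) . ";";
}

// Hypothetical wrapper: encode every byte that is not a space, an ASCII
// letter or a digit. There is no /u modifier, so the pattern works on bytes.
function encodeNonAlnum(string $userInput): string
{
    return preg_replace_callback("/[^ a-zA-Z0-9]/", "replaceHTML", $userInput);
}

echo encodeNonAlnum("<b>5:</b>"), "\n";
// &#60;b&#62;5&#58;&#60;&#47;b&#62;

echo encodeNonAlnum("caf\u{e9}"), "\n";
// caf&#195;&#169;  (the two UTF-8 bytes of "é", shown by a browser as "Ã©")
```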

EDIT:
I could of course be wrong, but my understanding is that htmlentities only replaces & < > " (and ' if ENT_QUOTES is turned on). This is probably enough to stop most attacks (and frankly probably more than enough for my low-traffic site). In my obsessive attention to detail, however, I dug further. A book I have warns to also encode # and % for "shutting down hex attacks". Two websites I found warned against allowing : and --. It's all rather confusing to me, and it led me to explore converting all non-alphanumeric characters. If htmlentities does this already, great, but it does not seem to. Here are the results of the code I ran, copied from View Source in Firefox.

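original (random characters to test):
<b>5:</b>gjla<hi>#''*&$!j-l:4

preg_replace_callback:
&#60;b&#62;5&#58;&#60;&#47;b&#62;gjla&#60;hi&#62;&#35;&#39;&#39;&#42;&#38;&#36;&#33;j&#45;l&#58;4

htmlentities (w/ ENT_QUOTES):
&lt;b&gt;5:&lt;/b&gt;gjla&lt;hi&gt;#&#039;&#039;*&amp;$!j-l:4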

htmlentities does not appear to encode the other characters, such as the colon.
Sorry for the wall of text. Is this just me being paranoid?
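The result above matches how the two built-ins are documented: `htmlspecialchars()` converts only `&`, `<`, `>`, `"` and (with `ENT_QUOTES`) `'`, while `htmlentities()` additionally converts characters that have a named HTML entity, such as "é". Neither touches `:`, `#`, `%` or `-`. A small sketch to check this, assuming PHP 7 or later and UTF-8 text:

```php
<?php
$sample = "<b>5:</b>gjla<hi>#''*&\$!j-l:4 caf\u{e9} 100%";

// Converts only the HTML-special characters & < > " and '.
$special = htmlspecialchars($sample, ENT_QUOTES, 'UTF-8');

// Also converts every character with a named HTML 4.01 entity (é -> &eacute;).
$entities = htmlentities($sample, ENT_QUOTES, 'UTF-8');

echo $special, "\n";
echo $entities, "\n";
```

The warnings about `:` and `--` most likely concern other output contexts, such as `javascript:` URLs inside an `href` attribute or `--` comments in SQL, rather than text placed between HTML tags.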

EDIT #2:

Comments (3)

浅黛梨妆こ 2024-12-18 12:46:43

All you need to do to stop XSS attacks is use htmlspecialchars().
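A minimal sketch of that advice, assuming UTF-8 pages and PHP 7 or later; the helper name `e()` is hypothetical, not a PHP built-in. The idea is to store the user's text unchanged and escape it at the moment it is written into HTML text or a quoted attribute, passing `ENT_QUOTES` so both kinds of quote are converted.

```php
<?php
// Hypothetical helper: escape a string for HTML element content or a
// quoted attribute value. ENT_QUOTES converts both ' and ".
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$comment = '<script>alert("xss")</script> 5:00 #1 100%';

// The tags are neutralised; ':', '#' and '%' pass through untouched.
echo '<p>' . e($comment) . '</p>', "\n";
echo '<input value="' . e($comment) . '">', "\n";
```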

深海不蓝 2024-12-18 12:46:43

That is exactly what htmlentities does already:

http://codepad.viper-7.com/NDZMa3

It will convert (spaced out to keep Stack Overflow from double-encoding it):
"& # amp ;"
to
"& # amp; # amp ;"

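The point of that example is that `htmlentities()` always converts `&` itself (its `double_encode` parameter defaults to true), so input that merely looks like an entity is displayed literally instead of being interpreted by the browser. A quick check, assuming PHP 7 or later:

```php
<?php
// '&' is always encoded, so entity-like input cannot sneak through.
$inputs = ['&amp;', '&#60;script&#62;', '&#x3C;'];

foreach ($inputs as $input) {
    echo $input, ' => ', htmlentities($input, ENT_QUOTES, 'UTF-8'), "\n";
}
// &amp; => &amp;amp;
// &#60;script&#62; => &amp;#60;script&amp;#62;
// &#x3C; => &amp;#x3C;
```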
¢蛋碎的人ぎ生 2024-12-18 12:46:43

The space ' ' in your regex can be changed to \s, and adding /i at the end of the regex makes it case-insensitive. You also don't need to translate your characters into sequences manually; that can be done by using htmlentities in the callback:

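// /u matches whole UTF-8 characters instead of single bytes
$clean = preg_replace_callback('/[^a-z0-9\s]/iu', function ($m) {
    // the callback receives an array of matches, so encode the matched text $m[0]
    return htmlentities($m[0], ENT_QUOTES, 'UTF-8');
}, $userInput);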
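Note that `preg_replace_callback()` hands its callback an array of matches, so `htmlentities` has to be called on `$m[0]` inside a closure rather than passed by name. Once wrapped, the callback version produces the same string as a single `htmlentities()` call on the whole input: letters, digits and whitespace are left alone either way, and characters with no HTML 4.01 named entity, such as `:` and `#`, pass through unchanged. A sketch to compare the two, assuming PHP 7 or later and UTF-8 input; `viaCallback` is a hypothetical name.

```php
<?php
// Hypothetical helper: encode every character except ASCII letters, digits
// and whitespace, one matched character at a time. /u makes the pattern
// match whole UTF-8 characters; it returns null if the input is not valid UTF-8.
function viaCallback(string $userInput): ?string
{
    return preg_replace_callback('/[^a-z0-9\s]/iu', function (array $m): string {
        return htmlentities($m[0], ENT_QUOTES, 'UTF-8');
    }, $userInput);
}

$input = "<b>5:</b> caf\u{e9} #1";

echo viaCallback($input), "\n";
// &lt;b&gt;5:&lt;/b&gt; caf&eacute; #1

// Same result as encoding the whole string in one call.
var_dump(viaCallback($input) === htmlentities($input, ENT_QUOTES, 'UTF-8'));
// bool(true)
```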