Combining multiple regular expressions into one

Posted on 2024-12-14 18:38:24

I'm filtering all user input to remove the following characters:
http://www.w3.org/TR/unicode-xml/#Charlist ("characters not suitable for use with markup").
So I have these two functions:

if (!function_exists("mb_trim")) {
    function mb_trim($str)
    {
        return preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);
    }
}

function sanitize($str)
{
    // Clones of grave and acute
    $str = preg_replace("/[\x{0340}-\x{0341}]+/u", "", $str);

    // Obsolete characters for Khmer
    $str = preg_replace("/[\x{17A3}]+/u", "", $str);

    $str = preg_replace("/[\x{17D3}]+/u", "", $str);

    // Line and paragraph separator
    $str = preg_replace("/[\x{2028}]+/u", "", $str);

    $str = preg_replace("/[\x{2029}]+/u", "", $str);

    // BIDI embedding controls (LRE, RLE, LRO, RLO, PDF)
    $str = preg_replace("/[\x{202A}-\x{202E}]+/u", "", $str);

    // Activate/Inhibit Symmetric swapping
    $str = preg_replace("/[\x{206A}-\x{206B}]+/u", "", $str);

    // Activate/Inhibit Arabic form shaping
    $str = preg_replace("/[\x{206C}-\x{206D}]+/u", "", $str);

    // Activate/Inhibit National digit shapes
    $str = preg_replace("/[\x{206E}-\x{206F}]+/u", "", $str);

    // Interlinear annotation characters
    $str = preg_replace("/[\x{FFF9}-\x{FFFB}]+/u", "", $str);

    // Byte Order Mark
    $str = preg_replace("/[\x{FEFF}]+/u", "", $str);

    // Object replacement character
    $str = preg_replace("/[\x{FFFC}]+/u", "", $str);

    // Scoping for Musical Notation
    $str = preg_replace("/[\x{1D173}-\x{1D17A}]+/u", "", $str);

    $str = mb_trim($str);

    if (mb_check_encoding($str)) {
        return $str;
    } else {
        return false;
    }
}

I don't know much about regular expressions, so what I want to know is:

Is the mb_trim function correct for trimming multi-byte strings?
Is it possible to join all the regular expressions in the sanitize function so that it does only one preg_replace?

Thanks

倾城月光淡如水﹏ 2024-12-21 18:38:26

You can do it with a single preg_replace by combining them into one character class, like so:

 $str = preg_replace("/[\x{0340}-\x{0341}\x{17A3}\x{17D3}\x{2028}-\x{2029}\x{202A}-\x{202E}\x{206A}-\x{206B}\x{206C}-\x{206D}\x{206E}-\x{206F}\x{FFF9}-\x{FFFB}\x{FEFF}\x{FFFC}\x{1D173}-\x{1D17A}]+/u", "", $str);
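
For context, here is a minimal sketch of how the whole function might look with that single pattern. The name sanitize_once is hypothetical and not from the question. It rests on two assumptions: the input is meant to be UTF-8, and the encoding should be checked before any regex runs, because with the /u modifier preg_replace returns null for malformed UTF-8 and a check at the end would no longer see the bad bytes. Adjacent ranges from the pattern above are merged, and the trim step reuses the question's own mb_trim pattern.

```php
<?php
// Sketch only: sanitize_once() is a hypothetical name, not from the original post.
// Assumes the input is intended to be UTF-8.
function sanitize_once(string $str)
{
    // Validate first: with /u, preg_replace() returns null on malformed
    // UTF-8, so checking the encoding afterwards would be too late.
    if (!mb_check_encoding($str, 'UTF-8')) {
        return false;
    }

    // Same code points as the combined pattern above, with adjacent ranges
    // merged. Single quotes leave the backslashes for PCRE to interpret.
    $unsuitable = '/[\x{0340}\x{0341}\x{17A3}\x{17D3}\x{2028}-\x{202E}'
        . '\x{206A}-\x{206F}\x{FEFF}\x{FFF9}-\x{FFFC}\x{1D173}-\x{1D17A}]+/u';
    $str = preg_replace($unsuitable, '', $str);

    // Trim separators (\pZ) and "other" characters (\pC) from both ends,
    // using the same pattern as the question's mb_trim().
    return preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);
}
```

On the mb_trim question: the pattern itself handles multi-byte strings because of the /u modifier, and \pZ and \pC cover Unicode separators and control/format characters rather than only ASCII whitespace. One thing to watch, as far as I know: PHP 8.4 ships a built-in mb_trim, so the function_exists guard in the question would pick up the built-in there, and its default set of trimmed characters may differ from [\pZ\pC].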
