Problem with a regular expression for comment code

Posted on 2024-09-06 12:07:14

I am currently making a homepage where logged-in users can write comments. The comment string is first run through a function that replaces emoticons using str_replace. After that I want it to exchange

[url=www.whatever.com]linktext[/url]

with:

<a href='www.whatever.com'>linktext</a>

The reason for this is that I want to strip from the text all HTML that isn't controlled by my comment code, in case some users decide to get creative. I thought it would be best to use preg_replace, but the code I ended up with (partly from reading about regular expressions in my trusty O'Reilly "SQL and PHP" book and partly from the web) is pretty bonkers and, most importantly, doesn't work.

Any help would be appreciated, thanks.

It's probably possible to replace the whole tag in one pass rather than in two segments as I have done. I just decided that getting two smaller parts to work first would be easier, and that I could merge them afterwards.

Code:

function text_format($string)
{
    $pattern="/([url=)+[a-zA-Z0-9]+(])+/";
    $string=preg_replace($pattern, "/(<a href=\')+[a-zA-Z0-9]+(\'>)+/", $string);
    $pattern="/([\/url])+/";
    $string=preg_replace($pattern, "/(<\/a>)+/", $string);    
    return $string;
}

旧竹 2024-09-13 12:07:14

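It looks like you are using something similar to BBCode. Why not use a BBCode parser, such as this one?

http://nbbc.sourceforge.net/

It also handles smileys, replacing them with images. If you try their test page you will still see text, because they don't host the images and the alt text is set to the smiley.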

烟雨扶苏 2024-09-13 12:07:14

I experimented a bit with the following:

function text_format($string)
{
    return preg_replace('#\[url=([^\]]+)\]([^\[]*)\[/url\]#', '<a href="$1">$2</a>', $string);
}

However, one immediate fault with this is that if linktext is empty, there will be nothing between <a> and </a>. One way around it would be to do another pass with something like this:

preg_replace('#<a href="([^"]+)"></a>#', '<a href="$1">$1</a>', $string);

Another option would be to use preg_replace_callback and put this logic inside your callback function.
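
As a rough illustration of the preg_replace_callback option, the sketch below moves the "empty link text" fallback into a callback so that no second pass is needed. It reuses the pattern from above; the anonymous callback and the `trim()` check are my own assumptions, and it still performs no escaping or scheme filtering.

```php
<?php
// Sketch only: same pattern as above, with the fallback handled in a callback.
function text_format($string)
{
    return preg_replace_callback(
        '#\[url=([^\]]+)\]([^\[]*)\[/url\]#',
        function ($m) {
            // $m[1] is the URL, $m[2] the link text (possibly empty).
            // Fall back to the URL when the link text is empty or only whitespace.
            $text = trim($m[2]) === '' ? $m[1] : $m[2];
            return '<a href="' . $m[1] . '">' . $text . '</a>';
        },
        $string
    );
}

echo text_format('[url=www.whatever.com]linktext[/url]'), "\n"; // <a href="www.whatever.com">linktext</a>
echo text_format('[url=www.whatever.com][/url]'), "\n";         // <a href="www.whatever.com">www.whatever.com</a>
```

Keeping the logic in one callback also gives a single place to add validation later.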

Finally, this is obviously a common "problem" that has been solved many times by others; if using a more mature open-source solution is an option, I'd recommend looking for one.

亽野灬性zι浪 2024-09-13 12:07:14

@Lauri Lehtinen's answer is good for learning the idea behind the technique, but you shouldn't use it in practice because it would make your site extremely vulnerable to XSS attacks. Also, link spammers would appreciate the lack of rel="nofollow" on the generated links.
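
To make the risk concrete, here is a small sketch that feeds two hostile inputs to the single-pattern `text_format()` from the earlier answer (repeated here so the snippet runs on its own). The payloads are illustrative assumptions, not taken from a real attack.

```php
<?php
// The one-line replacement from the earlier answer, copied for a self-contained demo.
function text_format($string)
{
    return preg_replace('#\[url=([^\]]+)\]([^\[]*)\[/url\]#', '<a href="$1">$2</a>', $string);
}

// A javascript: URL passes straight through into the href attribute.
echo text_format('[url=javascript:alert(1)]click me[/url]'), "\n";
// prints: <a href="javascript:alert(1)">click me</a>

// A double quote in the URL closes the attribute and injects a new one.
echo text_format('[url=x" onmouseover="alert(1)]hi[/url]'), "\n";
// prints: <a href="x" onmouseover="alert(1)">hi</a>
```

Both results are valid HTML that runs attacker-supplied script, which is why the URL needs scheme filtering and escaping before it is written into the page.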

Instead, use something like:

<?php
// \author Daniel Trebbien
// \date 2010-06-22
// \par License
//  Public Domain

$allowed_uri_schemes = array('http', 'https', 'ftp', 'ftps', 'irc', 'mailto');

/**
 * Encodes a string in RFC 3986
 *
 * \see http://tools.ietf.org/html/rfc3986
 */
function encode_uri($str)
{
    $str = urlencode('' . $str);
    $search = array('%3A', '%2F', '%3F', '%23', '%5B', '%5D', '%40', '%21', '%24', '%26', '%27', '%28', '%29', '%2A', '%2B', '%2C', '%3B', '%3D', '%2E', '%7E');
    $replace = array(':', '/', '?', '#', '[', ']', '@', '!', '$', '&', '\'', '(', ')', '*', '+', ',', ';', '=', '.', '~'); // gen-delims / sub-delims / unreserved
    return str_ireplace($search, $replace, $str);
}

function url_preg_replace_callback($matches)
{
    global $allowed_uri_schemes;

    if (empty($matches[1]))
        return $matches[0];
    $href = trim($matches[1]);
    if (($i = strpos($href, ':')) !== FALSE) {
        if (strrpos($href, '/', $i) === FALSE) {
            if (!in_array(strtolower(substr($href, 0, $i)), $allowed_uri_schemes))
                return $matches[0];
        }
    }

    // unescape `\]`, `\\\]`, `\\\\\]`, etc.
    for ($j = strpos($href, '\\]'); $j !== FALSE; $j = strpos($href, '\\]', $j)) {
        for ($i = $j - 2; $i >= 0 && $href[$i] == '\\' && $href[$i + 1] == '\\'; $i -= 2)
            /* empty */;
        $i += 2;

        $h = '';
        if ($i > 0)
            $h = substr($href, 0, $i);
        for ($numBackslashes = floor(($j - $i)/2); $numBackslashes > 0; --$numBackslashes)
            $h .= '\\';
        $h .= ']';
        if (($j + 2) < strlen($href))
            $h .= substr($href, $j + 2);
        $href = $h;
        $j = $i + floor(($j - $i)/2) + 1;
    }

    if (!empty($matches[2]))
        $href .= str_replace('\\\\', '\\', $matches[2]);

    if (empty($matches[3]))
        $linkText = $href;
    else {
        $linkText = trim($matches[3]);
        if (empty($linkText))
            $linkText = $href;
    }
    $href = htmlspecialchars(encode_uri(htmlspecialchars_decode($href)));
    return "<a href=\"$href\" rel=\"nofollow\">$linkText</a>";
}

function render($input)
{
    $input = htmlspecialchars(strip_tags('' . $input));
    $input = preg_replace_callback('~\[url=((?:[^\]]|(?<!\\\\)(?:\\\\\\\\)*\\\\\])*)((?<!\\\\)(?:\\\\\\\\)*)\]' . '((?:[^[]|\[(?!/)|\[/(?!u)|\[/u(?!r)|\[/ur(?!l)|\[/url(?!\]))*)' . '\[/url\]~i', 'url_preg_replace_callback', $input);
    return $input;
}

which I believe is safe against XSS. This version has the added benefit that it is possible to write out links to URLs that contain ']'.

Evaluate this code with the following "test suite":

echo render('[url=http://www.bing.com/][[/[/u[/ur[/urlBing[/url]') . "\n";
echo render('[url=][/url]') . "\n";
echo render('[url=http://www.bing.com/][[/url]') . "\n";
echo render('[url=http://www.bing.com/][/[/url]') . "\n";
echo render('[url=http://www.bing.com/][/u[/url]') . "\n";
echo render('[url=http://www.bing.com/][/ur[/url]') . "\n";
echo render('[url=http://www.bing.com/][/url[/url]') . "\n";
echo render('[url=http://www.bing.com/][/url][/url]') . "\n";
echo render('[url=    javascript: window.alert("hi")]click me[/url]') . "\n";
echo render('[url=#" onclick="window.alert(\'hi\')"]click me[/url]') . "\n";
echo render('[url=http://www.bing.com/]       [/url]') . "\n";
echo render('[url=/?#[\\]@!$&\'()*+,;=.~]       [/url]') . "\n"; // link text should be `/?#[]@!$&amp;'()*+,;=.~`
echo render('[url=http://localhost/\\\\]d]abc[/url]') . "\n"; // href should be `http://localhost/%5C`, link text should be `d]abc`
echo render('[url=\\]][/url]') . "\n"; // link text should be `]`
echo render('[url=\\\\\\]][/url]') . "\n"; // link text should be `\]`
echo render('[url=\\\\\\\\\\]][/url]') . "\n"; // link text should be `\\]`
echo render('[url=a\\\\\\\\\\]bcde\\]fgh\\\\\\]ijklm][/url]') . "\n"; // link text should be `a\\]bcde]fgh\]ijklm`

Or, just look at the Codepad results.

As you can see, it works.
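
One detail in the callback that may deserve a second look is the scheme test. With a non-negative offset, `strrpos($href, '/', $i)` searches from the colon onward, so a value such as `javascript://...` contains a slash after the colon and, as far as I can tell, skips the whitelist. Below is a minimal sketch of a stricter check. `has_allowed_scheme` is a hypothetical helper, not part of the code above, and it assumes that everything before the first ':' is a scheme unless a '/', '?' or '#' appears earlier.

```php
<?php
// Hypothetical helper: true when $href is a relative reference or uses a whitelisted scheme.
function has_allowed_scheme($href, array $allowed)
{
    $href = trim($href);
    $colon = strpos($href, ':');
    if ($colon === false) {
        return true; // no colon at all, so this is a relative reference
    }
    // A '/', '?' or '#' before the first colon means the colon belongs to the
    // path, query or fragment, so there is no scheme to check.
    if (strcspn($href, '/?#') < $colon) {
        return true;
    }
    $scheme = strtolower(substr($href, 0, $colon));
    return in_array($scheme, $allowed, true);
}

$allowed = array('http', 'https', 'ftp', 'ftps', 'irc', 'mailto');
var_dump(has_allowed_scheme('http://www.bing.com/', $allowed)); // bool(true)
var_dump(has_allowed_scheme('javascript://x', $allowed));       // bool(false)
var_dump(has_allowed_scheme('/path?next=a:b', $allowed));       // bool(true)
```

A check like this could be called where the callback currently inspects the scheme, returning `$matches[0]` unchanged when it yields false.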
