PHP: Comparing URIs that differ in percent-encoding

Posted on 2024-09-27 01:45:25

In PHP, I want to compare two relative URLs for equality. The catch: URLs may differ in percent-encoding, e.g.

  • /dir/file+file vs. /dir/file%20file
  • /dir/file(file) vs. /dir/file%28file%29
  • /dir/file%5bfile vs. /dir/file%5Bfile

According to RFC 3986, servers should treat these URIs identically. But if I use == to compare, I'll end up with a mismatch.

So I'm looking for a PHP function that accepts two strings and returns TRUE if they represent the same URI (discounting encoded/decoded variants of the same character, upper-case/lower-case hex digits in encoded characters, and + vs. %20 for spaces), and FALSE if they're different.

I know in advance that these strings contain only ASCII characters, no Unicode.
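For comparison, the normalization that RFC 3986 itself defines (section 6.2.2) is narrower than what the question asks for: hex digits in percent-encodings are case-insensitive, and percent-encoded unreserved characters (letters, digits, "-", ".", "_", "~") may be decoded. Reserved characters such as "(" are not guaranteed to be equivalent to their encoded form, and "+" meaning a space is a convention of HTML form encoding, not of RFC 3986. The sketch below applies only the RFC rules; the function name rfc3986NormalizePercentEncoding is hypothetical.

```php
<?php
// Hypothetical helper: RFC 3986 section 6.2.2 percent-encoding normalization only.
function rfc3986NormalizePercentEncoding(string $uri): string
{
    return preg_replace_callback(
        '/%([0-9A-Fa-f]{2})/',
        function (array $m): string {
            $char = chr(hexdec($m[1]));
            // Unreserved characters (RFC 3986 section 2.3) are decoded.
            if (preg_match('/\A[A-Za-z0-9._~-]\z/', $char)) {
                return $char;
            }
            // Everything else stays encoded, normalized to uppercase hex digits.
            return '%' . strtoupper($m[1]);
        },
        $uri
    );
}

var_dump(rfc3986NormalizePercentEncoding('/dir/file%5bfile') === rfc3986NormalizePercentEncoding('/dir/file%5Bfile')); // bool(true)
var_dump(rfc3986NormalizePercentEncoding('/dir/file+file') === rfc3986NormalizePercentEncoding('/dir/file%20file'));   // bool(false)
```

Under these rules only the third pair from the question compares equal; the first two need the looser, form-style decoding the question describes.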


君勿笑 2024-10-04 01:45:25
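<?php
function uriMatches($uri1, $uri2)
{
    // Decode both strings ('+' and %20 both become a space, %5b and %5B both become '[')
    // and compare strictly; === avoids PHP's loose numeric-string comparison.
    return urldecode($uri1) === urldecode($uri2);
}

var_dump(uriMatches('/dir/file+file', '/dir/file%20file'));      // bool(true)
var_dump(uriMatches('/dir/file(file)', '/dir/file%28file%29'));  // bool(true)
var_dump(uriMatches('/dir/file%5bfile', '/dir/file%5Bfile'));    // bool(true)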

See urldecode() in the PHP manual.

就像说晚安 2024-10-04 01:45:25

EDIT: Please look at @webbiedave's response. His is much better (I wasn't even aware that PHP had a function to do that... learn something new every day).

You will have to parse the strings for anything matching %## to find the percent-encoded sequences. Then convert the two hex digits of each one to a number (for example with hexdec()) and pass that number to the chr() function to get the encoded character. Rebuild the strings, and then you should be able to compare them.

Not sure that's the most efficient method, but considering URLs are not usually that long, it shouldn't be too much of a performance hit.
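A minimal sketch of the manual approach described above, assuming "+" should also count as a space as the question requires. The names decodePercentManually and uriMatchesManually are hypothetical. It behaves essentially like PHP's built-in urldecode(), which is why the edit points to that answer, and it shares the same caveat: %2F decodes to a real "/".

```php
<?php
// Hypothetical helper: decodes %XX sequences by hand.
function decodePercentManually(string $uri): string
{
    // Treat '+' as a space first, so an encoded %2B still decodes to a literal '+'.
    $uri = str_replace('+', ' ', $uri);
    // Find every %XX, turn the two hex digits into an integer, then into a character.
    return preg_replace_callback(
        '/%([0-9A-Fa-f]{2})/',
        function (array $m): string {
            return chr(hexdec($m[1]));
        },
        $uri
    );
}

// Hypothetical comparison built on the helper above.
function uriMatchesManually(string $a, string $b): bool
{
    return decodePercentManually($a) === decodePercentManually($b);
}

var_dump(uriMatchesManually('/dir/file(file)', '/dir/file%28file%29')); // bool(true)
```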

海未深 2024-10-04 01:45:25

I know this problem here seems to be solved by webbiedave, but I had my own problems with it.

First problem: the hex digits of percent-encoded characters are case-insensitive. %C3 and %c3 encode exactly the same octet, even though the two URI strings differ, so both URIs point to the same location.

Second problem: folder%20(2) and folder%20%282%29 are both validly URL-encoded forms that point to the same location, although they are different strings.

Third problem: if I simply decode all URL-encoded characters, two different locations end up with the same URI, for example bla%2Fblubb and bla/blubb.

So what to do? To compare two URIs, I need to normalize both of them: split each one into its components, urldecode every path segment and query part once, rawurlencode them again, and glue them back together. Then I can compare the results.
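The path part of that normalization can be checked in isolation. This sketch (the function name normalizePathSegments is hypothetical) splits on literal slashes first, so an encoded slash stays inside its segment, then decodes and re-encodes each segment. rawurlencode() always emits uppercase hex digits and %20 for a space, which covers the question's three pairs.

```php
<?php
// Hypothetical helper: path-only version of the per-segment normalization.
function normalizePathSegments(string $path): string
{
    // Split on literal slashes first, so %2F is never confused with a real separator.
    $segments = explode('/', $path);
    // Decode each segment once ('+' becomes a space), then re-encode it canonically.
    $segments = array_map(function (string $segment): string {
        return rawurlencode(urldecode($segment));
    }, $segments);
    return implode('/', $segments);
}

// The question's three pairs compare equal...
var_dump(normalizePathSegments('/dir/file+file') === normalizePathSegments('/dir/file%20file'));     // bool(true)
var_dump(normalizePathSegments('/dir/file(file)') === normalizePathSegments('/dir/file%28file%29')); // bool(true)
var_dump(normalizePathSegments('/dir/file%5bfile') === normalizePathSegments('/dir/file%5Bfile'));   // bool(true)
// ...while an encoded slash is still distinguished from a real one.
var_dump(normalizePathSegments('bla%2Fblubb') === normalizePathSegments('bla/blubb'));               // bool(false)
```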

And this could be the function to normalize it:

<?php
function normalizeURI($uri) {
    $components = parse_url($uri);
    if ($components === false) {
        return $uri; // parse_url() failed on a seriously malformed URI: compare it as-is
    }
    $normalized = "";
    // parse_url() omits missing components, so test with isset() to avoid "undefined key" warnings
    if (isset($components['scheme'])) {
        $normalized .= $components['scheme'] . ":";
    }
    if (isset($components['host'])) {
        $normalized .= "//";
        if (isset($components['user'])) { // userinfo is rare in practice, but parse_url() can still return it
            $normalized .= rawurlencode(urldecode($components['user']));
            if (isset($components['pass'])) {
                $normalized .= ":".rawurlencode(urldecode($components['pass']));
            }
            $normalized .= "@";
        }
        $normalized .= $components['host'];
        if (isset($components['port'])) {
            $normalized .= ":".$components['port'];
        }
    }
    if (isset($components['path'])) {
        // With a host present the path already starts with "/", so no extra separator is added.
        // Decode and re-encode segment by segment so that %2F never turns into a real "/".
        $path = explode("/", $components['path']);
        $path = array_map("urldecode", $path);
        $path = array_map("rawurlencode", $path);
        $normalized .= implode("/", $path);
    }
    if (isset($components['query'])) {
        $query = explode("&", $components['query']);
        foreach ($query as $i => $c) {
            $c = explode("=", $c);
            $c = array_map("urldecode", $c);
            $c = array_map("rawurlencode", $c);
            $c = implode("=", $c);
            $query[$i] = $c;
        }
        $normalized .= "?".implode("&", $query);
    }
    return $normalized; // the #fragment, if any, is not included
}

Now you can alter webbiedave's function to this:

function uriMatches($uri1, $uri2) {
    return normalizeURI($uri1) === normalizeURI($uri2);
}

That should do it. And yes, it is considerably more complicated than I wanted it to be.
