Stack Overflow 如何生成 SEO 友好的 URL?

发布于 2024-07-04 12:06:38 字数 729 浏览 11 评论 0 原文

什么是一个好的完整的 正则表达式 或其他一些可以采用标题的过程:

如何将标题更改为 URL 的一部分,例如 Stack Overflow?

并将其转换为

how-do-you-change-a-title-to-be-part-of-the-url-like-stack-overflow

StackOverflow 上 SEO 友好的 URL 中使用的内容?

我使用的开发环境是 Ruby on Rails,但是如果还有其他一些特定于平台的解决方案(.NET、PHP、Django),我也很想看到这些。

我确信我(或其他读者)会在不同的平台上遇到同样的问题。

我正在使用自定义路由,我主要想知道如何将字符串更改为删除所有特殊字符,全部小写,并替换所有空格。

What is a good complete regular expression or some other process that would take the title:

How do you change a title to be part of the URL like Stack Overflow?

and turn it into

how-do-you-change-a-title-to-be-part-of-the-url-like-stack-overflow

that is used in the SEO-friendly URLs on Stack Overflow?

The development environment I am using is Ruby on Rails, but if there are some other platform-specific solutions (.NET, PHP, Django), I would love to see those too.

I am sure I (or another reader) will come across the same problem on a different platform down the line.

I am using custom routes, and I mainly want to know how to alter the string to all special characters are removed, it's all lowercase, and all whitespace is replaced.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(21

在巴黎塔顶看东京樱花 2024-07-11 12:06:39

有一个名为 PermalinkFu 的小型 Rub​​y on Rails 插件,可以执行此操作。 转义方法将转换为字符串适用于 URL。 看一下代码; 这个方法很简单。

要删除非 ASCII 字符,它使用 iconv 库将其转换为 'ascii//ignore/ /translit' 来自 'utf-8'。 然后空格变成破折号,所有内容都小写,等等。

There is a small Ruby on Rails plugin called PermalinkFu, that does this. The escape method does the transformation into a string that is suitable for a URL. Have a look at the code; that method is quite simple.

To remove non-ASCII characters it uses the iconv lib to translate to 'ascii//ignore//translit' from 'utf-8'. Spaces are then turned into dashes, everything is downcased, etc.

记忆で 2024-07-11 12:06:39

Brian 的 Ruby 代码:

title.downcase.strip.gsub(/\ /, '-').gsub(/[^\w\-]/, '')

downcase 将字符串转换为小写,strip 删除前导和尾随空格,第一个 gsub 调用 g< /em>lobally 用破折号替换空格,第二个删除所有不是字母或破折号的内容。

Brian's code, in Ruby:

title.downcase.strip.gsub(/\ /, '-').gsub(/[^\w\-]/, '')

downcase turns the string to lowercase, strip removes leading and trailing whitespace, the first gsub call globally substitutes spaces with dashes, and the second removes everything that isn't a letter or a dash.

雪落纷纷 2024-07-11 12:06:39

不不不。 你们都错了。 除了变音符号之外,你已经到达了那里,但是亚洲字符呢(Ruby 开发者没有考虑他们的 nihonjin 弟兄们)。

Firefox 和 Safari 都在 URL 中显示非 ASCII 字符,坦率地说,它们看起来很棒。 很高兴支持像“http://somewhere.com/news/read/お前たちはaホじゃないかい'。

这里有一些 PHP 代码可以做到这一点,但我只是编写了它,还没有对其进行压力测试。

<?php
    function slug($str)
    {
        $args = func_get_args();
        array_filter($args);  //remove blanks
        $slug = mb_strtolower(implode('-', $args));

        $real_slug = '';
        $hyphen = '';
        foreach(SU::mb_str_split($slug) as $c)
        {
            if (strlen($c) > 1 && mb_strlen($c)===1)
            {
                $real_slug .= $hyphen . $c;
                $hyphen = '';
            }
            else
            {
                switch($c)
                {
                    case '&':
                        $hyphen = $real_slug ? '-and-' : '';
                        break;
                    case 'a':
                    case 'b':
                    case 'c':
                    case 'd':
                    case 'e':
                    case 'f':
                    case 'g':
                    case 'h':
                    case 'i':
                    case 'j':
                    case 'k':
                    case 'l':
                    case 'm':
                    case 'n':
                    case 'o':
                    case 'p':
                    case 'q':
                    case 'r':
                    case 's':
                    case 't':
                    case 'u':
                    case 'v':
                    case 'w':
                    case 'x':
                    case 'y':
                    case 'z':

                    case 'A':
                    case 'B':
                    case 'C':
                    case 'D':
                    case 'E':
                    case 'F':
                    case 'G':
                    case 'H':
                    case 'I':
                    case 'J':
                    case 'K':
                    case 'L':
                    case 'M':
                    case 'N':
                    case 'O':
                    case 'P':
                    case 'Q':
                    case 'R':
                    case 'S':
                    case 'T':
                    case 'U':
                    case 'V':
                    case 'W':
                    case 'X':
                    case 'Y':
                    case 'Z':

                    case '0':
                    case '1':
                    case '2':
                    case '3':
                    case '4':
                    case '5':
                    case '6':
                    case '7':
                    case '8':
                    case '9':
                        $real_slug .= $hyphen . $c;
                        $hyphen = '';
                        break;

                    default:
                       $hyphen = $hyphen ? $hyphen : ($real_slug ? '-' : '');
                }
            }
        }
        return $real_slug;
    }

示例:

$str = "~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 コリン ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 トーマス ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 アーノルド ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04";
echo slug($str);

输出:
コrin-and-トーマsu-and-アーノルド

“-and-”是因为 & 变成了“-and-”。

No, no, no. You are all so very wrong. Except for the diacritics-fu stuff, you're getting there, but what about Asian characters (shame on Ruby developers for not considering their nihonjin brethren).

Firefox and Safari both display non-ASCII characters in the URL, and frankly they look great. It is nice to support links like 'http://somewhere.com/news/read/お前たちはアホじゃないかい'.

So here's some PHP code that'll do it, but I just wrote it and haven't stress tested it.

<?php
    function slug($str)
    {
        $args = func_get_args();
        array_filter($args);  //remove blanks
        $slug = mb_strtolower(implode('-', $args));

        $real_slug = '';
        $hyphen = '';
        foreach(SU::mb_str_split($slug) as $c)
        {
            if (strlen($c) > 1 && mb_strlen($c)===1)
            {
                $real_slug .= $hyphen . $c;
                $hyphen = '';
            }
            else
            {
                switch($c)
                {
                    case '&':
                        $hyphen = $real_slug ? '-and-' : '';
                        break;
                    case 'a':
                    case 'b':
                    case 'c':
                    case 'd':
                    case 'e':
                    case 'f':
                    case 'g':
                    case 'h':
                    case 'i':
                    case 'j':
                    case 'k':
                    case 'l':
                    case 'm':
                    case 'n':
                    case 'o':
                    case 'p':
                    case 'q':
                    case 'r':
                    case 's':
                    case 't':
                    case 'u':
                    case 'v':
                    case 'w':
                    case 'x':
                    case 'y':
                    case 'z':

                    case 'A':
                    case 'B':
                    case 'C':
                    case 'D':
                    case 'E':
                    case 'F':
                    case 'G':
                    case 'H':
                    case 'I':
                    case 'J':
                    case 'K':
                    case 'L':
                    case 'M':
                    case 'N':
                    case 'O':
                    case 'P':
                    case 'Q':
                    case 'R':
                    case 'S':
                    case 'T':
                    case 'U':
                    case 'V':
                    case 'W':
                    case 'X':
                    case 'Y':
                    case 'Z':

                    case '0':
                    case '1':
                    case '2':
                    case '3':
                    case '4':
                    case '5':
                    case '6':
                    case '7':
                    case '8':
                    case '9':
                        $real_slug .= $hyphen . $c;
                        $hyphen = '';
                        break;

                    default:
                       $hyphen = $hyphen ? $hyphen : ($real_slug ? '-' : '');
                }
            }
        }
        return $real_slug;
    }

Example:

$str = "~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 コリン ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 トーマス ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04 アーノルド ~!@#$%^&*()_+-=[]\{}|;':\",./<>?\n\r\t\x07\x00\x04";
echo slug($str);

Outputs:
コリン-and-トーマス-and-アーノルド

The '-and-' is because &'s get changed to '-and-'.

神魇的王 2024-07-11 12:06:39

我将代码移植到 TypeScript 中。 它可以很容易地适应 JavaScript。

我正在向 String 原型添加一个 .contains 方法,如果您的目标是最新的浏览器或 ES6,则可以使用 .includes 代替。

if (!String.prototype.contains) {
    String.prototype.contains = function (check) {
        return this.indexOf(check, 0) !== -1;
    };
}

declare interface String {
    contains(check: string): boolean;
}

export function MakeUrlFriendly(title: string) {
            if (title == null || title == '')
                return '';

            const maxlen = 80;
            let len = title.length;
            let prevdash = false;
            let result = '';
            let c: string;
            let cc: number;
            let remapInternationalCharToAscii = function (c: string) {
                let s = c.toLowerCase();
                if ("àåáâäãåą".contains(s)) {
                    return "a";
                }
                else if ("èéêëę".contains(s)) {
                    return "e";
                }
                else if ("ìíîïı".contains(s)) {
                    return "i";
                }
                else if ("òóôõöøőð".contains(s)) {
                    return "o";
                }
                else if ("ùúûüŭů".contains(s)) {
                    return "u";
                }
                else if ("çćčĉ".contains(s)) {
                    return "c";
                }
                else if ("żźž".contains(s)) {
                    return "z";
                }
                else if ("śşšŝ".contains(s)) {
                    return "s";
                }
                else if ("ñń".contains(s)) {
                    return "n";
                }
                else if ("ýÿ".contains(s)) {
                    return "y";
                }
                else if ("ğĝ".contains(s)) {
                    return "g";
                }
                else if (c == 'ř') {
                    return "r";
                }
                else if (c == 'ł') {
                    return "l";
                }
                else if (c == 'đ') {
                    return "d";
                }
                else if (c == 'ß') {
                    return "ss";
                }
                else if (c == 'Þ') {
                    return "th";
                }
                else if (c == 'ĥ') {
                    return "h";
                }
                else if (c == 'ĵ') {
                    return "j";
                }
                else {
                    return "";
                }
            };

            for (let i = 0; i < len; i++) {
                c = title[i];
                cc = c.charCodeAt(0);

                if ((cc >= 97 /* a */ && cc <= 122 /* z */) || (cc >= 48 /* 0 */ && cc <= 57 /* 9 */)) {
                    result += c;
                    prevdash = false;
                }
                else if ((cc >= 65 && cc <= 90 /* A - Z */)) {
                    result += c.toLowerCase();
                    prevdash = false;
                }
                else if (c == ' ' || c == ',' || c == '.' || c == '/' || c == '\\' || c == '-' || c == '_' || c == '=') {
                    if (!prevdash && result.length > 0) {
                        result += '-';
                        prevdash = true;
                    }
                }
                else if (cc >= 128) {
                    let prevlen = result.length;
                    result += remapInternationalCharToAscii(c);
                    if (prevlen != result.length) prevdash = false;
                }
                if (i == maxlen) break;
            }

            if (prevdash)
                return result.substring(0, result.length - 1);
            else
                return result;
        }

I ported the code to TypeScript. It can easily be adapted to JavaScript.

I am adding a .contains method to the String prototype, if you're targeting the latest browsers or ES6 you can use .includes instead.

if (!String.prototype.contains) {
    String.prototype.contains = function (check) {
        return this.indexOf(check, 0) !== -1;
    };
}

declare interface String {
    contains(check: string): boolean;
}

export function MakeUrlFriendly(title: string) {
            if (title == null || title == '')
                return '';

            const maxlen = 80;
            let len = title.length;
            let prevdash = false;
            let result = '';
            let c: string;
            let cc: number;
            let remapInternationalCharToAscii = function (c: string) {
                let s = c.toLowerCase();
                if ("àåáâäãåą".contains(s)) {
                    return "a";
                }
                else if ("èéêëę".contains(s)) {
                    return "e";
                }
                else if ("ìíîïı".contains(s)) {
                    return "i";
                }
                else if ("òóôõöøőð".contains(s)) {
                    return "o";
                }
                else if ("ùúûüŭů".contains(s)) {
                    return "u";
                }
                else if ("çćčĉ".contains(s)) {
                    return "c";
                }
                else if ("żźž".contains(s)) {
                    return "z";
                }
                else if ("śşšŝ".contains(s)) {
                    return "s";
                }
                else if ("ñń".contains(s)) {
                    return "n";
                }
                else if ("ýÿ".contains(s)) {
                    return "y";
                }
                else if ("ğĝ".contains(s)) {
                    return "g";
                }
                else if (c == 'ř') {
                    return "r";
                }
                else if (c == 'ł') {
                    return "l";
                }
                else if (c == 'đ') {
                    return "d";
                }
                else if (c == 'ß') {
                    return "ss";
                }
                else if (c == 'Þ') {
                    return "th";
                }
                else if (c == 'ĥ') {
                    return "h";
                }
                else if (c == 'ĵ') {
                    return "j";
                }
                else {
                    return "";
                }
            };

            for (let i = 0; i < len; i++) {
                c = title[i];
                cc = c.charCodeAt(0);

                if ((cc >= 97 /* a */ && cc <= 122 /* z */) || (cc >= 48 /* 0 */ && cc <= 57 /* 9 */)) {
                    result += c;
                    prevdash = false;
                }
                else if ((cc >= 65 && cc <= 90 /* A - Z */)) {
                    result += c.toLowerCase();
                    prevdash = false;
                }
                else if (c == ' ' || c == ',' || c == '.' || c == '/' || c == '\\' || c == '-' || c == '_' || c == '=') {
                    if (!prevdash && result.length > 0) {
                        result += '-';
                        prevdash = true;
                    }
                }
                else if (cc >= 128) {
                    let prevlen = result.length;
                    result += remapInternationalCharToAscii(c);
                    if (prevlen != result.length) prevdash = false;
                }
                if (i == maxlen) break;
            }

            if (prevdash)
                return result.substring(0, result.length - 1);
            else
                return result;
        }
〃安静 2024-07-11 12:06:39

重写 Jeff 的代码使其更加简洁

    public static string RemapInternationalCharToAscii(char c)
    {
        var s = c.ToString().ToLowerInvariant();

        var mappings = new Dictionary<string, string>
        {
            { "a", "àåáâäãåą" },
            { "c", "çćčĉ" },
            { "d", "đ" },
            { "e", "èéêëę" },
            { "g", "ğĝ" },
            { "h", "ĥ" },
            { "i", "ìíîïı" },
            { "j", "ĵ" },
            { "l", "ł" },
            { "n", "ñń" },
            { "o", "òóôõöøőð" },
            { "r", "ř" },
            { "s", "śşšŝ" },
            { "ss", "ß" },
            { "th", "Þ" },
            { "u", "ùúûüŭů" },
            { "y", "ýÿ" },
            { "z", "żźž" }
        };

        foreach(var mapping in mappings)
        {
            if (mapping.Value.Contains(s))
                return mapping.Key;
        }

        return string.Empty;
    }

Rewrite of Jeff's code to be more concise

    public static string RemapInternationalCharToAscii(char c)
    {
        var s = c.ToString().ToLowerInvariant();

        var mappings = new Dictionary<string, string>
        {
            { "a", "àåáâäãåą" },
            { "c", "çćčĉ" },
            { "d", "đ" },
            { "e", "èéêëę" },
            { "g", "ğĝ" },
            { "h", "ĥ" },
            { "i", "ìíîïı" },
            { "j", "ĵ" },
            { "l", "ł" },
            { "n", "ñń" },
            { "o", "òóôõöøőð" },
            { "r", "ř" },
            { "s", "śşšŝ" },
            { "ss", "ß" },
            { "th", "Þ" },
            { "u", "ùúûüŭů" },
            { "y", "ýÿ" },
            { "z", "żźž" }
        };

        foreach(var mapping in mappings)
        {
            if (mapping.Value.Contains(s))
                return mapping.Key;
        }

        return string.Empty;
    }
月寒剑心 2024-07-11 12:06:39

我喜欢这种不使用 正则表达式 的方式,因此我将其移植到 PHP 中。 我刚刚添加了一个名为 is_ Between 的函数来检查字符:

function is_between($val, $min, $max)
{
    $val = (int) $val; $min = (int) $min; $max = (int) $max;

    return ($val >= $min && $val <= $max);
}

function international_char_to_ascii($char)
{
    if (mb_strpos('àåáâäãåa', $char) !== false)
    {
        return 'a';
    }

    if (mb_strpos('èéêëe', $char) !== false)
    {
        return 'e';
    }

    if (mb_strpos('ìíîïi', $char) !== false)
    {
        return 'i';
    }

    if (mb_strpos('òóôõö', $char) !== false)
    {
        return 'o';
    }

    if (mb_strpos('ùúûüuu', $char) !== false)
    {
        return 'u';
    }

    if (mb_strpos('çccc', $char) !== false)
    {
        return 'c';
    }

    if (mb_strpos('zzž', $char) !== false)
    {
        return 'z';
    }

    if (mb_strpos('ssšs', $char) !== false)
    {
        return 's';
    }

    if (mb_strpos('ñn', $char) !== false)
    {
        return 'n';
    }

    if (mb_strpos('ýÿ', $char) !== false)
    {
        return 'y';
    }

    if (mb_strpos('gg', $char) !== false)
    {
        return 'g';
    }

    if (mb_strpos('r', $char) !== false)
    {
        return 'r';
    }

    if (mb_strpos('l', $char) !== false)
    {
        return 'l';
    }

    if (mb_strpos('d', $char) !== false)
    {
        return 'd';
    }

    if (mb_strpos('ß', $char) !== false)
    {
        return 'ss';
    }

    if (mb_strpos('Þ', $char) !== false)
    {
        return 'th';
    }

    if (mb_strpos('h', $char) !== false)
    {
        return 'h';
    }

    if (mb_strpos('j', $char) !== false)
    {
        return 'j';
    }
    return '';
}

function url_friendly_title($url_title)
{
    if (empty($url_title))
    {
        return '';
    }

    $url_title = mb_strtolower($url_title);

    $url_title_max_length   = 80;
    $url_title_length       = mb_strlen($url_title);
    $url_title_friendly     = '';
    $url_title_dash_added   = false;
    $url_title_char = '';

    for ($i = 0; $i < $url_title_length; $i++)
    {
        $url_title_char     = mb_substr($url_title, $i, 1);

        if (strlen($url_title_char) == 2)
        {
            $url_title_ascii    = ord($url_title_char[0]) * 256 + ord($url_title_char[1]) . "\r\n";
        }
        else
        {
            $url_title_ascii    = ord($url_title_char);
        }

        if (is_between($url_title_ascii, 97, 122) || is_between($url_title_ascii, 48, 57))
        {
            $url_title_friendly .= $url_title_char;

            $url_title_dash_added = false;
        }
        elseif(is_between($url_title_ascii, 65, 90))
        {
            $url_title_friendly .= chr(($url_title_ascii | 32));

            $url_title_dash_added = false;
        }
        elseif($url_title_ascii == 32 || $url_title_ascii == 44 || $url_title_ascii == 46 || $url_title_ascii == 47 || $url_title_ascii == 92 || $url_title_ascii == 45 || $url_title_ascii == 47 || $url_title_ascii == 95 || $url_title_ascii == 61)
        {
            if (!$url_title_dash_added && mb_strlen($url_title_friendly) > 0)
            {
                $url_title_friendly .= chr(45);

                $url_title_dash_added = true;
            }
        }
        else if ($url_title_ascii >= 128)
        {
            $url_title_previous_length = mb_strlen($url_title_friendly);

            $url_title_friendly .= international_char_to_ascii($url_title_char);

            if ($url_title_previous_length != mb_strlen($url_title_friendly))
            {
                $url_title_dash_added = false;
            }
        }

        if ($i == $url_title_max_length)
        {
            break;
        }
    }

    if ($url_title_dash_added)
    {
        return mb_substr($url_title_friendly, 0, -1);
    }
    else
    {
        return $url_title_friendly;
    }
}

I liked the way this is done without using regular expressions, so I ported it to PHP. I just added a function called is_between to check characters:

function is_between($val, $min, $max)
{
    $val = (int) $val; $min = (int) $min; $max = (int) $max;

    return ($val >= $min && $val <= $max);
}

function international_char_to_ascii($char)
{
    if (mb_strpos('àåáâäãåa', $char) !== false)
    {
        return 'a';
    }

    if (mb_strpos('èéêëe', $char) !== false)
    {
        return 'e';
    }

    if (mb_strpos('ìíîïi', $char) !== false)
    {
        return 'i';
    }

    if (mb_strpos('òóôõö', $char) !== false)
    {
        return 'o';
    }

    if (mb_strpos('ùúûüuu', $char) !== false)
    {
        return 'u';
    }

    if (mb_strpos('çccc', $char) !== false)
    {
        return 'c';
    }

    if (mb_strpos('zzž', $char) !== false)
    {
        return 'z';
    }

    if (mb_strpos('ssšs', $char) !== false)
    {
        return 's';
    }

    if (mb_strpos('ñn', $char) !== false)
    {
        return 'n';
    }

    if (mb_strpos('ýÿ', $char) !== false)
    {
        return 'y';
    }

    if (mb_strpos('gg', $char) !== false)
    {
        return 'g';
    }

    if (mb_strpos('r', $char) !== false)
    {
        return 'r';
    }

    if (mb_strpos('l', $char) !== false)
    {
        return 'l';
    }

    if (mb_strpos('d', $char) !== false)
    {
        return 'd';
    }

    if (mb_strpos('ß', $char) !== false)
    {
        return 'ss';
    }

    if (mb_strpos('Þ', $char) !== false)
    {
        return 'th';
    }

    if (mb_strpos('h', $char) !== false)
    {
        return 'h';
    }

    if (mb_strpos('j', $char) !== false)
    {
        return 'j';
    }
    return '';
}

function url_friendly_title($url_title)
{
    if (empty($url_title))
    {
        return '';
    }

    $url_title = mb_strtolower($url_title);

    $url_title_max_length   = 80;
    $url_title_length       = mb_strlen($url_title);
    $url_title_friendly     = '';
    $url_title_dash_added   = false;
    $url_title_char = '';

    for ($i = 0; $i < $url_title_length; $i++)
    {
        $url_title_char     = mb_substr($url_title, $i, 1);

        if (strlen($url_title_char) == 2)
        {
            $url_title_ascii    = ord($url_title_char[0]) * 256 + ord($url_title_char[1]) . "\r\n";
        }
        else
        {
            $url_title_ascii    = ord($url_title_char);
        }

        if (is_between($url_title_ascii, 97, 122) || is_between($url_title_ascii, 48, 57))
        {
            $url_title_friendly .= $url_title_char;

            $url_title_dash_added = false;
        }
        elseif(is_between($url_title_ascii, 65, 90))
        {
            $url_title_friendly .= chr(($url_title_ascii | 32));

            $url_title_dash_added = false;
        }
        elseif($url_title_ascii == 32 || $url_title_ascii == 44 || $url_title_ascii == 46 || $url_title_ascii == 47 || $url_title_ascii == 92 || $url_title_ascii == 45 || $url_title_ascii == 47 || $url_title_ascii == 95 || $url_title_ascii == 61)
        {
            if (!$url_title_dash_added && mb_strlen($url_title_friendly) > 0)
            {
                $url_title_friendly .= chr(45);

                $url_title_dash_added = true;
            }
        }
        else if ($url_title_ascii >= 128)
        {
            $url_title_previous_length = mb_strlen($url_title_friendly);

            $url_title_friendly .= international_char_to_ascii($url_title_char);

            if ($url_title_previous_length != mb_strlen($url_title_friendly))
            {
                $url_title_dash_added = false;
            }
        }

        if ($i == $url_title_max_length)
        {
            break;
        }
    }

    if ($url_title_dash_added)
    {
        return mb_substr($url_title_friendly, 0, -1);
    }
    else
    {
        return $url_title_friendly;
    }
}
猫弦 2024-07-11 12:06:39

stackoverflow 解决方案非常棒,但现代浏览器(不包括 IE,像往常一样)现在可以很好地处理 utf8 编码:

在此处输入图像描述

所以我升级了建议的解决方案:

public static string ToFriendlyUrl(string title, bool useUTF8Encoding = false)
{
    ...

        else if (c >= 128)
        {
            int prevlen = sb.Length;
            if (useUTF8Encoding )
            {
                sb.Append(HttpUtility.UrlEncode(c.ToString(CultureInfo.InvariantCulture),Encoding.UTF8));
            }
            else
            {
                sb.Append(RemapInternationalCharToAscii(c));
            }
    ...
}

Pastebin 上的完整代码

编辑:这是 RemapInternationalCharToAscii 方法的代码(pastebin 中缺少该方法)。

The stackoverflow solution is great, but modern browser (excluding IE, as usual) now handle nicely utf8 encoding:

enter image description here

So I upgraded the proposed solution:

public static string ToFriendlyUrl(string title, bool useUTF8Encoding = false)
{
    ...

        else if (c >= 128)
        {
            int prevlen = sb.Length;
            if (useUTF8Encoding )
            {
                sb.Append(HttpUtility.UrlEncode(c.ToString(CultureInfo.InvariantCulture),Encoding.UTF8));
            }
            else
            {
                sb.Append(RemapInternationalCharToAscii(c));
            }
    ...
}

Full Code on Pastebin

Edit: Here's the code for RemapInternationalCharToAscii method (that's missing in the pastebin).

大姐,你呐 2024-07-11 12:06:39

这是我的 Jeff 代码版本(速度较慢,但​​写起来很有趣):

public static string URLFriendly(string title)
{
    char? prevRead = null,
        prevWritten = null;

    var seq = 
        from c in title
        let norm = RemapInternationalCharToAscii(char.ToLowerInvariant(c).ToString())[0]
        let keep = char.IsLetterOrDigit(norm)
        where prevRead.HasValue || keep
        let replaced = keep ? norm
            :  prevWritten != '-' ? '-'
            :  (char?)null
        where replaced != null
        let s = replaced + (prevRead == null ? ""
            : norm == '#' && "cf".Contains(prevRead.Value) ? "sharp"
            : norm == '+' ? "plus"
            : "")
        let _ = prevRead = norm
        from written in s
        let __ = prevWritten = written
        select written;

    const int maxlen = 80;  
    return string.Concat(seq.Take(maxlen)).TrimEnd('-');
}

public static string RemapInternationalCharToAscii(string text)
{
    var seq = text.Normalize(NormalizationForm.FormD)
        .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);

    return string.Concat(seq).Normalize(NormalizationForm.FormC);
}

我的测试字符串:

" 我喜欢 C#、F#、C++ 和...焦糖布丁!!!他们看到我编码...他们讨厌'...试图抓住我肮脏的编码...“

Here's my (slower, but fun to write) version of Jeff's code:

public static string URLFriendly(string title)
{
    char? prevRead = null,
        prevWritten = null;

    var seq = 
        from c in title
        let norm = RemapInternationalCharToAscii(char.ToLowerInvariant(c).ToString())[0]
        let keep = char.IsLetterOrDigit(norm)
        where prevRead.HasValue || keep
        let replaced = keep ? norm
            :  prevWritten != '-' ? '-'
            :  (char?)null
        where replaced != null
        let s = replaced + (prevRead == null ? ""
            : norm == '#' && "cf".Contains(prevRead.Value) ? "sharp"
            : norm == '+' ? "plus"
            : "")
        let _ = prevRead = norm
        from written in s
        let __ = prevWritten = written
        select written;

    const int maxlen = 80;  
    return string.Concat(seq.Take(maxlen)).TrimEnd('-');
}

public static string RemapInternationalCharToAscii(string text)
{
    var seq = text.Normalize(NormalizationForm.FormD)
        .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);

    return string.Concat(seq).Normalize(NormalizationForm.FormC);
}

My test string:

" I love C#, F#, C++, and... Crème brûlée!!! They see me codin'... they hatin'... tryin' to catch me codin' dirty... "

一身软味 2024-07-11 12:06:39

您可以使用以下辅助方法。 它可以转换 Unicode 字符。

public static string ConvertTextToSlug(string s)
{
    StringBuilder sb = new StringBuilder();

    bool wasHyphen = true;

    foreach (char c in s)
    {
        if (char.IsLetterOrDigit(c))
        {
            sb.Append(char.ToLower(c));
            wasHyphen = false;
        }
        else
            if (char.IsWhiteSpace(c) && !wasHyphen)
            {
                sb.Append('-');
                wasHyphen = true;
            }
    }

    // Avoid trailing hyphens
    if (wasHyphen && sb.Length > 0)
        sb.Length--;

    return sb.ToString().Replace("--","-");
}

You can use the following helper method. It can convert the Unicode characters.

public static string ConvertTextToSlug(string s)
{
    StringBuilder sb = new StringBuilder();

    bool wasHyphen = true;

    foreach (char c in s)
    {
        if (char.IsLetterOrDigit(c))
        {
            sb.Append(char.ToLower(c));
            wasHyphen = false;
        }
        else
            if (char.IsWhiteSpace(c) && !wasHyphen)
            {
                sb.Append('-');
                wasHyphen = true;
            }
    }

    // Avoid trailing hyphens
    if (wasHyphen && sb.Length > 0)
        sb.Length--;

    return sb.ToString().Replace("--","-");
}
聚集的泪 2024-07-11 12:06:39

现在所有浏览器都能很好地处理 utf8 编码,因此您可以使用 WebUtility.UrlEncode 方法,类似于 HttpUtility.UrlEncode 由 @giamin 使用,但它在 Web 应用程序之外工作。

Now all Browser handle nicely utf8 encoding, so you can use WebUtility.UrlEncode Method , its like HttpUtility.UrlEncode used by @giamin but its work outside of a web application.

戏舞 2024-07-11 12:06:38

这是我的杰夫代码版本。 我进行了以下更改:

  • 连字符以可以添加的方式附加,然后需要删除,因为它是字符串中的最后一个字符。 也就是说,我们永远不需要“my-slug-”。 这意味着需要额外的字符串分配来在这种边缘情况下删除它。 我已经通过延迟连字符解决了这个问题。 如果您将我的代码与 Jeff 的代码进行比较,则逻辑很容易理解。
  • 他的方法纯粹是基于查找的,并且错过了我在研究 Stack Overflow 时在示例中发现的许多字符。 为了解决这个问题,我首先执行标准化过程(又名 Meta Stack Overflow 问题中提到的排序规则 从完整(配置文件)URL 中删除的非 US-ASCII 字符),然后忽略可接受范围之外的任何字符。 这在大多数情况下都有效……
  • 如果不起作用,我还必须添加一个查找表。 如上所述,某些字符在标准化时不会映射到低 ASCII 值。 我没有放弃这些,而是​​得到了一份手动的例外列表,这无疑充满了漏洞,但总比没有好。 规范化代码的灵感来自 Jon Hanna 在 Stack Overflow 问题中的精彩帖子如何删除字符串上的重音符号?
  • 大小写转换现在也是可选的。

    公共静态类Slug 
      { 
          公共静态字符串创建(布尔toLower,参数字符串[]值) 
          { 
              返回 Create(toLower, String.Join("-", 值)); 
          } 
    
          /// <摘要> 
          /// 创建一个 slug。 
          /// 参考: 
          /// http://www.unicode.org/reports/tr15/tr15-34.html 
          /// https://meta.stackexchange.com/questions/7435/non-us-ascii-characters-dropped-from-full-profile-url/7696#7696 
          /// https://stackoverflow.com/questions/25259/how-do-you-include-a-webpage-title-as-part-of-a-webpage-url/25486#25486 
          /// https://stackoverflow.com/questions/3769457/how-can-i-remove-accents-on-a-string 
          ///  
          /// ; 
          /// ; 
          /// ; 
          公共静态字符串创建(布尔toLower,字符串值) 
          { 
              如果(值==空) 
                  返回 ””; 
    
              var 标准化 = value.Normalize(NormalizationForm.FormKD); 
    
              常量 int 最大长度 = 80; 
              int len = 标准化. 长度; 
              布尔 prevDash = false; 
              var sb = new StringBuilder(len); 
              字符c; 
    
              for (int i = 0; i < len; i++) 
              { 
                  c = 归一化[i]; 
                  if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) 
                  { 
                      if (上一个短划线) 
                      { 
                          sb.Append('-'); 
                          prevDash = false; 
                      } 
                      sb.Append(c); 
                  } 
                  否则 if (c >= 'A' && c <= 'Z') 
                  { 
                      if (上一个短划线) 
                      { 
                          sb.Append('-'); 
                          prevDash = false; 
                      } 
                      // 转换为小写的棘手方法 
                      if (至下) 
                          sb.Append((char)(c | 32)); 
                      别的 
                          sb.Append(c); 
                  } 
                  else if (c == ' ' || c == ',' || c == '.' || c == '/' || c == '\\' || c == '-' | | c == '_' || c == '=') 
                  { 
                      if (!prevDash && sb.Length > 0) 
                      { 
                          prevDash = true; 
                      } 
                  } 
                  别的 
                  { 
                      字符串交换 = ConvertEdgeCases(c, toLower); 
    
                      if(交换!= null) 
                      { 
                          if (上一个短划线) 
                          { 
                              sb.Append('-'); 
                              prevDash = false; 
                          } 
                          sb.追加(交换); 
                      } 
                  } 
    
                  if (sb.Length == maxlen) 
                      休息; 
              } 
              返回 sb.ToString(); 
          } 
    
          静态字符串 ConvertEdgeCases(char c, bool toLower) 
          { 
              字符串交换=空; 
              开关(c) 
              { 
                  案例“我”: 
                      交换=“我”; 
                      休息; 
                  案例“l”: 
                      交换=“l”; 
                      休息; 
                  案例“Ł”: 
                      交换 = 降低?   “二”; 
                      休息; 
                  案例“D”: 
                      交换=“d”; 
                      休息; 
                  案例“ß”: 
                      交换=“SS”; 
                      休息; 
                  案例“ø”: 
                      交换=“o”; 
                      休息; 
                  案例“Þ”: 
                      交换=“第”; 
                      休息; 
              } 
              返回掉期; 
          } 
      } 
      

有关更多详细信息、单元测试以及 FacebookURL 方案比 Stack Overflows 更聪明,我有一个 我的博客上的扩展版本

Here is my version of Jeff's code. I've made the following changes:

  • The hyphens were appended in such a way that one could be added, and then need removing as it was the last character in the string. That is, we never want “my-slug-”. This means an extra string allocation to remove it on this edge case. I’ve worked around this by delay-hyphening. If you compare my code to Jeff’s the logic for this is easy to follow.
  • His approach is purely lookup based and missed a lot of characters I found in examples while researching on Stack Overflow. To counter this, I first peform a normalisation pass (AKA collation mentioned in Meta Stack Overflow question Non US-ASCII characters dropped from full (profile) URL), and then ignore any characters outside the acceptable ranges. This works most of the time...
  • ... For when it doesn’t I’ve also had to add a lookup table. As mentioned above, some characters don’t map to a low ASCII value when normalised. Rather than drop these I’ve got a manual list of exceptions that is doubtless full of holes, but it is better than nothing. The normalisation code was inspired by Jon Hanna’s great post in Stack Overflow question How can I remove accents on a string?.
  • The case conversion is now also optional.

    public static class Slug
    {
        public static string Create(bool toLower, params string[] values)
        {
            return Create(toLower, String.Join("-", values));
        }
    
        /// <summary>
        /// Creates a slug.
        /// References:
        /// http://www.unicode.org/reports/tr15/tr15-34.html
        /// https://meta.stackexchange.com/questions/7435/non-us-ascii-characters-dropped-from-full-profile-url/7696#7696
        /// https://stackoverflow.com/questions/25259/how-do-you-include-a-webpage-title-as-part-of-a-webpage-url/25486#25486
        /// https://stackoverflow.com/questions/3769457/how-can-i-remove-accents-on-a-string
        /// </summary>
        /// <param name="toLower"></param>
        /// <param name="normalised"></param>
        /// <returns></returns>
        public static string Create(bool toLower, string value)
        {
            if (value == null)
                return "";
    
            var normalised = value.Normalize(NormalizationForm.FormKD);
    
            const int maxlen = 80;
            int len = normalised.Length;
            bool prevDash = false;
            var sb = new StringBuilder(len);
            char c;
    
            for (int i = 0; i < len; i++)
            {
                c = normalised[i];
                if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
                {
                    if (prevDash)
                    {
                        sb.Append('-');
                        prevDash = false;
                    }
                    sb.Append(c);
                }
                else if (c >= 'A' && c <= 'Z')
                {
                    if (prevDash)
                    {
                        sb.Append('-');
                        prevDash = false;
                    }
                    // Tricky way to convert to lowercase
                    if (toLower)
                        sb.Append((char)(c | 32));
                    else
                        sb.Append(c);
                }
                else if (c == ' ' || c == ',' || c == '.' || c == '/' || c == '\\' || c == '-' || c == '_' || c == '=')
                {
                    if (!prevDash && sb.Length > 0)
                    {
                        prevDash = true;
                    }
                }
                else
                {
                    string swap = ConvertEdgeCases(c, toLower);
    
                    if (swap != null)
                    {
                        if (prevDash)
                        {
                            sb.Append('-');
                            prevDash = false;
                        }
                        sb.Append(swap);
                    }
                }
    
                if (sb.Length == maxlen)
                    break;
            }
            return sb.ToString();
        }
    
        static string ConvertEdgeCases(char c, bool toLower)
        {
            string swap = null;
            switch (c)
            {
                case 'ı':
                    swap = "i";
                    break;
                case 'ł':
                    swap = "l";
                    break;
                case 'Ł':
                    swap = toLower ? "l" : "L";
                    break;
                case 'đ':
                    swap = "d";
                    break;
                case 'ß':
                    swap = "ss";
                    break;
                case 'ø':
                    swap = "o";
                    break;
                case 'Þ':
                    swap = "th";
                    break;
            }
            return swap;
        }
    }
    

For more details, the unit tests, and an explanation of why Facebook's URL scheme is a little smarter than Stack Overflows, I've got an expanded version of this on my blog.

耳根太软 2024-07-11 12:06:38

T-SQL 实现,改编自 dbo.UrlEncode

CREATE FUNCTION dbo.Slug(@string varchar(1024))
RETURNS varchar(3072)
AS
BEGIN
    DECLARE @count int, @c char(1), @i int, @slug varchar(3072)

    SET @string = replace(lower(ltrim(rtrim(@string))),' ','-')

    SET @count = Len(@string)
    SET @i = 1
    SET @slug = ''

    WHILE (@i <= @count)
    BEGIN
        SET @c = substring(@string, @i, 1)

        IF @c LIKE '[a-z0-9--]'
            SET @slug = @slug + @c

        SET @i = @i +1
    END

    RETURN @slug
END

T-SQL implementation, adapted from dbo.UrlEncode:

CREATE FUNCTION dbo.Slug(@string varchar(1024))
RETURNS varchar(3072)
AS
BEGIN
    DECLARE @count int, @c char(1), @i int, @slug varchar(3072)

    SET @string = replace(lower(ltrim(rtrim(@string))),' ','-')

    SET @count = Len(@string)
    SET @i = 1
    SET @slug = ''

    WHILE (@i <= @count)
    BEGIN
        SET @c = substring(@string, @i, 1)

        IF @c LIKE '[a-z0-9--]'
            SET @slug = @slug + @c

        SET @i = @i +1
    END

    RETURN @slug
END
傾城如夢未必闌珊 2024-07-11 12:06:38

我对 Ruby 或 Rails 不太了解,但在 Perl 中,这就是我会做的:

my $title = "How do you change a title to be part of the url like Stackoverflow?";

my $url = lc $title;   # Change to lower case and copy to URL.
$url =~ s/^\s+//g;     # Remove leading spaces.
$url =~ s/\s+$//g;     # Remove trailing spaces.
$url =~ s/\s+/\-/g;    # Change one or more spaces to single hyphen.
$url =~ s/[^\w\-]//g;  # Remove any non-word characters.

print "$title\n$url\n";

我只是做了一个快速测试,它似乎有效。 希望这相对容易翻译成 Ruby。

I don't much about Ruby or Rails, but in Perl, this is what I would do:

my $title = "How do you change a title to be part of the url like Stackoverflow?";

my $url = lc $title;   # Change to lower case and copy to URL.
$url =~ s/^\s+//g;     # Remove leading spaces.
$url =~ s/\s+$//g;     # Remove trailing spaces.
$url =~ s/\s+/\-/g;    # Change one or more spaces to single hyphen.
$url =~ s/[^\w\-]//g;  # Remove any non-word characters.

print "$title\n$url\n";

I just did a quick test and it seems to work. Hopefully this is relatively easy to translate to Ruby.

方圜几里 2024-07-11 12:06:38

我不熟悉 Ruby on Rails,但以下是(未经测试的)PHP 代码。 如果您发现它有用,您可以很快地将其转换为 Ruby on Rails。

$sURL = "This is a title to convert to URL-format. It has 1 number in it!";
// To lower-case
$sURL = strtolower($sURL);

// Replace all non-word characters with spaces
$sURL = preg_replace("/\W+/", " ", $sURL);

// Remove trailing spaces (so we won't end with a separator)
$sURL = trim($sURL);

// Replace spaces with separators (hyphens)
$sURL = str_replace(" ", "-", $sURL);

echo $sURL;
// outputs: this-is-a-title-to-convert-to-url-format-it-has-1-number-in-it

我希望这有帮助。

I am not familiar with Ruby on Rails, but the following is (untested) PHP code. You can probably translate this very quickly to Ruby on Rails if you find it useful.

$sURL = "This is a title to convert to URL-format. It has 1 number in it!";
// To lower-case
$sURL = strtolower($sURL);

// Replace all non-word characters with spaces
$sURL = preg_replace("/\W+/", " ", $sURL);

// Remove trailing spaces (so we won't end with a separator)
$sURL = trim($sURL);

// Replace spaces with separators (hyphens)
$sURL = str_replace(" ", "-", $sURL);

echo $sURL;
// outputs: this-is-a-title-to-convert-to-url-format-it-has-1-number-in-it

I hope this helps.

北笙凉宸 2024-07-11 12:06:38

我们是这样做的。 请注意,边缘条件可能比您第一眼意识到的要多。

这是第二个版本,性能提高了 5 倍(是的,我对它进行了基准测试)。 我想我应该优化它,因为这个函数每页可以调用数百次。

/// <summary>
/// Produces optional, URL-friendly version of a title, "like-this-one". 
/// hand-tuned for speed, reflects performance refactoring contributed
/// by John Gietzen (user otac0n) 
/// </summary>
public static string URLFriendly(string title)
{
    if (title == null) return "";

    const int maxlen = 80;
    int len = title.Length;
    bool prevdash = false;
    var sb = new StringBuilder(len);
    char c;

    for (int i = 0; i < len; i++)
    {
        c = title[i];
        if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        {
            sb.Append(c);
            prevdash = false;
        }
        else if (c >= 'A' && c <= 'Z')
        {
            // tricky way to convert to lowercase
            sb.Append((char)(c | 32));
            prevdash = false;
        }
        else if (c == ' ' || c == ',' || c == '.' || c == '/' || 
            c == '\\' || c == '-' || c == '_' || c == '=')
        {
            if (!prevdash && sb.Length > 0)
            {
                sb.Append('-');
                prevdash = true;
            }
        }
        else if ((int)c >= 128)
        {
            int prevlen = sb.Length;
            sb.Append(RemapInternationalCharToAscii(c));
            if (prevlen != sb.Length) prevdash = false;
        }
        if (i == maxlen) break;
    }

    if (prevdash)
        return sb.ToString().Substring(0, sb.Length - 1);
    else
        return sb.ToString();
}

要查看此替换的代码的先前版本(但在功能上等效,并且速度快 5 倍),请查看本文的修订历史记录(单击日期链接)。

此外,此处还可以找到 RemapInternationalCharToAscii 方法源代码。

Here's how we do it. Note that there are probably more edge conditions than you realize at first glance.

This is the second version, unrolled for 5x more performance (and yes, I benchmarked it). I figured I'd optimize it because this function can be called hundreds of times per page.

/// <summary>
/// Produces optional, URL-friendly version of a title, "like-this-one". 
/// hand-tuned for speed, reflects performance refactoring contributed
/// by John Gietzen (user otac0n) 
/// </summary>
public static string URLFriendly(string title)
{
    if (title == null) return "";

    const int maxlen = 80;
    int len = title.Length;
    bool prevdash = false;
    var sb = new StringBuilder(len);
    char c;

    for (int i = 0; i < len; i++)
    {
        c = title[i];
        if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        {
            sb.Append(c);
            prevdash = false;
        }
        else if (c >= 'A' && c <= 'Z')
        {
            // tricky way to convert to lowercase
            sb.Append((char)(c | 32));
            prevdash = false;
        }
        else if (c == ' ' || c == ',' || c == '.' || c == '/' || 
            c == '\\' || c == '-' || c == '_' || c == '=')
        {
            if (!prevdash && sb.Length > 0)
            {
                sb.Append('-');
                prevdash = true;
            }
        }
        else if ((int)c >= 128)
        {
            int prevlen = sb.Length;
            sb.Append(RemapInternationalCharToAscii(c));
            if (prevlen != sb.Length) prevdash = false;
        }
        if (i == maxlen) break;
    }

    if (prevdash)
        return sb.ToString().Substring(0, sb.Length - 1);
    else
        return sb.ToString();
}

To see the previous version of the code this replaced (but is functionally equivalent to, and 5x faster), view revision history of this post (click the date link).

Also, the RemapInternationalCharToAscii method source code can be found here.

月隐月明月朦胧 2024-07-11 12:06:38

我知道这是一个非常老的问题,但由于现在大多数浏览器支持unicode url,我在XRegex中找到了一个很好的解决方案,可以将除字母之外的所有内容(在所有语言中都转换为“-”) )。

这可以用多种编程语言来完成。

模式是 \\p{^L}+,然后您只需使用它来将所有非字母替换为“-”。

Node.js 中使用 xregex 模块的工作示例。

var text = 'This ! can @ have # several $ letters % from different languages such as עברית or Español';

var slugRegEx = XRegExp('((?!\\d)\\p{^L})+', 'g');

var slug = XRegExp.replace(text, slugRegEx, '-').toLowerCase();

console.log(slug) ==> "this-can-have-several-letters-from-different-languages-such-as-עברית-or-español"

I know it's very old question but since most of the browsers now support unicode urls I found a great solution in XRegex that converts everything except letters (in all languages to '-').

That can be done in several programming languages.

The pattern is \\p{^L}+ and then you just need to use it to replace all non letters to '-'.

Working example in node.js with xregex module.

var text = 'This ! can @ have # several $ letters % from different languages such as עברית or Español';

var slugRegEx = XRegExp('((?!\\d)\\p{^L})+', 'g');

var slug = XRegExp.replace(text, slugRegEx, '-').toLowerCase();

console.log(slug) ==> "this-can-have-several-letters-from-different-languages-such-as-עברית-or-español"
花之痕靓丽 2024-07-11 12:06:38

您需要设置一个自定义路由,将 URL 指向将处理它的控制器。 由于您使用的是 Ruby on Rails,因此这里是使用其路由引擎的简介

在 Ruby 中,您将需要一个您已经知道的正则表达式,下面是要使用的正则表达式:

def permalink_for(str)
    str.gsub(/[^\w\/]|[!\(\)\.]+/, ' ').strip.downcase.gsub(/\ +/, '-')
end

You will want to setup a custom route to point the URL to the controller that will handle it. Since you are using Ruby on Rails, here is an introduction in using their routing engine.

In Ruby, you will need a regular expression like you already know and here is the regular expression to use:

def permalink_for(str)
    str.gsub(/[^\w\/]|[!\(\)\.]+/, ' ').strip.downcase.gsub(/\ +/, '-')
end
因为看清所以看轻 2024-07-11 12:06:38

您还可以使用此 JavaScript 函数以形式生成 slug(该函数基于/复制自 Django):

function makeSlug(urlString, filter) {
    // Changes, e.g., "Petty theft" to "petty_theft".
    // Remove all these words from the string before URLifying

    if(filter) {
        removelist = ["a", "an", "as", "at", "before", "but", "by", "for", "from",
        "is", "in", "into", "like", "of", "off", "on", "onto", "per",
        "since", "than", "the", "this", "that", "to", "up", "via", "het", "de", "een", "en",
        "with"];
    }
    else {
        removelist = [];
    }
    s = urlString;
    r = new RegExp('\\b(' + removelist.join('|') + ')\\b', 'gi');
    s = s.replace(r, '');
    s = s.replace(/[^-\w\s]/g, ''); // Remove unneeded characters
    s = s.replace(/^\s+|\s+$/g, ''); // Trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-'); // Convert spaces to hyphens
    s = s.toLowerCase(); // Convert to lowercase
    return s; // Trim to first num_chars characters
}

You can also use this JavaScript function for in-form generation of the slug's (this one is based on/copied from Django):

function makeSlug(urlString, filter) {
    // Changes, e.g., "Petty theft" to "petty_theft".
    // Remove all these words from the string before URLifying

    if(filter) {
        removelist = ["a", "an", "as", "at", "before", "but", "by", "for", "from",
        "is", "in", "into", "like", "of", "off", "on", "onto", "per",
        "since", "than", "the", "this", "that", "to", "up", "via", "het", "de", "een", "en",
        "with"];
    }
    else {
        removelist = [];
    }
    s = urlString;
    r = new RegExp('\\b(' + removelist.join('|') + ')\\b', 'gi');
    s = s.replace(r, '');
    s = s.replace(/[^-\w\s]/g, ''); // Remove unneeded characters
    s = s.replace(/^\s+|\s+$/g, ''); // Trim leading/trailing spaces
    s = s.replace(/[-\s]+/g, '-'); // Convert spaces to hyphens
    s = s.toLowerCase(); // Convert to lowercase
    return s; // Trim to first num_chars characters
}
独留℉清风醉 2024-07-11 12:06:38

为了更好地衡量,这里是 WordPress 中的 PHP 函数,它可以做到这一点...我认为 WordPress 是使用花哨链接的更流行的平台之一。

    function sanitize_title_with_dashes($title) {
            $title = strip_tags($title);
            // Preserve escaped octets.
            $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
            // Remove percent signs that are not part of an octet.
            $title = str_replace('%', '', $title);
            // Restore octets.
            $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
            $title = remove_accents($title);
            if (seems_utf8($title)) {
                    if (function_exists('mb_strtolower')) {
                            $title = mb_strtolower($title, 'UTF-8');
                    }
                    $title = utf8_uri_encode($title, 200);
            }
            $title = strtolower($title);
            $title = preg_replace('/&.+?;/', '', $title); // kill entities
            $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
            $title = preg_replace('/\s+/', '-', $title);
            $title = preg_replace('|-+|', '-', $title);
            $title = trim($title, '-');
            return $title;
    }

该函数以及一些支持函数可以在 wp-includes/formatting.php 中找到。

For good measure, here's the PHP function in WordPress that does it... I'd think that WordPress is one of the more popular platforms that uses fancy links.

    function sanitize_title_with_dashes($title) {
            $title = strip_tags($title);
            // Preserve escaped octets.
            $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
            // Remove percent signs that are not part of an octet.
            $title = str_replace('%', '', $title);
            // Restore octets.
            $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);
            $title = remove_accents($title);
            if (seems_utf8($title)) {
                    if (function_exists('mb_strtolower')) {
                            $title = mb_strtolower($title, 'UTF-8');
                    }
                    $title = utf8_uri_encode($title, 200);
            }
            $title = strtolower($title);
            $title = preg_replace('/&.+?;/', '', $title); // kill entities
            $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
            $title = preg_replace('/\s+/', '-', $title);
            $title = preg_replace('|-+|', '-', $title);
            $title = trim($title, '-');
            return $title;
    }

This function as well as some of the supporting functions can be found in wp-includes/formatting.php.

最近可好 2024-07-11 12:06:38

如果您使用 Rails Edge,则可以依赖 Inflector.parametrize - 这是文档中的示例:

  class Person
    def to_param
      "#{id}-#{name.parameterize}"
    end
  end

  @person = Person.find(1)
  # => #<Person id: 1, name: "Donald E. Knuth">

  <%= link_to(@person.name, person_path(@person)) %>
  # => <a href="/person/1-donald-e-knuth">Donald E. Knuth</a>

此外,如果您需要处理更多外来字符,例如以前版本的 Rails 中的重音符号 (éphémère),您可以混合使用 PermalinkFu变音符号

DiacriticsFu::escape("éphémère")
=> "ephemere"

DiacriticsFu::escape("räksmörgås")
=> "raksmorgas"

If you are using Rails edge, you can rely on Inflector.parametrize - here's the example from the documentation:

  class Person
    def to_param
      "#{id}-#{name.parameterize}"
    end
  end

  @person = Person.find(1)
  # => #<Person id: 1, name: "Donald E. Knuth">

  <%= link_to(@person.name, person_path(@person)) %>
  # => <a href="/person/1-donald-e-knuth">Donald E. Knuth</a>

Also if you need to handle more exotic characters such as accents (éphémère) in previous version of Rails, you can use a mixture of PermalinkFu and DiacriticsFu:

DiacriticsFu::escape("éphémère")
=> "ephemere"

DiacriticsFu::escape("räksmörgås")
=> "raksmorgas"
瘫痪情歌 2024-07-11 12:06:38

假设您的模型类具有 title 属性,您可以简单地重写模型中的 to_param 方法,如下所示:

def to_param
  title.downcase.gsub(/ /, '-')
end

这一集 Railscast 包含所有详细信息。 您还可以使用以下命令确保标题仅包含有效字符:

validates_format_of :title, :with => /^[a-z0-9-]+$/,
                    :message => 'can only contain letters, numbers and hyphens'

Assuming that your model class has a title attribute, you can simply override the to_param method within the model, like this:

def to_param
  title.downcase.gsub(/ /, '-')
end

This Railscast episode has all the details. You can also ensure that the title only contains valid characters using this:

validates_format_of :title, :with => /^[a-z0-9-]+$/,
                    :message => 'can only contain letters, numbers and hyphens'
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文