Convert a string to a slug with only single hyphens as separators

Posted 2024-09-05 23:40:52

I would like to sanitize a string for use in a URL, so this is basically what I need:

  1. Everything must be removed except alphanumeric characters, spaces and dashes.
  2. Spaces should be converted into dashes.

E.g.

This, is the URL!

must return

this-is-the-url

怪异←思 2024-09-12 23:40:53

The following will replace spaces with dashes.

$str = str_replace(' ', '-', $str);

Then the following statement removes everything except alphanumeric characters and dashes. (There are no spaces left, because the previous step replaced them with dashes.)

// \x30-\x39 = 0-9, \x41-\x5A = A-Z, \x61-\x7A = a-z, \x2D = '-'
$str = preg_replace('/[^\x30-\x39\x41-\x5A\x61-\x7A\x2D]/', '', $str);

Which is equivalent to

$str = preg_replace('/[^0-9A-Za-z-]+/', '', $str);

FYI: to remove every character outside the printable ASCII range (control characters and all non-ASCII bytes), use

$str = preg_replace('/[^\x20-\x7E]/', '', $str); 

\x20 is the hexadecimal code for space, the first printable ASCII character, and \x7E is the tilde, the last one, according to Wikipedia: https://en.wikipedia.org/wiki/ASCII#Printable_characters

FYI: look at the Hex column for the interval 20-7E:

Printable characters
Codes 20hex to 7Ehex, known as the printable characters, represent letters, digits, punctuation marks, and a few miscellaneous symbols. There are 95 printable characters in total.
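A small sketch that wraps this answer's two steps and the printable-ASCII filter from the FYI into functions, so their behavior is easy to check. The names `dashesOnly` and `printableAsciiOnly` are hypothetical. Note that the two-step approach keeps the original case and turns every single space into its own dash, and that the printable-ASCII filter deletes the bytes of multibyte characters rather than transliterating them.

```php
<?php
// Hypothetical wrapper around the two steps described in this answer.
function dashesOnly(string $str): string
{
    // Step 1: every single space becomes a dash (runs are not collapsed).
    $str = str_replace(' ', '-', $str);
    // Step 2: drop everything except 0-9, A-Z, a-z and '-'.
    return preg_replace('/[^0-9A-Za-z-]+/', '', $str);
}

// Hypothetical wrapper for the FYI: keep only bytes 0x20-0x7E.
// Multibyte UTF-8 characters are removed byte by byte, not transliterated.
function printableAsciiOnly(string $str): string
{
    return preg_replace('/[^\x20-\x7E]/', '', $str);
}

echo dashesOnly('This, is the URL!'), PHP_EOL;              // This-is-the-URL
echo dashesOnly('a  b'), PHP_EOL;                           // a--b
echo printableAsciiOnly("Caf\u{e9} \u{2013} ok"), PHP_EOL;  // "Caf  ok" (two spaces)
```

Lowercasing is not part of this answer, which is why the question's example comes out as `This-is-the-URL` rather than `this-is-the-url`.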

谁的年少不轻狂 2024-09-12 23:40:52
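function slug($z){
    $z = strtolower($z);                          // lowercase everything
    $z = preg_replace('/[^a-z0-9 -]+/', '', $z);  // keep only a-z, 0-9, space and hyphen
    $z = str_replace(' ', '-', $z);               // each space becomes a hyphen
    return trim($z, '-');                         // strip leading/trailing hyphens
}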
南风起 2024-09-12 23:40:52

First, strip unwanted characters:

$new_string = preg_replace("/[^a-zA-Z0-9\s]/", "", $string);

Then change whitespace characters to dashes:

$url = preg_replace('/\s/', '-', $new_string);

Finally, URL-encode it ready for use:

$new_url = urlencode($url);
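Put together as one function (the name `stripAndDash` is hypothetical), the three steps above can be checked directly. Because the first step leaves only ASCII letters, digits and whitespace, and the second turns each whitespace character into a hyphen, `urlencode()` has nothing left to encode here: it passes letters, digits and `-` through unchanged. Runs of spaces become runs of hyphens, and case is preserved.

```php
<?php
// Hypothetical function name; the three steps are the ones from this answer.
function stripAndDash(string $string): string
{
    // 1. Keep only ASCII letters, digits and whitespace.
    $new_string = preg_replace("/[^a-zA-Z0-9\s]/", "", $string);
    // 2. Replace each whitespace character with a hyphen (no collapsing).
    $url = preg_replace('/\s/', '-', $new_string);
    // 3. URL-encode; letters, digits and '-' pass through unchanged.
    return urlencode($url);
}

echo stripAndDash('This, is the URL!'), PHP_EOL;     // This-is-the-URL
echo stripAndDash('This, is - - the URL!'), PHP_EOL; // This-is---the-URL
```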
千仐 2024-09-12 23:40:52

The OP does not explicitly describe all of the attributes of a slug, but this is what I gather from the intent.

My interpretation of a perfect, valid, condensed slug aligns with this post: https://wordpress.stackexchange.com/questions/149191/slug-formatting-acceptable-characters#:~:text=However%2C%20we%20can%20summarise%20the,or%20end%20with%20a%20hyphen.

I find that none of the earlier answers achieves this consistently (and I'm not even stretching the scope of the question to include multibyte characters).

  1. Convert all characters to lowercase.
  2. Replace every sequence of one or more non-alphanumeric characters with a single hyphen.
  3. Trim leading and trailing hyphens from the string.

I recommend the following one-liner which doesn't bother declaring single-use variables:

return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($string)), '-');
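Wrapped in a function (the name `slugify` is hypothetical), the one-liner can be run against the three inputs used in the demo tables further down; each step maps to one item of the list above.

```php
<?php
// Hypothetical wrapper around the one-liner from this answer.
function slugify(string $string): string
{
    // Lowercase, turn each run of characters outside [a-z0-9] into one hyphen,
    // then trim hyphens from both ends.
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($string)), '-');
}

foreach (['This, is - - the URL!', 'Mork & Mindy', 'What the_underscore ?!?'] as $input) {
    echo slugify($input), PHP_EOL;
}
// this-is-the-url
// mork-mindy
// what-the-underscore
```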

Not shown in my demo link, here is an attempt to better handle multibyte strings, though it doesn't quite accommodate as many scenarios as Casimir's answer.

return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower(iconv('utf-8', 'ascii//translit', $string))), '-');

I have also prepared a demonstration which highlights what I consider to be inaccuracies in the other answers. (Demo)

'This, is - - the URL!' input
'this-is-the-url'       expected

'this-is-----the-url'   SilentGhost
'this-is-the-url'       mario
'This-is---the-URL'     Rooneyl
'This-is-the-URL'       AbhishekGoel
'This, is - - the URL!' HelloHack
'This, is - - the URL!' DenisMatafonov
'This,-is-----the-URL!' AdeelRazaAzeemi
'this-is-the-url'       mickmackusa

---
'Mork & Mindy'      input
'mork-mindy'        expected

'mork--mindy'       SilentGhost
'mork-mindy'        mario
'Mork--Mindy'       Rooneyl
'Mork-Mindy'        AbhishekGoel
'Mork &amp; Mindy'  HelloHack
'Mork & Mindy'      DenisMatafonov
'Mork-&-Mindy'      AdeelRazaAzeemi
'mork-mindy'        mickmackusa

---
'What the_underscore ?!?'   input
'what-the-underscore'       expected

'what-theunderscore'        SilentGhost
'what-the_underscore'       mario
'What-theunderscore-'       Rooneyl
'What-theunderscore-'       AbhishekGoel
'What the_underscore ?!?'   HelloHack
'What the_underscore ?!?'   DenisMatafonov
'What-the_underscore-?!?'   AdeelRazaAzeemi
'what-the-underscore'       mickmackusa
前事休说 2024-09-12 23:40:52

This will do it in a Unix shell (I just tried it on macOS):

$ tr -cs A-Za-z '-' < infile.txt > outfile.txt

I got the idea from a blog post on More Shell, Less Egg.
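For comparison with the PHP answers, here is a rough PHP approximation of that `tr` call, assuming ASCII input (the function name `trStyle` is hypothetical). `-c` complements the set `A-Za-z` and `-s` squeezes repeated hyphens in the output, so every run of non-letters, including digits and the trailing newline, becomes a single hyphen; case is kept and nothing is trimmed.

```php
<?php
// Hypothetical PHP approximation of: tr -cs A-Za-z '-'  (ASCII input assumed)
function trStyle(string $text): string
{
    // Each run of characters outside A-Za-z collapses to one hyphen.
    return preg_replace('/[^A-Za-z]+/', '-', $text);
}

echo trStyle("Route 66 rocks!\n"), PHP_EOL; // Route-rocks-
```

So if digits should survive or the result should be lowercase, the `tr` command needs extra steps.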

陈年往事 2024-09-12 23:40:52

Try this:

function clean($string) {
    $string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
    $string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.

    return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with a single one.
}

Usage:

echo clean('a|"bc!@£de^&$f g');

Will output: abcdef-g

Source: https://stackoverflow.com/a/14114419/2439715
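The question also asks for lowercase output, which `clean()` does not produce, and a leading or trailing space can leave a hyphen at either end. A possible variant, sketched here under the hypothetical name `cleanLower` (not from the source), lowercases first and trims hyphens at the ends. Removing `_` without replacing it still merges words such as `the_underscore`.

```php
<?php
// Hypothetical variant of clean(): lowercase first, trim hyphens at the end.
function cleanLower(string $string): string
{
    $string = strtolower($string);                         // lowercase ASCII letters
    $string = str_replace(' ', '-', $string);              // spaces -> hyphens
    $string = preg_replace('/[^a-z0-9\-]/', '', $string);  // drop everything else
    $string = preg_replace('/-+/', '-', $string);          // collapse hyphen runs
    return trim($string, '-');                             // no leading/trailing hyphen
}

echo cleanLower('This, is - - the URL!'), PHP_EOL;   // this-is-the-url
echo cleanLower('What the_underscore ?!?'), PHP_EOL; // what-theunderscore
```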

淡莣 2024-09-12 23:40:52

Using the intl Transliterator is a good option because it lets you handle complicated cases with a single set of rules. I added custom rules to illustrate how flexible it is and how you can keep as much meaningful information as possible. Feel free to remove them and add your own rules.

$strings = [
    'This, is - - the URL!',
    'Holmes & Yoyo',
    'L’Œil de démon',
    'How to win 1000€?',
    '€, $ & other currency symbols',
    'Und die Katze fraß alle mäuse.',
    'Белите рози на София',
    'പോണ്ടിച്ചേരി സൂര്യനു കീഴിൽ',
];

$rules = <<<'RULES'
# Transliteration
:: Any-Latin ;   :: Latin-Ascii ;

# examples of custom replacements
'&' > ' and ' ;
[^0-9][01]? { € > ' euro' ;   € > ' euros' ;
[^0-9][01]? { '$' > ' dollar' ;   '$' > ' dollars' ;

:: Null ;

# slugify
[^[:alnum:]&[:ascii:]]+ > '-' ;
:: Lower ;

# trim
[$] { '-' > &Remove() ;
'-' } [$] > &Remove() ;
RULES;

$tsl = Transliterator::createFromRules($rules, Transliterator::FORWARD);
$results = array_map(fn($s) => $tsl->transliterate($s), $strings);

print_r($results);

demo

Unfortunately, the PHP manual says nothing about ICU transforms, but you can find information about them here.

一个人的旅程 2024-09-12 23:40:52

All previous answers deal with URLs, but in case someone needs to sanitize a string for a login (for example) and keep it as text, here you go:

function sanitizeText($str) {
    $withSpecCharacters = htmlspecialchars($str);
    $splitted_str = str_split($str);
    $result = '';
    foreach ($splitted_str as $letter){
        if (strpos($withSpecCharacters, $letter) !== false) {
            $result .= $letter;
        }
    }
    return $result;
}

echo sanitizeText('ОРРииыфвсси ajvnsakjvnHB "&nvsp;\n" <script>alert()</script>');
//ОРРииыфвсси ajvnsakjvnHB &nvsp;\n scriptalert()/script
// Angle brackets and double quotes are removed, so no HTML tags survive; all other text is kept.
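On PHP 8.1 and later, `htmlspecialchars()` defaults to `ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401`, so for valid UTF-8 input the loop above deletes exactly `<`, `>`, `"` and `'` (the characters that never appear unescaped in the escaped copy) and keeps everything else, including `&`. Under those assumptions, a shorter sketch with the hypothetical name `stripHtmlChars` would be:

```php
<?php
// Hypothetical shorter equivalent of sanitizeText(), assuming PHP >= 8.1
// defaults and valid UTF-8 input: remove <, >, " and ', keep everything else.
function stripHtmlChars(string $str): string
{
    return str_replace(['<', '>', '"', "'"], '', $str);
}

echo stripHtmlChars('<b>"x" & \'y\'</b>'), PHP_EOL; // bx & y/b
```

It also avoids the per-byte `strpos()` scan, which makes the original quadratic in the length of the input.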
感性 2024-09-12 23:40:52
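function isolate($data) {
    $data = trim($data);             // strip leading/trailing whitespace
    $data = stripslashes($data);     // remove backslash escapes
    $data = htmlspecialchars($data); // escape HTML special characters such as <, > and &
    return $data;
}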
失退 2024-09-12 23:40:52

You should use the slugify package and not reinvent the wheel ;)

https://github.com/cocur/slugify
