PHP function to make slug (URL string)
I want to have a function to create slugs from Unicode strings, e.g. gen_slug('Andrés Cortez') should return andres-cortez. How should I do that?
Comments (30)
On my localhost everything was OK, but on the server it helped to call setlocale and to pass 'utf-8' as the encoding to mb_strtolower.
Test
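A minimal sketch of the point about mb_strtolower, assuming the input is UTF-8. Passing the encoding explicitly avoids depending on the server's default mbstring encoding. The helper name lower_utf8 is hypothetical, not from the comment above.

```php
<?php
// Hypothetical helper: lowercase a UTF-8 string without relying on
// the server's default mbstring encoding.
function lower_utf8(string $text): string
{
    return mb_strtolower($text, 'UTF-8');
}

echo lower_utf8('ANDRÉS Cortez'), "\n"; // andrés cortez
```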
For standard alphanumeric English input, nothing complicated is needed.
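The original code for this comment is not preserved here. As a rough sketch of what an ASCII-only approach could look like (simple_slug is a hypothetical name), lowercasing and replacing every run of other characters with a hyphen is enough for plain English input:

```php
<?php
// Hypothetical helper for plain ASCII (English) input only.
function simple_slug(string $text): string
{
    $text = strtolower($text);
    // Replace every run of characters other than a-z and 0-9 with one hyphen.
    $text = preg_replace('/[^a-z0-9]+/', '-', $text);
    return trim($text, '-');
}

echo simple_slug('Hello, World! 2024'), "\n"; // hello-world-2024
```

Note that accented letters are not transliterated by this sketch; they are treated like punctuation and replaced by a hyphen.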
Not sure it works for every case, but I took the slug method from the Laravel Str class and added iconv('utf-8', 'us-ascii//TRANSLIT', $title) to handle accents without needing voku/portable-ascii. It seems to work pretty well for my use cases:
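The commenter's code is missing from this page. The following is only a sketch of the idea, modeled on the general shape of Laravel's slug method with iconv doing the accent handling. The function name slug_with_iconv is hypothetical, and the result of //TRANSLIT depends on the platform's iconv implementation and on the current locale, so accented input may behave differently between machines.

```php
<?php
// Sketch only; slug_with_iconv is a hypothetical name.
function slug_with_iconv(string $title, string $separator = '-'): string
{
    // //TRANSLIT output depends on the platform's iconv and the locale;
    // fall back to the untouched input if the conversion fails.
    $ascii = @iconv('utf-8', 'us-ascii//TRANSLIT', $title);
    if ($ascii !== false) {
        $title = $ascii;
    }

    // Turn the "other" separator into the chosen one.
    $flip = $separator === '-' ? '_' : '-';
    $title = preg_replace('![' . preg_quote($flip) . ']+!u', $separator, $title);

    // Spell out "@" so e-mail-like input stays readable.
    $title = str_replace('@', $separator . 'at' . $separator, $title);

    // Drop everything that is not the separator, a letter, a digit or whitespace.
    $title = preg_replace(
        '![^' . preg_quote($separator) . '\pL\pN\s]+!u',
        '',
        mb_strtolower($title, 'UTF-8')
    );

    // Collapse runs of separators and whitespace into one separator.
    $title = preg_replace('![' . preg_quote($separator) . '\s]+!u', $separator, $title);

    return trim($title, $separator);
}

echo slug_with_iconv('user@example'), "\n"; // user-at-example
```

If accented characters come out as "?" or disappear, setting a UTF-8 locale with setlocale before calling iconv is worth trying.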
In Laravel 8 it's:
In my opinion, Transliterator::transliterate is the option that gives the best results (it converts from any alphabet); it's very customizable, results in short code and doesn't need external libraries.
But with the rules from hdogan's valuable answer, it returns incorrect results in some cases:
Years 2020-2024 -> years-20202024, expected result: years-2020-2024.
The_word -> theword, expected result: the-word.
my string -> -my-string-, expected result: my-string.
Test © -> test-©, expected result: test-c or test (in my case, I prefer the first one).
Nº 1 -> nº-1, expected result: no-1 or n-1.
Based on that answer, I have made some changes to the rules:
Explanation of the changes:
preg_replace is used because some Cyrillic characters (e.g. ҳ) are not normalized for some reason, and we want a result containing only alphanumeric characters and hyphens, so we remove these few exceptions. I have not managed to normalize them in the rules; if anyone knows whether that is possible, contributions are appreciated.
Comparison demo
ICU Documentation / Unicode categories
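The modified rule set itself is not preserved on this page, so it is not reproduced here. As a loose illustration of the same combination (ICU transliteration followed by a preg_replace cleanup), here is a sketch that uses only built-in ICU transform IDs rather than the author's custom rules. It requires the intl extension; translit_slug is a hypothetical name.

```php
<?php
// Hypothetical helper; requires the intl extension.
// This is NOT the author's rule set, only built-in ICU transforms.
function translit_slug(string $text): string
{
    // Any script -> Latin -> ASCII -> lowercase.
    $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $text);
    if ($ascii === false) {
        throw new RuntimeException('Transliteration failed');
    }
    // Everything that is still not a-z or 0-9 (spaces, underscores,
    // punctuation, characters ICU left untouched) becomes a hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $ascii);
    return trim($slug, '-');
}

echo translit_slug('Years 2020-2024'), "\n"; // years-2020-2024
echo translit_slug('The_word'), "\n";        // the-word
```

Unlike the approach described above, this sketch replaces leftover characters with a hyphen instead of deleting them, which is a design choice rather than something taken from the answer.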
Since gTLDs and IDNs are becoming more and more widely used, I cannot see why a URL shouldn't contain Andrés.
Just rawurlencode the $URL you want instead. Most browsers show UTF-8 characters in URLs (maybe not some ancient IE6), and bit.ly / goo.gl can be used to shorten them in cases like Russian and Arabic if that is needed for advertising, or you can simply write them in ads the way a user would type them into the browser's address bar.
The only exception is spaces " ": it might be a good idea to replace them, and "/", with "-" if you don't want to allow those.
URL as encoded: http://www.hurtta.com/RU/%D0%9F%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D1%8B/
URL as written: http://www.hurtta.com/RU/Продукты/
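A small sketch of this approach, assuming UTF-8 input. The helper name url_segment is hypothetical; it replaces spaces and slashes with hyphens and percent-encodes the rest of a single path segment with rawurlencode.

```php
<?php
// Hypothetical helper: encode one path segment, keeping non-ASCII letters
// (as percent-encoded UTF-8) instead of transliterating them.
function url_segment(string $text): string
{
    // Spaces and slashes would be confusing inside one path segment.
    $text = str_replace([' ', '/'], '-', trim($text));
    return rawurlencode($text);
}

$url = 'http://www.hurtta.com/RU/' . url_segment('Продукты') . '/';
echo $url, "\n";               // the encoded form shown above
echo rawurldecode($url), "\n"; // http://www.hurtta.com/RU/Продукты/
```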
I've seen a lot of methods here, but I found a simpler one for myself. Maybe it will help someone.
Here is a short and easy solution for making a slug:
I'm using this function and it works fine:
Try using urldecode to get the decoded string:
For me this variant is perfect; it also changes & to and. Here is the code:
Instead of a lengthy replace, try this one:
This is based on the one in Symfony's Jobeet tutorial.
Update
Since this answer is getting some attention, I'm adding some explanation.
The solution provided essentially replaces everything except A-Z, a-z, 0-9 and - (hyphen) with - (hyphen). So it won't work properly with other Unicode characters (which are valid characters for a URL slug/string). A common scenario is an input string that contains non-English characters.
Only use this solution if you're confident that the input string won't contain Unicode characters that you want to keep in the output slug.
E.g. "नारी शक्ति" will become "----------" (all hyphens) instead of "नारी-शक्ति" (a valid URL slug).
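For cases where non-English characters should survive, a Unicode-aware character class is one possible alternative. The sketch below is not the answer's code; unicode_slug is a hypothetical name, and it assumes valid UTF-8 input. It keeps letters, combining marks and digits from any script and turns every other run of characters into one hyphen.

```php
<?php
// Hypothetical helper: keep letters, combining marks and digits from any
// script; turn every other run of characters into a single hyphen.
function unicode_slug(string $text): string
{
    $text = mb_strtolower($text, 'UTF-8');
    $text = preg_replace('/[^\p{L}\p{M}\p{N}]+/u', '-', $text);
    return trim($text, '-');
}

echo unicode_slug('नारी शक्ति'), "\n"; // नारी-शक्ति
```

The \p{M} class matters for scripts such as Devanagari, where vowel signs are combining marks rather than letters.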
Answer
If you have the intl extension installed, you can use the Transliterator::transliterate function to create a slug easily.
demo
Note that this solution works for any alphabet and is highly flexible.
Note: I took this from WordPress and it works!
Use it like this:
Code
It is always a good idea to use existing solutions that are being supported by a lot of high-level developers. The most popular one is https://github.com/cocur/slugify. First of all, it supports more than one language, and it is being updated.
If you do not want to use the whole package, you can copy the part that you need.
Here is another one; for example, " Title with strange characters ééé A X Z" becomes "title-with-strange-characters-eee-a-x-z".
There are already many answers here, so I almost don't want to add another one, but none of the functions did everything I needed.
The best basis for me was function number 3 in the answer where their speeds were compared. I added/fixed some replacements so that ' is simply deleted, . is replaced by -, α is replaced by a, ẞ is replaced by b, Ł (and similar) is replaced by L instead of K, and the € and $ signs are replaced with eur and usd respectively (add more as needed).
Optionally you can add '&' => '-and-', but SEO advice is against using conjunctions (#8), so I left it out for my use case. (This function doesn't strip existing ands and ors from the string, though.)
I also added a line of code to fix double dashes in this weird string I came up with, as well as an optional parameter to limit the slug's length.
Code
Output
Note
It also works for the OP's case of converting 'Andrés Cortez' to 'andres-cortez' and for all other examples I have found in this thread, except this character, which is beyond me: ????.
I'll be thrilled to know about bugs you find (hopefully accompanied by suggestions).
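The function itself is missing from this page. The two finishing steps it mentions (fixing double dashes and limiting the length) could be sketched as follows. tidy_slug and its $maxLength parameter are hypothetical names, and the sketch assumes the slug is already plain ASCII, since it counts bytes.

```php
<?php
// Hypothetical post-processing step for an already transliterated ASCII slug.
function tidy_slug(string $slug, int $maxLength = 0): string
{
    // Collapse runs of dashes left behind by consecutive replacements.
    $slug = preg_replace('/-{2,}/', '-', $slug);
    $slug = trim($slug, '-');

    if ($maxLength > 0 && strlen($slug) > $maxLength) {
        // Cut to the limit, then drop a dash left dangling at the cut.
        $slug = rtrim(substr($slug, 0, $maxLength), '-');
    }
    return $slug;
}

echo tidy_slug('--price--100-eur-'), "\n"; // price-100-eur
echo tidy_slug('andres-cortez', 7), "\n";  // andres
```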
An updated version of @Imran Omar Bukhsh's code (from the latest WordPress (4.0) branch):
View online example.
Don't use preg_replace for this. There's a PHP function built just for the task: strtr()
http://php.net/manual/en/function.strtr.php
Taken from the comments at the above link (I tested it myself; it works):
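The snippet from the manual comments is not preserved here. A minimal sketch of the strtr() idea, with a deliberately tiny replacement map, might look like this; strtr_slug is a hypothetical name and the map would need to be extended for real use.

```php
<?php
// Hypothetical helper: the map is deliberately tiny; extend it as needed.
function strtr_slug(string $text): string
{
    $map = [
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u',
        'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U',
        'ñ' => 'n', 'Ñ' => 'N', 'ü' => 'u', 'Ü' => 'U',
        ' ' => '-',
    ];
    // With an array argument strtr() replaces whole (multi-byte) keys,
    // trying the longest keys first and never re-scanning its own output.
    return strtolower(strtr($text, $map));
}

echo strtr_slug('Andrés Cortez'), "\n"; // andres-cortez
```

Characters that are not in the map pass through unchanged, so punctuation would still need separate handling.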
I didn't know which one to use, so I ran a quick benchmark on phptester.net
Beginning:
Output results:
Further tests needed.
Edit: test with fewer iterations
Beginning:
Output results:
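Neither the benchmark script nor its results are preserved on this page. For anyone who wants to repeat the comparison, a timing harness could be sketched like this. The names time_slugger and $candidates are hypothetical, the two candidates are placeholders rather than the functions from this thread, and the sketch assumes PHP 7.4 or later.

```php
<?php
// Hypothetical timing harness: runs $fn on $input $iterations times and
// returns the elapsed time in seconds, measured with the monotonic hrtime().
function time_slugger(callable $fn, string $input, int $iterations): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return (hrtime(true) - $start) / 1e9;
}

// Placeholder candidates; swap in the functions you want to compare.
$candidates = [
    'preg_replace' => fn (string $s): string =>
        trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($s)), '-'),
    'strtr' => fn (string $s): string =>
        strtolower(strtr($s, [' ' => '-'])),
];

foreach ($candidates as $name => $fn) {
    printf("%-12s %.4f s\n", $name, time_slugger($fn, 'Hello World 2024', 100000));
}
```

Timings from a single run on a shared online sandbox are noisy, so repeating the loop several times and comparing medians gives a fairer picture.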
I am using:
The only drawback is that Cyrillic characters will not be converted, and I am now looking for a solution that doesn't require a long str_replace for every single Cyrillic character.
The most elegant way, I think, is to use Behat\Transliterator\Transliterator.
You need to extend this class with your own class because it is abstract, something like this:
And then just use it:
Of course, you should add it to your Composer dependencies as well.
More info here: https://github.com/Behat/Transliterator
This may be a way to do it too. Inspired by these links: Experts-exchange and alinalexander
Use case:
Output:
bu-metinde-c-o-s-g-u-i-karakter-kullanilamaz
You could have a look at Normalizer::normalize(), see here. It just requires the intl module for PHP to be loaded.
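One way Normalizer::normalize() can help with slugs is accent stripping: decompose the string, then remove the combining marks. The sketch below shows that idea under the assumption that the intl extension is loaded and the input is UTF-8; strip_accents is a hypothetical name.

```php
<?php
// Hypothetical helper; requires the intl extension.
function strip_accents(string $text): string
{
    // NFD splits "é" into "e" plus a combining acute accent (U+0301).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // Remove the combining marks, leaving the base letters.
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

echo strip_accents('Andrés Cortez'), "\n"; // Andres Cortez
```

Letters that have no decomposition, such as Ł or ø, are left as they are, so this step alone does not guarantee ASCII output.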
What about using something that is already implemented in Core?
Or one of the core URL / URL rewrite methods.
I wrote this based on Maerlyn's response. This function will work regardless of the character encoding on the page. It also won't turn single quotes into dashes :)
There's a good solution here that deals with special characters as well.
Texto Fantástico => texto-fantastico
Author: Natxet