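Converting a string to a slug with single-hyphen delimiters only
I want to sanitize a string for use in a URL, so this is basically what I need:
- Everything must be removed except alphanumeric characters, spaces and dashes.
- Spaces should be converted into dashes.
E.g.
This, is the URL!
must return
this-is-the-url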
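The following replaces spaces with dashes.
Then the following statement removes everything except alphanumeric characters and dashes. (It does not need to keep spaces, because the previous step already replaced them with dashes.)
The two steps can also be combined into a single nested call.
FYI: to strip everything except printable ASCII characters, use a pattern restricted to the range \x20 to \x7E. This removes control characters and non-ASCII characters but keeps punctuation, so it is not the same as removing all special characters.
\x20 is the hexadecimal code for the space, the first printable ASCII character, and \x7E is the tilde, the last one, according to Wikipedia: https://en.wikipedia.org/wiki/ASCII#Printable_characters
FYI: see the Hex column for the range 20 to 7E: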
Printable characters
Codes 20hex to 7Ehex, known as the printable characters, represent letters, digits, punctuation marks, and a few miscellaneous symbols. There are 95 printable characters in total.
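The following is a minimal PHP sketch of the steps this answer describes. It assumes `str_replace()` is used for the space-to-dash step and `preg_replace()` for the filtering step. The sample strings and variable names are illustrative only.

```php
<?php
$string = 'This, is the URL!';

// Step 1: replace every space with a dash.
$dashed = str_replace(' ', '-', $string);

// Step 2: remove everything except ASCII letters, digits and dashes.
$slug = preg_replace('/[^A-Za-z0-9\-]/', '', $dashed);

// The same two steps as one nested call.
$slugOneLiner = preg_replace('/[^A-Za-z0-9\-]/', '', str_replace(' ', '-', $string));

// FYI variant: keep only printable ASCII (0x20 space through 0x7E tilde).
// Punctuation survives; the tab and the UTF-8 bytes of "é" are dropped.
$printable = preg_replace('/[^\x20-\x7E]/', '', "Tab\there, caf\u{00E9}!");

echo $slug, PHP_EOL;      // This-is-the-URL
echo $printable, PHP_EOL; // Tabhere, caf!
```

This version keeps the original capitalization (`This-is-the-URL`). To get the lowercase `this-is-the-url` from the question, pass the input through `strtolower()` first.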
First, strip unwanted characters.
Then change spaces to underscores.
Finally, encode it so it is ready for use.
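A minimal sketch of these three steps. It assumes `preg_replace()` for stripping, `str_replace()` for the spaces and `urlencode()` for the final encoding. Which characters count as "unwanted" is an assumption here: anything other than ASCII letters, digits, spaces and underscores. The function name `toUrlSegment()` is hypothetical.

```php
<?php
function toUrlSegment(string $text): string
{
    // 1. Strip unwanted characters (assumed whitelist: ASCII letters, digits, spaces, underscores).
    $text = preg_replace('/[^A-Za-z0-9 _]/', '', $text);
    // 2. Change spaces to underscores.
    $text = str_replace(' ', '_', $text);
    // 3. Encode the result for use in a URL.
    return urlencode($text);
}

echo toUrlSegment('This, is the URL!'), PHP_EOL; // This_is_the_URL
```

Step 1 leaves only URL-safe characters, so `urlencode()` returns the string unchanged in this example. It only matters if the whitelist is widened. Also note that this answer joins words with underscores, whereas the question asks for dashes.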
The OP doesn't explicitly describe all of the attributes of a slug, but this is what I gather from the intent.
My interpretation of a perfect, valid, condensed slug aligns with this post: https://wordpress.stackexchange.com/questions/149191/slug-formatting-acceptable-characters#:~:text=However%2C%20we%20can%20summarise%20the,or%20end%20with%20a%20hyphen.
I find that none of the earlier answers achieves this consistently (and I'm not even stretching the scope of the question to include multibyte characters).
I recommend the following one-liner, which doesn't bother declaring single-use variables:
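A sketch of such a one-liner, assuming the slug rules from the linked post:

- lowercase ASCII letters and digits only
- a single hyphen between words
- no leading or trailing hyphen

The wrapper function `toSlug()` is hypothetical, and the expression may differ from the author's exact code.

```php
<?php
function toSlug(string $string): string
{
    // One expression: lowercase, collapse each run of non-alphanumerics
    // into a single hyphen, then trim hyphens from both ends.
    return trim(preg_replace('/[^a-z\d]+/', '-', strtolower($string)), '-');
}

echo toSlug('This, is the URL!'), PHP_EOL; // this-is-the-url
```

The `+` quantifier turns each run of disallowed characters into a single match. Repeated spaces or punctuation therefore never produce double hyphens, and `trim()` removes any hyphen left at either end.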
Not shown in my demo link: here is an attempt to handle multibyte strings better, though it doesn't accommodate quite as many scenarios as Casimir's answer.
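A hedged sketch of a multibyte-aware variant, assuming UTF-8 input. The `u` modifier makes PCRE work on code points, and `\pL`/`\pN` (Unicode letters and numbers) replace the ASCII classes. Accented and non-Latin letters are kept rather than transliterated; transliteration is what Casimir's intl-based answer handles. The function name `toUnicodeSlug()` is hypothetical.

```php
<?php
function toUnicodeSlug(string $string): string
{
    // Collapse each run of characters that are not Unicode letters or numbers
    // into one hyphen; the "u" modifier treats pattern and subject as UTF-8.
    $slug = trim(preg_replace('/[^\pL\pN]+/u', '-', $string), '-');

    // Lowercase by code point when mbstring is available; strtolower() is
    // only reliable for ASCII letters.
    return function_exists('mb_strtolower') ? mb_strtolower($slug, 'UTF-8') : strtolower($slug);
}

echo toUnicodeSlug("Crème brûlée, s'il vous plaît!"), PHP_EOL; // crème-brûlée-s-il-vous-plaît
```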
I have also prepared a demonstration which highlights what I consider to be inaccuracies in the other answers. (Demo)
This will do it in a Unix shell (I just tried it on macOS):
I got the idea from a blog post titled "More Shell, Less Egg".
Try this:
Usage:
Will output:
abcdef-g
Source: https://stackoverflow.com/a/14114419/2439715
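A sketch of a reusable wrapper around the two steps described earlier in this thread: spaces to dashes, then strip everything except ASCII letters, digits and dashes. The function name `clean()` and the input string are hypothetical. The input was chosen only so that the output matches the `abcdef-g` shown above.

```php
<?php
function clean(string $string): string
{
    // Replace every space with a hyphen.
    $string = str_replace(' ', '-', $string);
    // Remove everything that is not an ASCII letter, digit or hyphen.
    return preg_replace('/[^A-Za-z0-9\-]/', '', $string);
}

echo clean('a!b@c#d$e%f g'), PHP_EOL; // abcdef-g
```

Each space becomes its own hyphen, so an input with two consecutive spaces such as `'a  b'` yields `'a--b'`. Getting single-hyphen delimiters requires a pattern that replaces whole runs, e.g. `[^A-Za-z0-9]+`.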
Using the intl transliterator is a good option, because it lets you handle complicated cases with a single set of rules. I added custom rules to illustrate how flexible it is and how you can keep as much meaningful information as possible. Feel free to remove them and add your own rules.
demo
Unfortunately, the PHP manual says almost nothing about ICU transformations, but you can find information about them here.
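A minimal sketch of the transliterator approach, assuming the intl extension is installed. The rule string `Any-Latin; Latin-ASCII; Lower()` is a common starting point, not the custom rule set from this answer. The function name `transliteratedSlug()` is hypothetical.

```php
<?php
function transliteratedSlug(string $string): string
{
    // Requires the intl extension. One ICU compound transform: convert any
    // script to Latin, reduce accented Latin letters to ASCII, then lowercase.
    $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $string);

    // Collapse everything that is not a lowercase letter or digit into single hyphens.
    return trim(preg_replace('/[^a-z0-9]+/', '-', $ascii), '-');
}

if (extension_loaded('intl')) {
    echo transliteratedSlug("Crème brûlée, s'il vous plaît!"), PHP_EOL; // creme-brulee-s-il-vous-plait
}
```

Compared with the `\pL`-based approach, this one produces pure ASCII slugs. Additional rules can be prepended to the same rule string to control how specific characters are converted.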
All previous answers deal with URLs, but in case someone needs to sanitize a string for a login (for example) and keep it as text, here you go:
You should use the slugify package rather than reinvent the wheel ;)
https://github.com/cocur/slugify