String slugification in Python
I am looking for the best way to "slugify" a string (see what a "slug" is). My current solution is based on this recipe.
I have changed it slightly to:
import re
import unicodedata
s = 'String to slugify'
slug = unicodedata.normalize('NFKD', s)
slug = slug.encode('ascii', 'ignore').decode('ascii').lower()  # decode back to str so re.sub works (Python 3)
slug = re.sub(r'[^a-z0-9]+', '-', slug).strip('-')
slug = re.sub(r'[-]+', '-', slug)
Does anyone see any problems with this code? It works fine, but maybe I am missing something, or you know a better way?
There is a Python package named python-slugify, which does a pretty good job of slugifying. It works like this:
See more examples.
This package does a bit more than what you posted (take a look at the source; it's just one file). The project is still active: it was updated two days before I originally answered, and over nine years later (last checked 2022-03-30) it still gets updated.
Careful: there is a second package around, named slugify. If you have both of them installed, you might run into problems, as they share the same import name. The one named just slugify didn't do everything I quick-checked: "Ich heiße" became "ich-heie" (it should be "ich-heisse"), so be sure to pick the right one when using pip or easy_install.
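The "ich-heie" result is exactly what the NFKD-plus-ASCII approach from the question produces: "ß" has no Unicode decomposition, so encoding with errors="ignore" simply drops it. A minimal stdlib-only sketch of that behaviour, plus one way around it by applying a hand-written transliteration table before normalizing. The table and the function names are hypothetical and deliberately tiny; a real transliteration library covers far more characters.

```python
import re
import unicodedata

# Hypothetical, deliberately small table of letters that NFKD cannot reduce to ASCII.
TRANSLITERATIONS = str.maketrans({
    "ß": "ss", "ẞ": "SS",
    "æ": "ae", "Æ": "AE",
    "ø": "o", "Ø": "O",
    "ł": "l", "Ł": "L",
})

def naive_slug(text):
    # Same idea as the question: decompose, drop non-ASCII, collapse the rest.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def transliterating_slug(text):
    # Replace the known problem letters first, so they are not silently dropped.
    return naive_slug(text.translate(TRANSLITERATIONS))

print(naive_slug("Ich heiße"))            # ich-heie
print(transliterating_slug("Ich heiße"))  # ich-heisse
```

Any letter missing from the table still falls through to the "ignore" step and disappears, which is why a maintained transliteration library is usually the better choice.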
Install unidecode from here for Unicode support.
There is a Python package named awesome-slugify:
It works like this:
awesome-slugify GitHub page
This is the slugify function present in django.utils.text.
It should meet your requirements.
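For anyone who wants the same behaviour without depending on Django, here is a standalone, stdlib-only sketch modelled on django.utils.text.slugify as it looks in recent Django versions. It is written from memory, so check the source of the Django version you actually use; the name django_style_slugify is hypothetical.

```python
import re
import unicodedata

def django_style_slugify(value, allow_unicode=False):
    value = str(value)
    if allow_unicode:
        # Keep non-ASCII letters, just normalize them to a composed form.
        value = unicodedata.normalize("NFKC", value)
    else:
        # Same ASCII folding as in the question.
        value = (
            unicodedata.normalize("NFKD", value)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
    # Remove everything that is not a word character, whitespace or a hyphen.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse whitespace/hyphen runs to one hyphen and trim leading/trailing - and _.
    return re.sub(r"[-\s]+", "-", value).strip("-_")

print(django_style_slugify(" Joel is a slug "))                 # joel-is-a-slug
print(django_style_slugify("Ich heiße", allow_unicode=True))    # ich-heiße
```

One design difference from the question's code: underscores count as word characters here, so "snake_case name" becomes "snake_case-name" rather than "snake-case-name".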
The problem is with the ASCII normalization line:
It is called Unicode normalization, and it does not decompose many characters to ASCII. For example, it would strip non-ASCII characters from the following strings:
A better way to do it is to use the unidecode module, which tries to transliterate strings to ASCII. So if you replace the above line with:
you get better results for the above strings, and for many Greek and Russian characters too:
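To see the limitation concretely, here is a small stdlib-only sketch of the normalize-then-ignore step, with illustrative strings of my own choosing. Letters that NFKD splits into an ASCII base plus a combining mark ("é", "ó") keep their base letter; letters with no decomposition at all ("Ø", "Ł") and whole non-Latin scripts vanish. The helper names are hypothetical.

```python
import unicodedata

def ascii_fold(text):
    # NFKD splits e.g. "é" into "e" + U+0301; encoding with errors="ignore"
    # then discards every non-ASCII code point, including that combining mark.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

def dropped_letters(text):
    # Letters whose NFKD form contains no ASCII character are lost entirely.
    return [ch for ch in text if ch.isalpha() and not ascii_fold(ch)]

for sample in ["café", "Øresund", "Łódź", "Москва", "Ελλάδα"]:
    print(repr(sample), "->", repr(ascii_fold(sample)), dropped_letters(sample))
```

Transliteration (what unidecode does) maps such letters to ASCII approximations instead of dropping them.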
It works well in Django, so I don't see why it wouldn't be a good general-purpose slugify function.
Are you having any problems with it?
Unidecode is good; however, be careful: unidecode is licensed under the GPL. If that license doesn't fit your project, use this one instead.
A couple of options on GitHub:
Each supports slightly different parameters in its API, so you'll need to look through them to figure out which you prefer.
In particular, pay attention to the different options they provide for dealing with non-ASCII characters. Pydanny wrote a very helpful blog post illustrating some of the Unicode-handling differences between these slugifying libraries (http://www.pydanny.com/awesome-slugify-human-readable-url-slugs-from-any-string.html); it is slightly outdated, because Mozilla's unicode-slugify is no longer Django-specific.
Also note that awesome-slugify is currently GPLv3, though there is an open issue where the author says they would prefer to release it under MIT/BSD but are not sure of the legality: https://github.com/dimka665/awesome-slugify/issues/24
Another good way to create a slug could be this form:
You might consider changing the last line to
since the pattern [-]+ is no different from -+, and you don't really care about matching just one hyphen, only two or more. But, of course, this is quite minor.
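A quick stdlib check of the point above, plus one more detail worth noting: the question's first re.sub already turns every run of characters outside [a-z0-9], hyphens included, into a single hyphen, so its output never contains two hyphens in a row. The final hyphen-collapsing line therefore appears to be a no-op whichever pattern it uses. The helper names below are hypothetical.

```python
import re

# "[-]+" and "-+" are the same regex; "-{2,}" only fires on two or more
# hyphens, so it skips replacing a lone hyphen with itself.
PATTERNS = [r"[-]+", r"-+", r"-{2,}"]

def collapse_hyphens(text):
    """Apply each candidate pattern; all three should agree."""
    return [re.sub(p, "-", text) for p in PATTERNS]

def first_substitution(text):
    # Mirrors the question's first re.sub: every run of characters outside
    # [a-z0-9] (hyphens included) becomes exactly one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

if __name__ == "__main__":
    print(collapse_hyphens("a---b-c"))           # ['a-b-c', 'a-b-c', 'a-b-c']
    slug = first_substitution("Hello -- world!")
    print(slug)                                  # hello-world
    print(re.sub(r"-+", "-", slug) == slug)      # True: nothing left to collapse
```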
Another option is boltons.strutils.slugify. Boltons has quite a few other useful functions as well, and is distributed under a BSD license.
Based on your example, a fast way to do that could be: