String slugification in Python

Posted on 10-30 20:38

I am looking for the best way to "slugify" a string (see what a "slug" is). My current solution is based on this recipe.

I have changed it a little, to:

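import re
import unicodedata

s = 'String to slugify'

slug = unicodedata.normalize('NFKD', s)
# Decode back to str so the regexes below also work on Python 3.
slug = slug.encode('ascii', 'ignore').decode('ascii').lower()
slug = re.sub(r'[^a-z0-9]+', '-', slug).strip('-')
slug = re.sub(r'[-]+', '-', slug)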

Does anyone see any problems with this code? It works fine, but maybe I am missing something, or you know a better way?

追我者格杀勿论2024-11-06 20:38:04

There is a Python package named python-slugify, which does a pretty good job of slugifying:

pip install python-slugify

It works like this:

from slugify import slugify

# The slugs in the comments are the ones given in the original answer;
# exact output may vary between python-slugify versions.
txt = "This is a test ---"
print(slugify(txt))  # this-is-a-test

txt = "This -- is a ## test ---"
print(slugify(txt))  # this-is-a-test

txt = 'C\'est déjà l\'été.'
print(slugify(txt))  # cest-deja-lete

txt = 'Nín hǎo. Wǒ shì zhōng guó rén'
print(slugify(txt))  # nin-hao-wo-shi-zhong-guo-ren

txt = 'Компьютер'
print(slugify(txt))  # kompiuter

txt = 'jaja---lol-méméméoo--a'
print(slugify(txt))  # jaja-lol-mememeoo-a

See more examples.

This package does a bit more than what you posted (take a look at the source, it's just one file). The project is still active: it was updated two days before I originally answered, and more than nine years later (last checked 2022-03-30) it still gets updates.

Careful: there is a second package around, named slugify. If you have both installed, you might run into problems, because they share the same import name. The one named just slugify did not handle everything in my quick check: "Ich heiße" became "ich-heie" (it should be "ich-heisse"). So be sure to pick the right one when using pip or easy_install.

泅人2024-11-06 20:38:04

Install unidecode from here for Unicode support:

pip install unidecode

import re
import unidecode

def slugify(text):
    # Transliterate to ASCII, then lowercase.
    text = unidecode.unidecode(text).lower()
    # Replace each run of non-word characters or underscores with one hyphen.
    return re.sub(r'[\W_]+', '-', text)

text = "My custom хелло ворлд"
print(slugify(text))

>>> my-custom-khello-vorld

冷…雨湿花2024-11-06 20:38:04

There is a Python package named awesome-slugify:

pip install awesome-slugify

It works like this:

from slugify import slugify

slugify('one kožušček')  # one-kozuscek

awesome-slugify GitHub page: https://github.com/dimka665/awesome-slugify

时间海2024-11-06 20:38:04

# Quoted from django/utils/text.py; re, unicodedata, mark_safe, allow_lazy
# and six are imported at the top of that module.
def slugify(value):
    """
    Converts to lowercase, removes non-word characters (alphanumerics and
    underscores) and converts spaces to hyphens. Also strips leading and
    trailing whitespace.
    """
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return mark_safe(re.sub(r'[-\s]+', '-', value))
slugify = allow_lazy(slugify, six.text_type)

This is the slugify function from django.utils.text. It should be sufficient for your requirements.
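For use outside Django, the same three steps can be sketched with the standard library alone. This is only an approximation of the function above, assuming Python 3 and leaving out the Django helpers (mark_safe, allow_lazy); the name slugify_plain is hypothetical.

```python
import re
import unicodedata


def slugify_plain(value):
    """Framework-free sketch of the steps in the Django function (hypothetical name)."""
    # Decompose accented characters, then drop whatever is still non-ASCII.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    # Remove everything except word characters, whitespace and hyphens.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)


print(slugify_plain('String to slugify'))  # string-to-slugify
```

Note that underscores count as word characters here, so they survive in the result.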

白鸥掠海2024-11-06 20:38:04

The problem is with the ASCII normalization line:

slug = unicodedata.normalize('NFKD', s)

This is Unicode normalization, which does not decompose many characters into ASCII. Characters that have no decomposition are then simply dropped by the encode('ascii', 'ignore') step, for example:

Mørdag -> mrdag
Æther -> ther
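The effect can be reproduced with the standard library alone. The sketch below uses a made-up helper, ascii_fold, that applies the same normalize-then-encode steps as the question's code, and contrasts an accented word (which decomposes cleanly) with the two words above (which do not).

```python
import unicodedata


def ascii_fold(s):
    """NFKD-normalize, then drop every character that is still non-ASCII."""
    return unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')


# 'déjà' decomposes into base letters plus combining accents, so it survives.
# 'ø' and 'Æ' have no decomposition, so they are silently dropped.
for word in ('déjà', 'Mørdag', 'Æther'):
    print(ascii_fold(word).lower())
```

This prints deja, mrdag and ther, in that order.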

A better way is to use the unidecode module, which tries to transliterate strings to ASCII. If you replace the line above with:

import unidecode
slug = unidecode.unidecode(s)

you get better results for the strings above, and for many Greek and Russian characters too:

Mørdag -> mordag
Æther -> aether

话少心凉2024-11-06 20:38:04

It works well in Django, so I don't see why it wouldn't be a good general-purpose slugify function.

Are you having any problems with it?

梦巷2024-11-06 20:38:04

Unidecode is good; however, be careful: unidecode is GPL-licensed. If that license does not fit, use this one instead.

夜灵血窟げ2024-11-06 20:38:04

Several options on GitHub:

https://github.com/dimka665/awesome-slugify
https://github.com/un33k/python-slugify
https://github.com/mozilla/unicode-slugify

Each supports slightly different parameters in its API, so you will need to look through them to figure out which one you prefer.

In particular, pay attention to the different options they provide for handling non-ASCII characters. Pydanny wrote a very helpful blog post illustrating some of the Unicode handling differences among these slugifying libraries: http://www.pydanny.com/awesome-slugify-human-readable-url-slugs-from-any-string.html. The post is slightly outdated, because Mozilla's unicode-slugify is no longer Django-specific.

Also note that awesome-slugify is currently GPLv3, though there is an open issue in which the author says they would prefer to release it as MIT/BSD and are just unsure of the legality: https://github.com/dimka665/awesome-slugify/issues/24

哑2024-11-06 20:38:04

Another good way to create a slug could be:

import re

st = 'String to slugify'  # example input
slug = re.sub(r'\W+', '-', st).strip('-').lower()
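A quick check of what this one-liner keeps and what it replaces, as a sketch assuming Python 3, where \W is Unicode-aware for str patterns. The wrapper name slug_w is made up for illustration.

```python
import re


def slug_w(st):
    """The one-liner above wrapped in a function (hypothetical name)."""
    return re.sub(r'\W+', '-', st).strip('-').lower()


print(slug_w('Hello, World!'))    # hello-world
print(slug_w('snake_case name'))  # snake_case-name
# Non-ASCII letters count as word characters, so they are kept as-is.
print(slug_w('Mørdag café') == 'mørdag-café')  # True
```

So underscores and non-ASCII letters pass through unchanged; if an ASCII-only slug is needed, a transliteration or normalization step has to come first.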

滥情空心2024-11-06 20:38:04

You might consider changing the last line to

slug = re.sub(r'--+', '-', slug)

since the pattern [-]+ is no different than -+, and you don't really care about matching just one hyphen, only two or more.
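That the two patterns give the same result can be checked on a few sample strings (the samples are made up for illustration):

```python
import re

samples = ['a-b', 'a--b', 'a---b--c', '-x-', '']

for s in samples:
    collapsed_any = re.sub(r'[-]+', '-', s)   # original pattern: one or more hyphens
    collapsed_multi = re.sub(r'--+', '-', s)  # suggested pattern: two or more hyphens
    # Replacing a single hyphen with a single hyphen changes nothing,
    # so both patterns always produce the same string.
    assert collapsed_any == collapsed_multi
    print(repr(s), '->', repr(collapsed_multi))
```

The only difference is that the second pattern skips the no-op replacements of single hyphens.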

But, of course, this is quite minor.

淑女气质2024-11-06 20:38:04

Another option is boltons.strutils.slugify. Boltons has lots of other useful things as well, and is distributed under a BSD license.

难以启齿的温柔2024-11-06 20:38:04

Going by your example, a quick way to do that could be:

s = 'String to slugify'

slug = s.replace(" ", "-").lower()
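A short sketch of what this approach covers, with a made-up wrapper name quick_slug: it handles the example string, but only spaces are touched, so punctuation and repeated spaces end up in the result.

```python
def quick_slug(s):
    """The replace-and-lowercase one-liner as a function (hypothetical name)."""
    return s.replace(" ", "-").lower()


print(quick_slug('String to slugify'))  # string-to-slugify
print(quick_slug('Hello,  World!'))     # hello,--world!
```

It is therefore a fit only for input that is already known to contain nothing but letters, digits and single spaces.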
