String slugification in Python

Posted on 2024-10-30 20:38:04

I am looking for the best way to "slugify" a string (see what a "slug" is). My current solution is based on this recipe.

I have changed it a little bit to:

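import re
import unicodedata

s = 'String to slugify'

slug = unicodedata.normalize('NFKD', s)
# decode back to str so the regexes below also work on Python 3
slug = slug.encode('ascii', 'ignore').decode('ascii').lower()
slug = re.sub(r'[^a-z0-9]+', '-', slug).strip('-')
slug = re.sub(r'[-]+', '-', slug)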

Anyone see any problems with this code? It is working fine, but maybe I am missing something or you know a better way?

Comments (12)

追我者格杀勿论 2024-11-06 20:38:04

There is a python package named python-slugify, which does a pretty good job of slugifying:

pip install python-slugify

Works like this:

from slugify import slugify

# Each assertion shows the expected slug for the input above it.
txt = "This is a test ---"
r = slugify(txt)
assert r == "this-is-a-test"

txt = "This -- is a ## test ---"
r = slugify(txt)
assert r == "this-is-a-test"

txt = 'C\'est déjà l\'été.'
r = slugify(txt)
assert r == "cest-deja-lete"

txt = 'Nín hǎo. Wǒ shì zhōng guó rén'
r = slugify(txt)
assert r == "nin-hao-wo-shi-zhong-guo-ren"

txt = 'Компьютер'
r = slugify(txt)
assert r == "kompiuter"

txt = 'jaja---lol-méméméoo--a'
r = slugify(txt)
assert r == "jaja-lol-mememeoo-a"

See More examples

This package does a bit more than what you posted (take a look at the source; it's just one file). The project is still active: it was updated 2 days before I originally answered, and more than nine years later (last checked 2022-03-30) it is still being updated.

Careful: there is a second package around, named slugify. If you have both installed, you might run into problems, because they share the same import name. The one named just slugify did not pass all of my quick checks: "Ich heiße" became "ich-heie" (it should be "ich-heisse"). So be sure to pick the right one when using pip or easy_install.

泅人 2024-11-06 20:38:04

Install unidecode from here for Unicode support

pip install unidecode

import re
import unidecode

def slugify(text):
    # transliterate to ASCII first, then collapse runs of non-alphanumerics into hyphens
    text = unidecode.unidecode(text).lower()
    return re.sub(r'[\W_]+', '-', text)

text = "My custom хелло ворлд"
print(slugify(text))

# prints: my-custom-khello-vorld

冷…雨湿花 2024-11-06 20:38:04

There is a Python package named awesome-slugify:

pip install awesome-slugify

Works like this:

from slugify import slugify

slugify('one kožušček')  # one-kozuscek

awesome-slugify GitHub page

时间海 2024-11-06 20:38:04
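# Imports as used by older Django 1.x releases, where this snippet comes from;
# allow_lazy and django.utils.six are not available in current Django.
import re
import unicodedata

from django.utils import six
from django.utils.functional import allow_lazy
from django.utils.safestring import mark_safe

def slugify(value):
    """
    Converts to lowercase, removes non-word characters (alphanumerics and
    underscores) and converts spaces to hyphens. Also strips leading and
    trailing whitespace.
    """
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return mark_safe(re.sub(r'[-\s]+', '-', value))
slugify = allow_lazy(slugify, six.text_type)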

This is the slugify function from django.utils.text.
It should be sufficient for your needs.
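
Outside Django, the same steps can be reproduced with the standard library alone. The following is a minimal sketch, not Django's actual code: the name slugify_plain is made up for illustration, and mark_safe and allow_lazy are simply dropped because they only matter inside Django.

```python
import re
import unicodedata


def slugify_plain(value):
    """Framework-free sketch of the steps above (hypothetical helper name)."""
    # Decompose accented characters, then drop whatever is still non-ASCII.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    # Remove everything except word characters, whitespace and hyphens.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)


print(slugify_plain("  C'est déjà l'été.  "))  # cest-deja-lete
```

Note that only leading and trailing whitespace is stripped here, so an input that ends in hyphens can still produce a slug with a trailing hyphen.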

白鸥掠海 2024-11-06 20:38:04

The problem is with the ascii normalization line:

slug = unicodedata.normalize('NFKD', s)

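This is Unicode normalization, which does not decompose many characters into ASCII equivalents. Characters that have no decomposition are then simply dropped by the following encode('ascii', 'ignore') step. For example, the non-ASCII characters are stripped from the following strings: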

Mørdag -> mrdag
Æther -> ther

A better way to do it is to use the unidecode module that tries to transliterate strings to ascii. So if you replace the above line with:

import unidecode
slug = unidecode.unidecode(s)

You get better results for the above strings and for many Greek and Russian characters too:

Mørdag -> mordag
Æther -> aether
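
The difference can be checked with the standard library alone. The sketch below is only an illustration: nfkd_ascii reproduces the normalize-then-encode step from the question, while EXTRA and nfkd_ascii_with_fallback are hypothetical names for a tiny hand-written fallback table, which is in no way a substitute for what unidecode covers.

```python
import unicodedata


def nfkd_ascii(text):
    """NFKD-normalize, then drop everything that is not ASCII."""
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')


# Hypothetical fallback table for a few letters that have no Unicode decomposition.
EXTRA = str.maketrans({'ø': 'o', 'Ø': 'O', 'æ': 'ae', 'Æ': 'AE', 'ß': 'ss'})


def nfkd_ascii_with_fallback(text):
    """Apply the small fallback table first, then the NFKD step."""
    return nfkd_ascii(text.translate(EXTRA))


for word in ('déjà', 'Mørdag', 'Æther'):
    print(word, '->', nfkd_ascii(word), '|', nfkd_ascii_with_fallback(word))
```

Accented letters such as é decompose into a base letter plus a combining mark, so they survive as plain ASCII letters; ø and Æ have no such decomposition and disappear unless something maps them explicitly.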
话少心凉 2024-11-06 20:38:04

It works well in Django, so I don't see why it wouldn't be a good general purpose slugify function.

Are you having any problems with it?

梦巷 2024-11-06 20:38:04

Unidecode is good; however, be careful: unidecode is GPL. If this license doesn't fit then use this one

夜灵血窟げ 2024-11-06 20:38:04

A couple of options on GitHub:

  1. https://github.com/dimka665/awesome-slugify
  2. https://github.com/un33k/python-slugify
  3. https://github.com/mozilla/unicode-slugify

Each supports slightly different parameters for its API, so you'll need to look through to figure out what you prefer.

In particular, pay attention to the different options they provide for dealing with non-ASCII characters. Pydanny wrote a very helpful blog post illustrating some of the unicode handling differences in these slugify'ing libraries: http://www.pydanny.com/awesome-slugify-human-readable-url-slugs-from-any-string.html This blog post is slightly outdated because Mozilla's unicode-slugify is no longer Django-specific.

Also note that currently awesome-slugify is GPLv3, though there's an open issue where the author says they'd prefer to release as MIT/BSD, just not sure of the legality: https://github.com/dimka665/awesome-slugify/issues/24

2024-11-06 20:38:04

Another good way to create a slug could be this:

import re
st = 'String to slugify'
slug = re.sub(r'\W+', '-', st).strip('-').lower()
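
One thing to keep in mind with this one-liner: on Python 3, \W is Unicode-aware for str patterns, so accented and non-Latin letters are kept as they are, and underscores count as word characters. The sketch below shows the difference the standard re.ASCII flag makes; the function names slug_unicode and slug_ascii are hypothetical.

```python
import re


def slug_unicode(st):
    # \W is Unicode-aware for str patterns on Python 3: non-ASCII letters survive.
    return re.sub(r'\W+', '-', st).strip('-').lower()


def slug_ascii(st):
    # With re.ASCII, every non-ASCII character is treated as a separator.
    return re.sub(r'\W+', '-', st, flags=re.ASCII).strip('-').lower()


print(slug_unicode('Hello, World!'))          # hello-world
print(slug_unicode('D\u00e9j\u00e0 vu'))      # 'Déjà vu' -> déjà-vu
print(slug_ascii('D\u00e9j\u00e0 vu'))        # 'Déjà vu' -> d-j-vu
```

Neither variant transliterates anything, so this form fits best when the input is already plain ASCII or when Unicode slugs are acceptable.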
滥情空心 2024-11-06 20:38:04

You might consider changing the last line to

slug=re.sub(r'--+',r'-',slug)

since the pattern [-]+ is no different than -+, and you don't really care about matching just one hyphen, only two or more.

But, of course, this is quite minor.
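
A quick check suggests the final substitution may not be needed at all in the question's pipeline: the earlier [^a-z0-9]+ pattern already treats hyphens as non-alphanumeric and collapses every run into a single '-'. This is only a sketch over a handful of sample strings, and first_pass is a hypothetical helper name.

```python
import re


def first_pass(slug):
    # Same substitution as the second-to-last line of the question's code.
    return re.sub(r'[^a-z0-9]+', '-', slug).strip('-')


samples = ['a--b', 'a - - b', '--a---b--', 'jaja---lol--a']
for sample in samples:
    once = first_pass(sample)
    # Both spellings of the final substitution leave the result unchanged.
    assert re.sub(r'[-]+', '-', once) == once
    assert re.sub(r'--+', '-', once) == once
    print(sample, '->', once)
```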

淑女气质 2024-11-06 20:38:04

Another option is boltons.strutils.slugify. Boltons has quite a few other useful functions as well, and is distributed under a BSD license.

难以启齿的温柔 2024-11-06 20:38:04

Going by your example, a quick way to do this could be:

s = 'String to slugify'

slug = s.replace(" ", "-").lower()
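
This works for the sample string, but a quick sketch (quick_slug is a hypothetical wrapper name) shows that punctuation and repeated spaces pass straight through, so it is probably only suitable when the input is known to be clean.

```python
def quick_slug(s):
    # Only spaces are replaced; everything else passes through unchanged.
    return s.replace(" ", "-").lower()


print(quick_slug('String to slugify'))  # string-to-slugify
print(quick_slug('Hello,  World!'))     # hello,--world!
```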