Remove accents and special characters
Possible Duplicate:
What is the best way to remove accents in a python unicode string?
Python and character normalization
I would like to remove accents, turn all characters to lowercase, and delete any numbers and special characters.
Example:
Frédér8ic@ --> frederic
Proposal:
import unicodedata
def remove_accents(data):
    # Decompose accented characters, keep only letters (category L*), then lowercase.
    return ''.join(x for x in unicodedata.normalize('NFKD', data)
                   if unicodedata.category(x)[0] == 'L').lower()
Is there any better way to do this?
Comments (2)
A possible solution: as far as I know, NFKD is the standard way to normalize Unicode into compatibility-decomposed characters. To remove the special characters, digits, and leftover non-ASCII characters produced by normalization, you can simply compare each character against string.ascii_letters and drop any character not in that set.
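The string.ascii_letters filter described in this answer might look roughly like the sketch below. The answer's own code is not shown here, so this is an illustration under that assumption rather than the answerer's exact version; the constant name ASCII_LETTERS is made up for the example.

```python
import string
import unicodedata

# Hypothetical helper constant: set lookup is faster than scanning the string.
ASCII_LETTERS = frozenset(string.ascii_letters)

def remove_accents(data):
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", data)
    # Keep only plain a-z / A-Z; marks, digits and symbols are dropped.
    return "".join(ch for ch in decomposed if ch in ASCII_LETTERS).lower()

print(remove_accents("Frédér8ic@"))  # frederic
```

Unlike the category 'L' check in the question, this drops every non-ASCII letter that has no decomposition (for example ß) instead of keeping it, which may or may not be what you want.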
Can you convert the string into HTML entities? If so, you can then use a simple regular expression.
The following replacement would work in PHP/PCRE (see my other answer for an example):
Then simply convert back from HTML entities and remove any character outside a-z and A-Z (demo @ CodePad). Sorry, I don't know Python well enough to provide a Pythonic answer.
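For what it's worth, the entity idea can be tried in Python too. The PHP/PCRE pattern from this answer is not reproduced here, so the suffix list below is an assumption, as are the names _ACCENT_ENTITY and strip_via_entities; only html.entities.codepoint2name and re come from the standard library. Treat it as a rough sketch of the approach, not a port of the original.

```python
import re
from html.entities import codepoint2name

# Assumed pattern: an entity name made of a base letter plus an accent suffix,
# e.g. "eacute" -> "e", "Ntilde" -> "N".
_ACCENT_ENTITY = re.compile(
    r"([A-Za-z])(?:acute|grave|circ|uml|tilde|cedil|ring|slash)"
)

def strip_via_entities(text):
    out = []
    for ch in text:
        # Look up the HTML 4 entity name for this character, if it has one.
        name = codepoint2name.get(ord(ch))
        match = _ACCENT_ENTITY.fullmatch(name) if name else None
        # Replace an accented letter by its base letter, else keep the character.
        out.append(match.group(1) if match else ch)
    # Finally drop everything that is not a plain ASCII letter and lowercase.
    return re.sub(r"[^A-Za-z]", "", "".join(out)).lower()

print(strip_via_entities("Frédér8ic@"))  # frederic
```

This only covers characters that have a named HTML 4 entity, so the unicodedata.normalize route is likely the more general choice in Python.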