Approximately converting a Unicode string to an ASCII string in Python
I don't know whether this is trivial or not, but I need to convert a Unicode string to an ASCII string, and I don't want all those escape characters in the result. Is it possible to get an "approximate" conversion, where each character is replaced by a similar-looking ASCII character?
For example: Gavin O’Connor gets converted to Gavin O\x92Connor, but I'd really like it to be converted to just Gavin O'Connor. Is this possible? Has anyone written a utility to do it, or do I have to replace all the characters manually?
Thank you very much!
Marco
Comments (5)
Use the Unidecode package to transliterate the string.
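A minimal sketch of what that could look like, assuming the third-party Unidecode package is installed (pip install Unidecode). The to_ascii wrapper and its crude fallback are illustrative names and choices, not part of the package:

```python
try:
    # Third-party package; assumed to be installed via `pip install Unidecode`.
    from unidecode import unidecode
except ImportError:
    unidecode = None


def to_ascii(text):
    """Return an ASCII approximation of text (hypothetical helper)."""
    if unidecode is not None:
        # Unidecode transliterates each character to a close ASCII equivalent.
        return unidecode(text)
    # Crude placeholder when Unidecode is missing: unknown characters become '?'.
    return text.encode("ascii", "replace").decode("ascii")


print(to_ascii("Gavin O\u2019Connor"))
```

With Unidecode available, the curly apostrophe should come out as a plain one; without it, the fallback only guarantees that the result is ASCII.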
Output:
Here's the document that describes the normalization forms: http://unicode.org/reports/tr15/
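A rough sketch of a normalization-based conversion, assuming NFKD decomposition followed by dropping whatever is still not ASCII. The function name is made up for illustration. Note that this approach keeps the base letter of accented characters but silently drops characters that have no decomposition, such as the curly apostrophe:

```python
import unicodedata


def normalize_to_ascii(text):
    """Decompose characters (NFKD), then discard anything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(normalize_to_ascii("caf\u00e9"))            # cafe
print(normalize_to_ascii("Gavin O\u2019Connor"))  # Gavin OConnor
```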
should work fine.
There is a technique to strip accents from characters, but other characters need to be directly replaced. Check this article: http://effbot.org/zone/unicode-convert.htm
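One way to combine the two ideas, as a sketch rather than the article's exact code: strip combining marks after NFD decomposition, and handle the remaining characters with an explicit table. The REPLACEMENTS table below is a small hypothetical sample and would need extending for real data:

```python
import unicodedata

# Hypothetical sample table for characters that do not decompose to ASCII.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u00df": "ss",  # sharp s
    "\u00f8": "o",   # o with stroke
}
TABLE = str.maketrans(REPLACEMENTS)


def strip_accents(text):
    """Remove combining marks left over after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def approximate_ascii(text):
    """Apply the direct replacements first, then strip accents."""
    return strip_accents(text.translate(TABLE))


print(approximate_ascii("Gavin O\u2019Connor"))     # Gavin O'Connor
print(approximate_ascii("na\u00efve stra\u00dfe"))  # naive strasse
```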
Try simple character replacement.
PS: if you get an error, add
# -*- coding: utf-8 -*-
to the top of your .py file.
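A small sketch of the replacement idea for the string in the question, written for Python 3 (where source files are UTF-8 by default, so the coding line is only needed on Python 2). It also assumes the \x92 in the question is the Windows-1252 byte for the curly apostrophe:

```python
# -*- coding: utf-8 -*-

name = "Gavin O’Connor"

# Replace the curly apostrophe (U+2019) with a plain ASCII apostrophe.
fixed = name.replace("’", "'")
print(fixed)  # Gavin O'Connor

# If the data arrives as Windows-1252 bytes, decode first, then replace.
raw = b"Gavin O\x92Connor"
fixed_from_bytes = raw.decode("cp1252").replace("’", "'")
print(fixed_from_bytes)  # Gavin O'Connor
```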