Python: How to replace full-width characters with half-width characters?
If this were PHP, I would probably do something like this:
function no_more_half_widths($string){
    // full-width digits to search for
    $foo = array('1','2','3','4','5','6','7','8','9','10');
    // half-width replacements
    $bar = array('1','2','3','4','5','6','7','8','9','10');
    return str_replace($foo, $bar, $string);
}
I have tried the .translate function in Python, and it reports that the arrays are not the same size. I assume this is because the individual characters are encoded in UTF-8. Any suggestions?
The built-in unicodedata module can do it with unicodedata.normalize and the “NFKC” form. “NFKC” stands for “Normalization Form KC [Compatibility Decomposition, followed by Canonical Composition]”; it replaces full-width characters with their half-width Unicode compatibility equivalents.
Note that it also normalizes all sorts of other things at the same time, like separate accent marks and Roman numeral symbols.
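A minimal sketch of the normalization approach described in this answer. The helper name to_halfwidth and the sample strings are made up for illustration; unicodedata.normalize is the standard-library call.

```python
import unicodedata

def to_halfwidth(text):
    # NFKC folds full-width ASCII variants into their ordinary ASCII forms
    return unicodedata.normalize("NFKC", text)

sample = "\uff11\uff12\uff13 \uff21\uff22\uff23"  # full-width "123 ABC"
print(to_halfwidth(sample))  # 123 ABC

# Side effect mentioned above: other compatibility characters change too
print(unicodedata.normalize("NFKC", "\u2163"))  # Roman numeral four becomes "IV"
```

If only the full-width forms should change and everything else must stay untouched, a translation table is probably the safer choice.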
In Python 3, you can use the following snippet. It builds a map between all ASCII characters and their corresponding fullwidth characters. Best of all, you don't need to hard-code the ASCII sequence, which is error-prone.
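The snippet itself did not survive on this page, so here is a sketch of what such a map might look like. It relies on the fullwidth forms U+FF01 to U+FF5E mirroring ASCII 0x21 to 0x7E at a fixed offset of 0xFEE0; the names FULL2HALF and halfen are illustrative, not necessarily the answerer's.

```python
# Fullwidth forms U+FF01..U+FF5E mirror ASCII 0x21..0x7E at an offset of 0xFEE0
FULL2HALF = {code + 0xFEE0: code for code in range(0x21, 0x7F)}
# The ideographic space U+3000 serves as the fullwidth counterpart of the space
FULL2HALF[0x3000] = 0x20

def halfen(text):
    """Convert fullwidth ASCII variants to plain ASCII."""
    return text.translate(FULL2HALF)

print(halfen("\uff28\uff45\uff4c\uff4c\uff4f\u3000\uff11"))  # Hello 1
```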
Also, with the same logic, you can convert halfwidth characters to fullwidth with the following code:
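A sketch of the reverse direction under the same assumption (a fixed 0xFEE0 offset, plus the space mapped to U+3000). HALF2FULL and fullen are hypothetical names.

```python
# ASCII 0x21..0x7E maps to fullwidth forms by adding 0xFEE0
HALF2FULL = {code: code + 0xFEE0 for code in range(0x21, 0x7F)}
# The ordinary space becomes the ideographic space U+3000
HALF2FULL[0x20] = 0x3000

def fullen(text):
    """Convert plain ASCII to its fullwidth variants."""
    return text.translate(HALF2FULL)

print(fullen("A 1") == "\uff21\u3000\uff11")  # True
```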
Note: These two snippets only consider ASCII characters, and do not convert any Japanese/Korean fullwidth characters.
For completeness, from Wikipedia:
A Python 2 solution can be found at gist/jcayzac.
I don't think there's a built-in function to do multiple replacements in one pass, so you'll have to do it yourself.
One way to do it:
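The original code is missing here; a plausible sketch of this first variant is a loop over paired lists, calling str.replace once per character. The function name replace_each and the digit lists are assumptions for illustration.

```python
def replace_each(text, sources, targets):
    # Apply one str.replace call per (source, target) pair
    for src, dst in zip(sources, targets):
        text = text.replace(src, dst)
    return text

full_digits = [chr(0xFF10 + d) for d in range(10)]  # full-width 0-9
half_digits = [str(d) for d in range(10)]

print(replace_each("\uff12\uff10\uff12\uff14", full_digits, half_digits))  # 2024
```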
Or using a dictionary:
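A sketch of the dictionary variant, assuming a character-to-character dict rather than two parallel lists. FULL_TO_HALF_DIGITS and replace_with_dict are illustrative names.

```python
# Maps each full-width digit to its ordinary ASCII digit
FULL_TO_HALF_DIGITS = {chr(0xFF10 + d): str(d) for d in range(10)}

def replace_with_dict(text, table):
    # Still one replace pass per entry, but the pairs live in one place
    for src, dst in table.items():
        text = text.replace(src, dst)
    return text

print(replace_with_dict("tel \uff10\uff19\uff10", FULL_TO_HALF_DIGITS))  # tel 090
```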
Or finally, using regex (and this might actually be the fastest):
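A sketch of the regex variant: one alternation pattern built from the dictionary keys, with a callback that looks up each match. Whether it is actually the fastest would need measuring; TABLE, PATTERN and replace_with_regex are hypothetical names.

```python
import re

# Maps each full-width digit to its ordinary ASCII digit
TABLE = {chr(0xFF10 + d): str(d) for d in range(10)}
# One pattern that matches any key of the table
PATTERN = re.compile("|".join(re.escape(ch) for ch in TABLE))

def replace_with_regex(text):
    # A single pass over the text; each match is swapped via the table
    return PATTERN.sub(lambda m: TABLE[m.group(0)], text)

print(replace_with_regex("\uff11\uff12a\uff13"))  # 12a3
```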
Using the unicode.translate method: it requires a mapping of code points as numbers, not characters. Also, using u'unicode literals' leaves the values unencoded.
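To illustrate the point about code points as numbers, here is a sketch using Python 3, where str.translate behaves like Python 2's unicode.translate. The table names are made up; the second table shows what can go wrong when the keys are characters.

```python
# Keys must be code points (ints); values may be ints or strings
DIGIT_TABLE = {0xFF10 + d: ord("0") + d for d in range(10)}
print("\uff14\uff12".translate(DIGIT_TABLE))  # 42

# A table keyed by characters never matches, because translate looks up ord(ch)
WRONG_TABLE = {"\uff14": "4"}
print("\uff14".translate(WRONG_TABLE) == "\uff14")  # True: nothing was replaced
```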
In Python 3, the cleanest approach is to use str.translate and str.maketrans (https://docs.python.org/3/library/stdtypes.html#str.maketrans):
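A sketch of the Python 3 approach, limited to digits for brevity. FULL_DIGITS, TO_HALF and digits_to_halfwidth are illustrative names, not taken from the original answer.

```python
# The ten full-width digits, U+FF10..U+FF19, as one string
FULL_DIGITS = "".join(chr(0xFF10 + d) for d in range(10))
# maketrans pairs the two strings up character by character
TO_HALF = str.maketrans(FULL_DIGITS, "0123456789")

def digits_to_halfwidth(text):
    return text.translate(TO_HALF)

print(digits_to_halfwidth("\uff11\uff10 items"))  # 10 items
```

With two string arguments, str.maketrans raises ValueError if their lengths differ. That is likely related to the error in the question: on UTF-8 byte strings, each full-width digit takes three bytes, so the two tables no longer line up.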
In Python 2, the equivalent is string.maketrans (https://docs.python.org/2/library/string.html#string.maketrans), which doesn't work with Unicode characters, so you need to build a dictionary instead, as Josh Lee notes above.
Regex approach
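The code for this answer is missing from the page. One way a regex approach could work, offered only as a guess at the intent, is to match the whole fullwidth range with a character class and shift each match down by 0xFEE0. FULLWIDTH_RE and regex_to_halfwidth are hypothetical names.

```python
import re

# Character class covering the fullwidth ASCII variants U+FF01..U+FF5E
FULLWIDTH_RE = re.compile("[\uff01-\uff5e]")

def regex_to_halfwidth(text):
    # Shift each matched character down by the fixed 0xFEE0 offset
    return FULLWIDTH_RE.sub(lambda m: chr(ord(m.group(0)) - 0xFEE0), text)

print(regex_to_halfwidth("\uff30\uff59\uff54\uff48\uff4f\uff4e\uff13"))  # Python3
```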