Given a Shift-JIS character code, how do I get the UTF-8 character code?

Posted 2025-01-08 18:12:30


In my program I get shift-jis character codes as Python integers which I need to convert to their corresponding utf8 character codes (which should also be in integers).
How can I do that?
For ASCII you have the helpful functions ord()/chr(), which allow you to convert an integer into an ASCII string that you can easily convert to unicode later. I can't find anything like that for other encodings.

Using Python 2.

EDIT: the final code. Thanks everyone:

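def shift_jis2unicode(charcode): # charcode is an integer
    if charcode <= 0xFF:
        # single-byte character (ASCII or half-width katakana)
        string = chr(charcode)
    else:
        # double-byte character: lead byte first, then trail byte
        string = chr(charcode >> 8) + chr(charcode & 0xFF)

    return ord(string.decode('shift-jis'))

# The code must be written in hex: 0x8140 is the ideographic space (U+3000).
# Decimal 8140 is 0x1FCC, which decodes to two characters and makes ord() fail.
print shift_jis2unicode(0x8140)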


末蓝 2025-01-15 18:12:30


There's no such thing as "utf8 character codes (which should also be in integers)".

Unicode defines "code points", which are integers. UTF-8 defines how to convert those code points to an array of bytes.
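To make the distinction concrete, here is a small Python 3 sketch (the answer's own code is Python 2) showing that a code point is one integer while its UTF-8 form is a sequence of bytes. U+3000, the ideographic space, is used purely as an example character.

```python
# A character's Unicode code point is a single integer...
ch = "\u3000"  # IDEOGRAPHIC SPACE, chosen only as an example
code_point = ord(ch)
print("U+%04X" % code_point)  # U+3000

# ...while UTF-8 encodes that code point as a variable-length byte sequence.
utf8_bytes = ch.encode("utf-8")
print(list(utf8_bytes))  # [227, 128, 128]

# Going back: decode the bytes, then take ord() of the single resulting character.
assert ord(utf8_bytes.decode("utf-8")) == code_point
```

So "the UTF-8 code" of a character is really a list of byte values, not one integer; the single integer you usually want is the code point.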

So I think you want the Unicode code points. In that case:

def shift_jis2unicode(charcode): # charcode is an integer
    if charcode <= 0xFF:
        shift_jis_string = chr(charcode)
    else:
        shift_jis_string = chr(charcode >> 8) + chr(charcode & 0xFF)

    unicode_string = shift_jis_string.decode('shift-jis')

    assert len(unicode_string) == 1
    return ord(unicode_string)

print "U+%04X" % shift_jis2unicode(0x8144)
print "U+%04X" % shift_jis2unicode(0x51)

(Also: I don't think 8100 is a valid shift-JIS character code...)
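The function above is Python 2, where `str` holds raw bytes. A possible Python 3 port, sketched under the same assumption (one byte for codes up to 0xFF, otherwise two big-endian bytes), builds the bytes with `int.to_bytes` instead of concatenating `chr()` results; it keeps the answer's function name and swaps the `assert` for an explicit error:

```python
def shift_jis2unicode(charcode):
    """Return the Unicode code point for one Shift-JIS character code (an int)."""
    # Codes up to 0xFF are single bytes (ASCII / half-width katakana);
    # larger codes are two bytes, lead byte first (big-endian).
    length = 1 if charcode <= 0xFF else 2
    sjis_bytes = charcode.to_bytes(length, "big")

    # bytes.decode raises UnicodeDecodeError if the bytes are not valid Shift-JIS.
    text = sjis_bytes.decode("shift_jis")
    if len(text) != 1:
        raise ValueError("0x%X does not decode to a single character" % charcode)
    return ord(text)


print("U+%04X" % shift_jis2unicode(0x8144))
print("U+%04X" % shift_jis2unicode(0x51))
```

With this version, passing decimal `8140` (0x1FCC) raises `ValueError` instead of silently decoding two unrelated characters.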


耶耶耶 2025-01-15 18:12:30

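There may be a better way to do this, but since there are no other answers yet, here is one option.

You could use a Shift-JIS to Unicode mapping table to convert your Shift-JIS integers to Unicode code points, then use unichr() to turn each code point into a Python unicode object, and finally convert it from unicode to UTF-8 with unicode.encode('utf-8') (python.org/tutorial/introduction.html#unicode-strings).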
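This answer relies on an external lookup table. As a swapped-in alternative (not what the answer describes), a comparable table can be generated from Python's own `shift_jis` codec. The sketch below is Python 3; `build_sjis_table` is a hypothetical helper name, and it assumes Python's `shift_jis` mapping matches your data (Windows code page 932, Python's `cp932` codec, maps some codes differently).

```python
def build_sjis_table(encoding="shift_jis"):
    """Hypothetical helper: map each decodable Shift-JIS code (int) to its code point."""
    table = {}

    # Single-byte codes: ASCII and half-width katakana.
    for code in range(0x100):
        try:
            ch = bytes([code]).decode(encoding)
        except UnicodeDecodeError:
            continue  # lead bytes and unassigned values are not characters on their own
        table[code] = ord(ch)

    # Double-byte codes: lead byte 0x81-0x9F or 0xE0-0xFC, trail byte 0x40-0xFC.
    for lead in list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD)):
        for trail in range(0x40, 0xFD):
            try:
                ch = bytes([lead, trail]).decode(encoding)
            except UnicodeDecodeError:
                continue  # unassigned or malformed byte pair
            if len(ch) == 1:
                table[(lead << 8) | trail] = ord(ch)

    return table


table = build_sjis_table()
code_point = table[0x8140]                     # Shift-JIS int -> Unicode code point
utf8_bytes = chr(code_point).encode("utf-8")   # Python 3's chr() plays the role of unichr()
print("U+%04X" % code_point, list(utf8_bytes))
```

Building the table once and reusing the dict avoids decoding on every lookup; for one-off conversions, decoding directly is simpler.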


计㈡愣 2025-01-15 18:12:30

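def from_shift_jis(seq):
    # one byte for codes <= 0xFF, otherwise lead byte + trail byte
    chars = [chr(c) if c <= 0xff else chr(c>>8) + chr(c&0xff) for c in seq]
    return ''.join(chars).decode('shift-jis')

shift_jis_input = [0x8140, 0x51]  # example input: a list of Shift-JIS codes as integers
# UTF-8 byte values of the decoded text, as a list of integers
utf8_output = [ord(c) for c in from_shift_jis(shift_jis_input).encode('utf-8')]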
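The snippet above is Python 2. A Python 3 sketch of the same idea follows; `shift_jis_input` is a hypothetical sample list. Because iterating over `bytes` in Python 3 already yields integers, the final `ord()` is no longer needed.

```python
def from_shift_jis(seq):
    """Decode a list of Shift-JIS character codes (ints) into a str."""
    # One byte for codes <= 0xFF, otherwise two big-endian bytes (lead, trail).
    raw = b"".join(c.to_bytes(1 if c <= 0xFF else 2, "big") for c in seq)
    return raw.decode("shift_jis")


shift_jis_input = [0x8140, 0x51]  # hypothetical input: ideographic space, then "Q"
utf8_output = list(from_shift_jis(shift_jis_input).encode("utf-8"))
print(utf8_output)  # [227, 128, 128, 81]
```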
