Representing multiple values with one character in Python

Posted 2025-02-09 15:34:02 | Word count 85 | Views 1 | Comments 0

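I have 2 values, each in the range 0-31. I want to be able to represent both values in 1 character (for example in Base64, just to explain what I mean by 1 character), but still be able to tell what both values are and which one came first.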

吹梦到西洲 2025-02-16 15:34:02

Find a nice Unicode block that has 1024 contiguous codepoints, for example CJK Unified Ideographs, and map your 32*32 values onto them. In Python 3:

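def char_encode(a, b):
    # Pack the pair into one index in [0, 1024), then shift it into the CJK block.
    return chr(0x4E00 + a * 32 + b)

def char_decode(c):
    # Undo the shift, then split the index back into (a, b).
    return divmod(ord(c) - 0x4E00, 32)

print(char_encode(17, 3))
# => 倣

print(char_decode('倣'))
# => (17, 3)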

As for Base64: fitting both values into a single Base64 character is impossible. Each character in a Base64 encoding carries only 6 bits of data, and you need 10 bits (5 per value) to represent your two numbers.
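The bit-counting argument can be checked directly. This is only a minimal sketch; the variable names are illustrative and not part of the original answer.

```python
# Two values in range(32) give 32 * 32 distinct ordered pairs.
pair_count = 32 * 32
# A single Base64 character is drawn from a 64-symbol alphabet.
base64_alphabet_size = 64

# Bits needed to number every possibility, counting from 0.
bits_needed = (pair_count - 1).bit_length()
bits_per_base64_char = (base64_alphabet_size - 1).bit_length()

print(bits_needed, bits_per_base64_char)
# => 10 6
```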

Also note that while this is only one character, it takes up more than one byte: two bytes in UTF-16, three in UTF-8, and four in UTF-32. As noted by others, there is no way to stuff 10 bits of data into an 8-bit byte.
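To see the storage cost concretely, the example character can be encoded in a few common encodings. A small sketch; the `-le` codec variants are chosen here only so that no byte-order mark is counted.

```python
c = chr(0x4E00 + 17 * 32 + 3)  # the example character for the pair (17, 3)

# Number of bytes the single character occupies in each encoding.
sizes = {enc: len(c.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for enc, size in sizes.items():
    print(enc, size)
# => utf-8 3
# => utf-16-le 2
# => utf-32-le 4
```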

Explanation: a * 32 + b simply maps two numbers in range [0, 32) into a single number in range [0, 1024). For example, 0 * 32 + 0 = 0; 31 * 32 + 31 = 1023. chr finds the Unicode character with that codepoint. However, characters with low codepoints like 0 are not printable and would be a poor choice, so the result is shifted to the beginning of a nice big Unicode block: 0x4E00 is the hexadecimal representation of 19968, and is the codepoint of the first character in the CJK Unified Ideographs block. Using the example values, 17 * 32 + 3 = 547 and 19968 + 547 = 20515, or 0x5023 in hexadecimal, which is the codepoint of the character 倣. Thus, chr(20515) = "倣".
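The arithmetic in this explanation can be followed one step at a time. A short sketch; `BASE`, `index`, and `codepoint` are names introduced here for illustration.

```python
BASE = 0x4E00  # 19968, first codepoint of the CJK Unified Ideographs block

a, b = 17, 3
index = a * 32 + b        # position of the pair within [0, 1024)
codepoint = BASE + index  # shifted into the chosen Unicode block

print(index, codepoint, hex(codepoint), chr(codepoint))
# => 547 20515 0x5023 倣
```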

The char_decode function just does all of these operations in reverse: if a * p + b = x with 0 <= b < p, then a, b = divmod(x, p) (see divmod). If c = chr(x), then x = ord(c) (see ord). And I am sure you know that if w + r = y, then r = y - w. So in the example, ord("倣") = 20515; 20515 - 0x4E00 = 547; and divmod(547, 32) is (17, 3).
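One way to gain confidence that decoding really inverts encoding is to try every possible pair. The sketch below is not from the original answer: `checked_encode` and `checked_decode` are hypothetical variants that add range checks, which the original functions do not perform.

```python
BASE = 0x4E00  # start of the CJK Unified Ideographs block

def checked_encode(a, b):
    # Assumption: reject inputs outside range(32) instead of silently
    # producing a character that decodes to a different pair.
    if not (0 <= a < 32 and 0 <= b < 32):
        raise ValueError("both values must be in range(32)")
    return chr(BASE + a * 32 + b)

def checked_decode(c):
    offset = ord(c) - BASE
    # Assumption: only the 1024 characters produced by checked_encode are valid.
    if not 0 <= offset < 1024:
        raise ValueError("character is outside the encoded range")
    return divmod(offset, 32)

all_pairs = [(a, b) for a in range(32) for b in range(32)]

# Every pair should survive an encode/decode round trip...
round_trip_ok = all(checked_decode(checked_encode(a, b)) == (a, b) for a, b in all_pairs)
# ...and every pair should map to its own distinct character.
distinct_chars = len({checked_encode(a, b) for a, b in all_pairs})

print(round_trip_ok, distinct_chars)
# => True 1024
```

Because b is always less than 32, divmod recovers the original pair exactly, and the order of the two values is preserved.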