Representing multiple values with a single character in Python

Posted 2025-02-09 15:34:02, 85 words, 1 view, 0 comments

I have 2 values that are in the range 0-31. I want to be able to represent both of these values in 1 character (for example in base 64 to explain what I mean by 1 character) but still be able to know what both of the values are and which came first.

Comments (2)

吹梦到西洲 2025-02-16 15:34:02

Find a nice Unicode block that has 1024 contiguous codepoints, for example CJK Unified Ideographs, and map your 32*32 values onto them. In Python 3:

def char_encode(a, b):
  return chr(0x4E00 + a * 32 + b)

def char_decode(c):
  return divmod(ord(c) - 0x4E00, 32)

print(char_encode(17, 3))
# => 倣

print(char_decode('倣'))
# => (17, 3)

You mention Base64, but a single Base64 character cannot do this: each Base64 character carries only 6 bits of data, and you need 10 bits to represent your two numbers.
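The counting argument behind that can be checked directly. This is only a small sketch; the variable names are illustrative and not part of the answer's code.

```python
# Number of distinct (a, b) pairs when both values are in range(32).
pairs_needed = 32 * 32

# Number of distinct values a single Base64 character can take.
base64_symbols = 64

# Bits required for the pair vs. bits carried by one Base64 character.
bits_needed = (pairs_needed - 1).bit_length()
bits_per_base64_char = (base64_symbols - 1).bit_length()

print(pairs_needed, base64_symbols)         # 1024 64
print(bits_needed, bits_per_base64_char)    # 10 6
print(pairs_needed <= base64_symbols)       # False: one character is not enough
print(pairs_needed <= base64_symbols ** 2)  # True: two characters would be
```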

Also note that while this is only one character, it takes up more than one byte: for characters in this block, two bytes in UTF-16 and three in UTF-8. As others have noted, there is no way to fit 10 bits of data into an 8-bit byte.
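If you want to see the sizes for yourself, something like the following should do. It uses the example character (code point U+5023) and the standard `str.encode` method; the choice of encodings to compare is mine.

```python
c = chr(0x5023)  # the example character, code point U+5023

# The "-le" variants are used so that no byte order mark is prepended.
sizes = {enc: len(c.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for enc, size in sizes.items():
    print(enc, size)
# utf-8 3
# utf-16-le 2
# utf-32-le 4
```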


Explanation: a * 32 + b simply maps two numbers in range [0, 32) into a single number in range [0, 1024). For example, 0 * 32 + 0 = 0; 31 * 32 + 31 = 1023. chr finds the Unicode character with that codepoint, but characters with low codepoints like 0 are not printable, and would be a poor choice, so the result is shifted to the beginning of a nice big Unicode block: 0x4E00 is a hexadecimal representation of 19968, and is the codepoint of the first character in the CJK Unified Ideographs block. Using the example values, 17 * 32 + 3 = 547 and 19968 + 547 = 20515, or 0x5023 in hexadecimal, which is the codepoint of the character 倣. Thus, chr(20515) = "倣".
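The same arithmetic, step by step, in case you want to run it. The intermediate variable names are just for illustration.

```python
a, b = 17, 3

offset = a * 32 + b        # position of the pair within [0, 1024)
base = 0x4E00              # first codepoint of CJK Unified Ideographs
codepoint = base + offset  # shifted into the block

print(offset, base, codepoint, hex(codepoint))  # 547 19968 20515 0x5023
print(chr(codepoint))                           # 倣
```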

The char_decode function just does all of these operations in reverse: if a * p + b = x, then a, b = divmod(x, p) (see divmod). If c = chr(x), then x = ord(c) (see ord). And I am sure you know that if w + r = y, then r = y - w. So in the example, ord("倣") = 20515; 20515 - 0x4E00 = 547; and divmod(547, 32) is (17, 3).
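To convince yourself that the round trip works for every pair and that the order is preserved, you could run a check along these lines. The range checks and the `BASE` constant are additions for this sketch; the functions above do not validate their input.

```python
BASE = 0x4E00  # start of the CJK Unified Ideographs block, as above

def char_encode(a, b):
    # Range check added for this sketch; the version above omits it.
    if not (0 <= a < 32 and 0 <= b < 32):
        raise ValueError("both values must be in range(32)")
    return chr(BASE + a * 32 + b)

def char_decode(c):
    offset = ord(c) - BASE
    # Reject characters outside the 1024-codepoint window.
    if not 0 <= offset < 1024:
        raise ValueError("character is outside the expected range")
    return divmod(offset, 32)

# Every pair survives the round trip.
assert all(char_decode(char_encode(a, b)) == (a, b)
           for a in range(32) for b in range(32))

# Swapping the two values gives a different character, so order is kept.
print(char_decode(char_encode(3, 17)))           # (3, 17)
print(char_encode(3, 17) != char_encode(17, 3))  # True
```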

—━☆沉默づ 2025-02-16 15:34:02

A value in the range [0, 31] can be stored in 5 bits, since 2**5 == 32. You can therefore unambiguously store two such values in 10 bits. Conversely, you will not be able to unambiguously retrieve two 5-bit values from fewer than 10 bits unless some other conditions hold true.
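As a rough illustration of the 10-bit packing, here is a sketch using shifts and masks. The function names `pack` and `unpack` are made up for this example.

```python
def pack(a, b):
    # a occupies the high 5 bits, b the low 5 bits.
    return (a << 5) | b

def unpack(n):
    # Shift to recover the high 5 bits, mask to recover the low 5 bits.
    return n >> 5, n & 0b11111

n = pack(17, 3)
print(n, format(n, "010b"))  # 547 1000100011
print(unpack(n))             # (17, 3)
```

The largest packed value is pack(31, 31) = 1023, which fits exactly in 10 bits.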

If you are using an encoding that allows 1024 or more distinct characters, you can map your pairs to characters. Otherwise you simply can't. So ASCII (128 characters) is not going to work here, and neither is Latin-1 (256 characters). But pretty much any of the "normal" Unicode encodings is fine.
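You can check which encodings are able to represent a given character with a quick test like this one. The helper `encodable` is hypothetical, and U+4E00 is just an arbitrary character above code point 255.

```python
c = chr(0x4E00)  # an arbitrary character well above code point 255

def encodable(ch, encoding):
    # True if the character can be represented in the given encoding.
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for encoding in ("ascii", "latin-1", "utf-8", "utf-16", "utf-32"):
    print(encoding, encodable(c, encoding))
# ascii False
# latin-1 False
# utf-8 True
# utf-16 True
# utf-32 True
```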

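Keep in mind that the encoded character will take up more than 10 bits. In UTF-8, any character outside ASCII takes at least two bytes, and characters from U+0800 upward take three or more. In UTF-16, every character in the Basic Multilingual Plane takes two bytes. If size is a concern, consider using UTF-16.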
