Python: representing a UUID with a special character set

Posted on 2024-08-21 09:49:01

When creating a UUID in Python, like so:

>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

How could one map that UUID into a string made up of the capital letters A-Z minus the characters D, F, I, O, Q, and U, plus the numerical digits, plus the characters "+" and "="? That is, a mapping from an integer or string onto this set of 32 (relatively OCR-friendly) characters:

[ABCEGHJKLMNPRSTVWXYZ1234567890+=]

I'll call this the OCRf set (for OCR friendly).

I'd like to have an isomorphic function:

def uuid_to_ocr_friendly_chars(uid):
    """takes uid, an integer, and transposes it into a string made
       of the OCRf set
    """
    ...

My first thought is to go through the process of changing the uuid to base 32. e.g.

OCRf = "ABCEGHJKLMNPRSTVWXYZ1234567890+="

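def uuid_to_ocr_friendly_chars(uid):
    ocfstr = ''
    while uid > 0:                 # stop once every base-32 digit is consumed
        ocfstr += OCRf[uid % 32]   # least significant digit is emitted first
        uid //= 32                 # integer division keeps uid an int
    return ocfstr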

However, I'd like to know if this method is the best and fastest way to go about this conversion - or if there's a simpler and faster method (e.g. a builtin, a smarter algorithm, or just a better method).

I'm grateful for your input. Thank you.

Comments (3)

将军与妓 2024-08-28 09:49:01

How important is it to you to "squeeze" the representation by 18.75%, i.e., from 32 to 26 characters? Because, if saving this small percentage of bytes isn't absolutely crucial, something like uid.hex.upper().replace('D','Z') will do what you ask, as long as 'F' (which is also outside your set) gets a similar replacement. This doesn't use the whole alphabet you make available, but the only cost is missing that 18.75% "squeezing".

If squeezing down every last byte is crucial, I'd work on substrings of 20 bits each -- that's 5 hex characters, or 4 characters in your funky alphabet. There are 6 of those (plus 8 bits left over, for which you can use the hex.upper().replace approach as above, since there's nothing to gain in doing anything fancier). You can easily get the substrings by slicing .hex and turn each into an int with int(theslice, 16). Then you can basically apply the same algorithm you're using above -- but the arithmetic is all done on much smaller numbers, so the speed gain should be material. Also, don't build the string by looping on += -- make a list of all the "digits" and ''.join them at the end -- that's also a performance improvement.
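
The chunked approach described above could be sketched roughly as follows. This is only an illustration, not the answerer's own code: the helper names (`chunk_to_ocrf`, `uuid_to_ocrf_chunked`, `HEX_FIX`) are made up, and mapping 'F' to 'Y' is an arbitrary assumption (the answer only mentions replacing 'D' with 'Z').

```python
import uuid

OCRF = "ABCEGHJKLMNPRSTVWXYZ1234567890+="

# Assumption: 'D' and 'F' are both outside the OCRf set, so both are remapped.
# 'Z' for 'D' follows the answer; 'Y' for 'F' is an arbitrary choice.
HEX_FIX = str.maketrans("DF", "ZY")


def chunk_to_ocrf(value):
    """Encode a 20-bit integer as 4 OCRf characters, most significant first."""
    digits = []
    for _ in range(4):
        value, remainder = divmod(value, 32)
        digits.append(OCRF[remainder])
    return "".join(reversed(digits))


def uuid_to_ocrf_chunked(u):
    """Encode a uuid.UUID as 26 characters: 6 chunks of 4, plus 2 hex-like characters."""
    hex32 = u.hex  # 32 lowercase hex characters
    # Six 5-hex-character slices cover the first 120 bits.
    parts = [chunk_to_ocrf(int(hex32[i:i + 5], 16)) for i in range(0, 30, 5)]
    # The remaining 8 bits stay as two upper-case hex characters, with D/F remapped.
    parts.append(hex32[30:].upper().translate(HEX_FIX))
    return "".join(parts)


print(uuid_to_ocrf_chunked(uuid.UUID("a8098c1a-f86e-11da-bd1a-00112444be1e")))
```

The digits are collected in a list and joined once at the end, as the answer suggests, rather than grown with +=.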

爱本泡沫多脆弱 2024-08-28 09:49:01
>>> OCRf = 'ABCEGHJKLMNPRSTVWXYZ1234567890+='
>>> uuid = 'a8098c1a-f86e-11da-bd1a-00112444be1e'
>>> binstr = bin(int(uuid.replace("-",""),16))[2:].zfill(130)
>>> ocfstr = "".join(OCRf[int(binstr[i:i+5],2)] for i in range(0,130,5))
>>> ocfstr
'HLBJJB2+ETCKSP7JWACGYGMVW+'

To convert back again

>>> "%x"%(int("".join(bin(OCRf.index(i))[2:].zfill(5) for i in ocfstr),2))
'a8098c1af86e11dabd1a00112444be1e'
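
One caveat with the reverse step: "%x" does not keep leading zeros, so a UUID whose hex form starts with 0 would come back shorter than 32 characters. A minimal sketch of the same bit-string idea wrapped in two functions, going through uuid.UUID so the width is preserved (the function names `encode` and `decode` are placeholders, not part of the answer):

```python
import uuid

OCRF = "ABCEGHJKLMNPRSTVWXYZ1234567890+="


def encode(u):
    """Turn a uuid.UUID into 26 OCRf characters, 5 bits per character."""
    bits = format(u.int, "0130b")  # 128 bits left-padded to 130, a multiple of 5
    return "".join(OCRF[int(bits[i:i + 5], 2)] for i in range(0, 130, 5))


def decode(s):
    """Rebuild the uuid.UUID from its 26-character OCRf form."""
    bits = "".join(format(OCRF.index(ch), "05b") for ch in s)
    return uuid.UUID(int=int(bits, 2))


u = uuid.UUID("a8098c1a-f86e-11da-bd1a-00112444be1e")
assert decode(encode(u)) == u
```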
病女 2024-08-28 09:49:01
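import base64
import uuid

# Map the standard Base32 alphabet onto the OCRf set, character for character.
transtbl = bytes.maketrans(
  b'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567',
  b'ABCEGHJKLMNPRSTVWXYZ1234567890+='
)

uid = uuid.uuid1()

# Python 3 form: uid.bytes replaces the hex-decoding step; the '=' padding is stripped before translating.
print(base64.b32encode(uid.bytes).rstrip(b'=').translate(transtbl).decode('ascii'))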

Yes, this method does make me a bit ill, thanks for asking.
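
Since the question asks for a mapping that can be reversed, here is a rough sketch of how this Base32 approach might be inverted in Python 3. The names `encode`, `decode`, `TO_OCRF` and `FROM_OCRF` are illustrative only. Note that Base32 pads the final group at the end of the bit stream, so the output will generally differ from the bit-string answer above even though both are 26 characters long.

```python
import base64
import uuid

B32 = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
OCRF = b"ABCEGHJKLMNPRSTVWXYZ1234567890+="
TO_OCRF = bytes.maketrans(B32, OCRF)
FROM_OCRF = bytes.maketrans(OCRF, B32)


def encode(u):
    """uuid.UUID -> 26 OCRf characters via Base32."""
    # Strip the Base32 padding first, because '=' is itself an OCRf character.
    return base64.b32encode(u.bytes).rstrip(b"=").translate(TO_OCRF).decode("ascii")


def decode(s):
    """26 OCRf characters -> uuid.UUID."""
    raw = s.encode("ascii").translate(FROM_OCRF)
    raw += b"=" * (-len(raw) % 8)  # restore the padding that encode() stripped
    return uuid.UUID(bytes=base64.b32decode(raw))


u = uuid.uuid1()
assert decode(encode(u)) == u
```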
