Python UUID represented as special characters
When creating a UUID in Python, like so:
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')
How could one map that UUID into a string made up of the capital letters A-Z minus the characters D, F, I, O, Q, and U, plus the numerical digits, plus the characters "+" and "="? That is, a mapping from an integer or string onto this set of 32 (relatively OCR-friendly) characters:
[ABCEGHJKLMNPRSTVWXYZ1234567890+=]
I'll call this the OCRf set (for OCR friendly).
I'd like to have an isomorphic function:
def uuid_to_ocr_friendly_chars(uid):
    """takes uid, an integer, and transposes it into a string made
    of the OCRf set
    """
    ...
My first thought is to convert the UUID (as an integer) to base 32, e.g.:
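OCRf = "ABCEGHJKLMNPRSTVWXYZ1234567890+="
def uuid_to_ocr_friendly_chars(uid):
    ocfstr = ''
    while uid > 0:  # keep going until every base-32 digit is consumed
        ocfstr += OCRf[uid % 32]  # least significant digit comes first
        uid //= 32  # integer division, so uid stays an int
    return ocfstr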
However, I'd like to know if this method is the best and fastest way to go about this conversion - or if there's a simpler and faster method (e.g. a builtin, a smarter algorithm, or just a better method).
I'm grateful for your input. Thank you.
Comments (3)
How important is it to you to "squeeze" the representation by 18.75%, i.e., from 32 to 26 characters? Because, if saving this small percentage of bytes isn't absolutely crucial, something like
uid.hex.upper().replace('D', 'Z').replace('F', 'Y')
will do what you ask (it doesn't use the whole alphabet you make available, but the only cost of that is missing the 18.75% "squeezing").
If squeezing down every last byte is crucial, I'd work on substrings of 20 bits each: that's 5 hex characters, or 4 characters in your funky alphabet. There are 6 of those (plus 8 bits left over, for which you can use the same hex.upper().replace approach as above, since there's nothing to gain in doing anything fancier). You can easily get the substrings by slicing uid.hex, and turn each one into an int with int(theslice, 16). Then you can basically apply the same algorithm you're using above, but the arithmetic is all done on much smaller numbers, so the speed gain should be material. Also, don't build the string by looping on +=; make a list of all the "digits" and ''.join them at the end, which is also a performance improvement.
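The chunked approach described above could look roughly like the following. This is only a sketch under assumptions: it targets Python 3, the helper name uuid_to_ocrf_chunked and the HEX_FIX table are made up for illustration, and the replacement letters Z and Y for the hex digits D and F are an arbitrary choice.

```python
import uuid

# The 32-character OCR-friendly alphabet from the question.
OCRF = "ABCEGHJKLMNPRSTVWXYZ1234567890+="

# Hex digits D and F are not in the alphabet, so swap them for two letters
# that are. Z and Y are an arbitrary (assumed) choice.
HEX_FIX = str.maketrans("DF", "ZY")


def uuid_to_ocrf_chunked(uid):
    """Encode a uuid.UUID as 26 OCRf characters (hypothetical helper)."""
    hexstr = uid.hex  # 32 lowercase hex characters
    digits = []
    # Six slices of 5 hex characters (20 bits) give 4 base-32 digits each.
    for start in range(0, 30, 5):
        chunk = int(hexstr[start:start + 5], 16)
        for shift in (15, 10, 5, 0):
            digits.append(OCRF[(chunk >> shift) & 31])
    # The last 2 hex characters (8 bits) stay as hex, patched to fit the alphabet.
    digits.append(hexstr[30:].upper().translate(HEX_FIX))
    # Join once at the end instead of growing a string with +=.
    return "".join(digits)


uid = uuid.UUID("a8098c1a-f86e-11da-bd1a-00112444be1e")
encoded = uuid_to_ocrf_chunked(uid)
print(encoded, len(encoded))
```

The result is always 26 characters long (24 base-32 digits plus the 2 patched hex characters), so decoding can rely on fixed positions.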
To convert back again
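As an illustration only, here is one way the reverse mapping might be written. It assumes a plain base-32 encoding of the whole 128-bit integer with the most significant digit first and a fixed width of 26 characters; the names int_to_ocrf, ocrf_to_int and OCRF_VALUE are hypothetical and not taken from the thread.

```python
import uuid

# The 32-character OCR-friendly alphabet from the question.
OCRF = "ABCEGHJKLMNPRSTVWXYZ1234567890+="
# Reverse lookup table: character -> digit value (0..31).
OCRF_VALUE = {ch: i for i, ch in enumerate(OCRF)}


def int_to_ocrf(n, width=26):
    """Encode a non-negative int as fixed-width base-32, most significant digit first."""
    digits = []
    for _ in range(width):
        n, rem = divmod(n, 32)
        digits.append(OCRF[rem])
    if n:
        raise ValueError("value does not fit in the requested width")
    return "".join(reversed(digits))


def ocrf_to_int(text):
    """Decode a string produced by int_to_ocrf back into an int."""
    n = 0
    for ch in text:
        n = n * 32 + OCRF_VALUE[ch]
    return n


uid = uuid.UUID("a8098c1a-f86e-11da-bd1a-00112444be1e")
encoded = int_to_ocrf(uid.int)
restored = uuid.UUID(int=ocrf_to_int(encoded))
print(encoded, restored == uid)
```

A fixed width of 26 works because 26 base-32 digits hold 130 bits, enough for the 128 bits of a UUID; leading zero digits show up as "A".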
Yes, this method does make me a bit ill, thanks for asking.