Need ideas for a compression algorithm
Background: I'm making a web-based avatar generation system, where a user can select pieces for their avatar (like body, background, eyes, mouth, jacket, pants, etc.) and then a picture is generated from these selections. For performance reasons I then intend to do the following: from the list of selected items, generate a filename which contains their IDs and save the picture under this filename. Then, when a request comes in for a picture, the webserver will serve it directly. If a picture is not found, the 404 handler will generate it. And here is the problem:
Question: I would like to compress a list of integers into as short a string as possible, consisting only of ASCII characters (usable for filenames and URLs). The integers will be unique, and greater than 0 (0 itself will not be among them). I expect that there might be around 20 of them and they would not exceed 200, but that is only a guess (though I'd be surprised if they got past 500). The order is not important.
What do you recommend I should do?
Update: Whoops, it seems I have made a fundamental mistake. I wanted to avoid storing every generated avatar in the DB, instead storing all necessary information in the filename. This way I hoped to avoid unnecessary use of the DB and thus increase performance. However, today it struck me that people might want to change their avatars, and expect them to update everywhere they've been used. Thus, the filename has to be constant. This leaves just one option: I'll have to save info about the avatar in the DB. And then I might as well use GUIDs or some other random strings for the filenames.
Thank you for your help, everyone, and sorry for the false alarm. :(
Comments (3)
One option might be to take the md5 of the combination of numbers to get a 128-bit number. You can then hex encode that into a 32-character ASCII string, or base64 encode it into a 22-character one.
You can now guarantee that all file names are a fixed size and there is only a vanishingly small chance of a collision.
This saves a small amount of space for each filename (128 bits instead of the 180 bits that 20 values at 9 bits each would take), and it is independent of the number of values and the range of each value, so even if you go beyond 20 items or a maximum value of 500 it won't affect the filename length.
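A minimal Python sketch of the hashing idea, under a few assumptions that are not in the answer itself: the function name avatar_filename is made up for illustration, the IDs are sorted and joined with commas before hashing (so that the same selection in a different order gives the same name), and the URL-safe Base64 alphabet is used so that no '/' or '+' ends up in the filename.

```python
import base64
import hashlib


def avatar_filename(item_ids):
    """Return a 22-character, URL-safe name derived from a set of item IDs.

    Hypothetical helper for illustration only.
    """
    # Sort first: the question says order does not matter, so the same
    # selection must always produce the same canonical string.
    canonical = ",".join(str(i) for i in sorted(item_ids))
    digest = hashlib.md5(canonical.encode("ascii")).digest()  # 16 bytes = 128 bits
    # URL-safe Base64 uses '-' and '_' instead of '+' and '/'.
    # 16 bytes encode to 24 characters, of which the last two are '=' padding.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def avatar_filename_hex(item_ids):
    """Same digest, hex encoded: 32 characters."""
    canonical = ",".join(str(i) for i in sorted(item_ids))
    return hashlib.md5(canonical.encode("ascii")).hexdigest()
```

Note that a hash is one-way: the item IDs cannot be recovered from the filename, so a 404 handler that has to generate the picture would need to get the selection from somewhere else (for example from query parameters or a lookup table).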
Not really clear on what you're looking for; would ASCII-85 work? http://en.wikipedia.org/wiki/Ascii85
As in, each integer encoded to a UTF-8 character, the characters made into a string, then encode the resulting string in base-85.
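As Doug Currie points out, ASCII-85 is not a good choice: its alphabet includes characters such as '/', '\', '?' and '*' that are not safe in filenames and URLs. Any of the base64 variants that don't use the forward slash is preferable.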
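A rough Python sketch of this approach with the URL-safe Base64 variant swapped in for ASCII-85. The function names encode_ids and decode_ids are hypothetical, and the sketch assumes the IDs stay well below 0xD800, which holds for the range given in the question.

```python
import base64


def encode_ids(item_ids):
    """Encode each ID as one UTF-8 character, then URL-safe Base64 the bytes."""
    # Sorting gives a canonical name, since order does not matter.
    text = "".join(chr(i) for i in sorted(item_ids))
    raw = text.encode("utf-8")  # IDs 1..127 take 1 byte, 128..2047 take 2 bytes
    # Strip '=' padding; it can be restored from the length when decoding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_ids(name):
    """Inverse of encode_ids: returns the sorted list of IDs."""
    padded = name + "=" * (-len(name) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return [ord(ch) for ch in raw.decode("utf-8")]
```

Unlike a hash, this is reversible, so the 404 handler could recover the item IDs from the filename alone. The length grows with the number of items, and IDs of 128 or more cost two bytes each.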
Concatenate the integers (9 bits per item) into an array of bytes, and then encode the result in Base64.
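A sketch of the 9-bit packing in Python, under some assumptions the answer does not spell out: the names pack_ids and unpack_ids are made up, the IDs are sorted first to get a canonical name, the last byte is zero-padded on the right, and the URL-safe Base64 alphabet is used with the '=' padding stripped.

```python
import base64

BITS = 9  # assumption: every ID fits in 9 bits, i.e. 1..511


def pack_ids(item_ids):
    """Pack IDs at 9 bits each into bytes, then URL-safe Base64 encode."""
    ids = sorted(item_ids)
    acc = 0
    for i in ids:
        if not 0 < i < (1 << BITS):
            raise ValueError("ID out of range for 9 bits: %r" % (i,))
        acc = (acc << BITS) | i
    nbits = BITS * len(ids)
    nbytes = (nbits + 7) // 8
    acc <<= nbytes * 8 - nbits  # left-align; pad the last byte with zero bits
    raw = acc.to_bytes(nbytes, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def unpack_ids(name):
    """Inverse of pack_ids: returns the sorted list of IDs."""
    raw = base64.urlsafe_b64decode(name + "=" * (-len(name) % 4))
    acc = int.from_bytes(raw, "big")
    total_bits = len(raw) * 8
    # The padding is always fewer than 8 bits, so integer division by 9
    # recovers the exact item count.
    count = total_bits // BITS
    mask = (1 << BITS) - 1
    return [(acc >> (total_bits - BITS * (k + 1))) & mask for k in range(count)]
```

With 20 items this gives 180 bits, which fit in 23 bytes and encode to 31 characters. If IDs might exceed 511, BITS would have to be raised, which lengthens every name.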