Is there a way to store strings longer than 25 characters as hex strings shorter than 25 characters, and make it reversible?
Sorry if the title doesn't make sense. Basically I have a series of strings that are 10-60 characters long. The problem is that the service I have to use only accepts strings of up to 25 characters, so I need a way to convert my strings to 25 characters or fewer, send them off, and, when I get the results back, convert them back to the original IDs.
Example:
id = "this_is_a_test_account_that_is_longer_than_allowed"
id = contract(id)
// id = "DSFK23478JDSFHGW874"
id = expand("DSFK23478JDSFHGW874")
// id = "this_is_a_test_account_that_is_longer_than_allowed"
Comments (9)
No, you can't do this. It's basically asking for a compression algorithm which will always make things smaller - it's just not going to happen. At least not in a general sense, due to the pigeonhole principle. (In particular, think about every hex string of the right length. You've got to store all of those, so suppose each one just goes to itself. Now, you've got to be able to store other strings too - but by definition you've run out of valid outputs.)
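To put rough numbers on the pigeonhole argument, here is a back-of-the-envelope count. It assumes, purely for illustration, that inputs are exactly 60 characters drawn from a 27-symbol alphabet (lower-case letters plus underscore, as in the example ID) and that outputs are hex strings of at most 25 characters:

$$
\underbrace{\sum_{k=0}^{25} 16^{k} < 16^{26} = 2^{104}}_{\text{possible outputs}}
\qquad\text{versus}\qquad
\underbrace{27^{60} \approx 2^{285}}_{\text{possible inputs}}
$$

Under those assumptions there are far more inputs than outputs, so no reversible mapping can cover all of them.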
On the other hand, if you have a server which could generate a UUID for any string and store the string, you could then look that UUID up again later. Would that work for your situation?
(Of course it doesn't have to be a UUID - you could just start with 0 and work your way up...)
If you know all the strings beforehand, that's just a special case of this situation: create a hard-coded bidirectional map for all the strings, generating each output uniquely in some fashion (e.g. with UUIDs).
If the strings use only a small set of characters, you can get some compression by putting the unused ("forbidden") characters to work. But I don't believe it would be good enough to compress 60 characters into 25.
Sorry, you are probably going to have to do something fancy or change the service.
It could be as simple as storing your arbitrarily large strings in a simple table with an identity field. You send that identity value to the service and later use it to pull the full string back.
Really, this is an add-on to Jon's answer. In the general case (any 10-60 character string) this is not possible.
HOWEVER, if your original IDs have well-known characteristics (e.g. you only use the characters 0 through 9), then it might be possible. But we don't have enough information to help you.
You can't do it in the general case: always making strings smaller would require impossible compression. However, I can see two options:
Firstly, just store the key in a map:
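A minimal sketch of what such a map could look like, not the answerer's original code. The class name `IdRegistry` and the base-36 counter are assumptions made for illustration; the method names `contract` and `expand` are borrowed from the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: hands out short surrogate IDs and remembers the originals.
class IdRegistry {
    private final Map<String, String> shortToLong = new HashMap<>();
    private final Map<String, String> longToShort = new HashMap<>();
    private long next = 0;

    // Returns the short ID for a long ID, allocating a new one on first use.
    synchronized String contract(String longId) {
        String existing = longToShort.get(longId);
        if (existing != null) {
            return existing;
        }
        // A non-negative long in base 36 needs at most 13 characters, well under 25.
        String shortId = Long.toString(next++, 36).toUpperCase();
        longToShort.put(longId, shortId);
        shortToLong.put(shortId, longId);
        return shortId;
    }

    // Returns the original ID, or null if the short ID was never issued.
    synchronized String expand(String shortId) {
        return shortToLong.get(shortId);
    }
}
```

The in-memory maps here stand in for whatever storage the sender and receiver of the short ID actually share, such as a database table.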
This requires some shared storage but might work fine.
If you know something about your strings (for example, that you only ever use A-Za-z0-9_), then you could use a lookup table to reduce the size. Each character would then need only 6 bits, whereas in Java you have 16 bits per character. Some sort of Huffman encoding based on character frequency would work even better, but the result wouldn't be guaranteed to fit.
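As a rough illustration of the lookup-table idea, the sketch below packs each character of the 63-symbol alphabet A-Za-z0-9_ into 6 bits. The class name `SixBitCodec` and the choice to reserve code 0 as padding are assumptions made for this example.

```java
// Hypothetical codec: 6 bits per character for the alphabet A-Za-z0-9_.
class SixBitCodec {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_";

    // Packs the string into ceil(6 * length / 8) bytes.
    static byte[] pack(String s) {
        byte[] out = new byte[(s.length() * 6 + 7) / 8];
        int bitPos = 0;
        for (char c : s.toCharArray()) {
            int idx = ALPHABET.indexOf(c);
            if (idx < 0) {
                throw new IllegalArgumentException("Unsupported character: " + c);
            }
            int code = idx + 1; // codes 1..63; 0 is reserved to mark padding
            for (int b = 5; b >= 0; b--, bitPos++) {
                if (((code >> b) & 1) != 0) {
                    out[bitPos / 8] |= 0x80 >> (bitPos % 8);
                }
            }
        }
        return out;
    }

    // Reads 6-bit groups until the data ends or a padding group (0) appears.
    static String unpack(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int totalBits = data.length * 8;
        for (int bitPos = 0; bitPos + 6 <= totalBits; bitPos += 6) {
            int code = 0;
            for (int b = 0; b < 6; b++) {
                int p = bitPos + b;
                code = (code << 1) | ((data[p / 8] >> (7 - p % 8)) & 1);
            }
            if (code == 0) {
                break;
            }
            sb.append(ALPHABET.charAt(code - 1));
        }
        return sb.toString();
    }
}
```

Note that the output is raw bytes, so this only helps if the service accepts arbitrary byte values as characters. Even then a 60-character ID still needs 45 bytes.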
It looks like your input character set is lower case letters plus underscore (27 characters). If you had only 16 characters in your source input, you could put two into a byte.
If you're contracting to a two-byte character format, you can easily do this. If you're going to a one-byte character format, I think you can't.
How about breaking your strings up into three smaller strings and using the service three times?
This would depend greatly on the content of those strings. For example, if you knew that the input strings were always composed of only the letters a-z (26), A-Z (26) and the digits 0-9 (10), then you could be assured that each character of the long string is one of 62 possible things, which could easily be stored in fewer bits (six, in this case). Assuming that the service you are using uses eight bits per character, that gets you a 25% reduction in length. If the input strings use fewer characters, or the service accepts more than eight bits per character, you may be able to improve things enough to get by.
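Worked through for the longest input, and still assuming six bits per input character and eight bits per output character:

$$
60 \times 6 = 360 \text{ bits} = 45 \text{ eight-bit characters},
\qquad
25 \times 8 = 200 \text{ bits}
\;\Rightarrow\;
\left\lfloor 200 / 6 \right\rfloor = 33 \text{ input characters at most}
$$

So under these assumptions the 25% saving only covers inputs of up to 33 characters; longer IDs need a smaller input alphabet or a wider output character.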
If the converted string is only for transient use (say, you send a request and get a response back), then you can use some function to generate a "transiently unique" string of at most 25 characters and store a mapping from it to your original ID. After using the transient ID, you can discard it. For each request you can create new ones as needed. You just have to make sure you don't get duplicate mappings within the scope in which you use those IDs. (This is similar to Nick Fortescue's first example.)
Compression will not get you from where you are to where you need to be. There are three approaches that I think may solve your problem, depending on the details of the account and the service.
1) Assign an alternate ID to the account that will fit into the 25-character limit. Treat the existing ID as a 'description' rather than the key for the service. This requires either that you can generate some kind of hash and reliably store it outside the service, or that the service will also store a 'description' that is between 10 and 60 characters long.
2) Break the ID into three pieces and store each in a separate 20-character ID using the service. Use the remaining 5 characters to assign some kind of unique signature to each part, allowing you to retrieve all three pieces and reassemble the ID. Depending on the service, this may be undesirable (e.g. it may create three full records for a single instance).
3) Alter the service or find a new service that will allow IDs of up to 60 characters.
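Option 2 could be sketched roughly as follows. The 5-character signature layout used here (a 3-character group key, then the piece index, then the piece count) is an assumption chosen for illustration, as are the names `ChunkedId` and `groupKey`. How to pick a group key that is unique per original ID is left open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits an ID into tagged pieces of at most 25 characters.
class ChunkedId {
    static final int PIECE = 20; // payload characters per piece

    // Each part is: groupKey (3 chars) + index digit + count digit + payload.
    static List<String> split(String id, String groupKey) {
        if (groupKey.length() != 3) {
            throw new IllegalArgumentException("groupKey must be 3 characters");
        }
        int total = (id.length() + PIECE - 1) / PIECE;
        if (total < 1 || total > 9) {
            throw new IllegalArgumentException("ID must be 1 to 180 characters long");
        }
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            int end = Math.min(id.length(), (i + 1) * PIECE);
            parts.add(groupKey + i + total + id.substring(i * PIECE, end));
        }
        return parts;
    }

    // Reassembles the ID from its parts, which may arrive in any order.
    static String join(List<String> parts) {
        String[] pieces = new String[parts.get(0).charAt(4) - '0'];
        for (String part : parts) {
            pieces[part.charAt(3) - '0'] = part.substring(5);
        }
        for (String piece : pieces) {
            if (piece == null) {
                throw new IllegalArgumentException("A piece is missing");
            }
        }
        return String.join("", pieces);
    }
}
```

For the 50-character example ID this produces three parts of 25, 25 and 15 characters, each of which would be sent to the service as a separate ID.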