How to implement lossless URL shortening
First, a bit of context:
I'm trying to implement URL shortening on my own server (in C, if that matters). The aim is to avoid long URLs while still being able to restore a context from a shortened URL.
Currently I have an implementation that creates a session on the server, identified by a certain ID. This works, but it consumes memory on the server, which is undesirable: it's an embedded server with limited resources, and the main purpose of the device isn't serving web pages but doing other cool stuff.
Another option would be to use cookies or HTML5 webstorage to store the session information in the client.
But what I'm looking for is a way to store the shortened URL parameters in a single parameter that I attach to the URL, and to be able to reconstruct the original parameters from that one.
My first thought was to use Base64 encoding to put all the parameters into one, but this produces an even larger URL, since Base64 expands its input by about a third.
Currently, I'm thinking of compressing the URL parameters (using some compression algorithm like zip, bz2, ...), Base64-encoding the compressed binary blob, and using the result as the context. When I receive the parameter, I could Base64-decode it, decompress the result, and get back the original URL parameters.
The question is: is there any other possibility I'm overlooking that I could use to losslessly compress a large list of URL parameters into a single smaller one?
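To put a number on that growth, here is a minimal sketch of a URL-safe Base64 encoder. The names `b64url_len` and `b64url_encode` are my own, not from any library, and I'm assuming the RFC 4648 "base64url" alphabet with padding omitted, so that the output needs no further percent-escaping in a query string.

```c
#include <stddef.h>

/* Characters needed to encode n bytes as unpadded Base64: ceil(4n / 3). */
size_t b64url_len(size_t n)
{
    return (4 * n + 2) / 3;
}

/* Encodes src[0..n) into dst using the URL-safe Base64 alphabet
 * (RFC 4648 "base64url"), without '=' padding.
 * dst must have room for b64url_len(n) + 1 bytes.
 * Returns the number of characters written (excluding the NUL). */
size_t b64url_encode(const unsigned char *src, size_t n, char *dst)
{
    static const char tab[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    size_t i = 0, o = 0;

    /* Full 3-byte groups become 4 characters each. */
    for (; i + 2 < n; i += 3) {
        unsigned long v = ((unsigned long)src[i] << 16) |
                          ((unsigned long)src[i + 1] << 8) |
                          (unsigned long)src[i + 2];
        dst[o++] = tab[(v >> 18) & 63];
        dst[o++] = tab[(v >> 12) & 63];
        dst[o++] = tab[(v >> 6) & 63];
        dst[o++] = tab[v & 63];
    }

    /* A trailing group of 1 or 2 bytes becomes 2 or 3 characters. */
    if (i < n) {
        unsigned long v = (unsigned long)src[i] << 16;
        if (i + 1 < n)
            v |= (unsigned long)src[i + 1] << 8;
        dst[o++] = tab[(v >> 18) & 63];
        dst[o++] = tab[(v >> 12) & 63];
        if (i + 1 < n)
            dst[o++] = tab[(v >> 6) & 63];
    }

    dst[o] = '\0';
    return o;
}
```

For a 7-byte parameter string such as `a=1&b=2`, `b64url_len(7)` is 10, so Base64 on its own can only make the URL longer.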
Update:
After the comments from home, I realized that I had overlooked the overhead that compression itself adds: the headers and bookkeeping that, for example, zipping adds to the content can make the compressed data even larger than the original.
So (as home states in his comments), I'm starting to think that compressing the whole list of URL parameters is only really useful if the parameters exceed a certain length; otherwise, I could end up with an even larger URL than before.
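One way to cap that risk is to prefix the blob with a one-byte marker and fall back to the raw parameters whenever compression does not help, so the worst case costs exactly one extra byte. The sketch below is only an illustration under my own assumptions: `ctx_pack`, `CTX_RAW` and `CTX_PACKED` are made-up names, and `rle_compress` is a deliberately trivial run-length stand-in for whatever real compressor ends up being used.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in compressor (hypothetical, for illustration only): byte-wise
 * run-length encoding as (count, byte) pairs.
 * Returns the output size, or 0 if the output does not fit in cap. */
static size_t rle_compress(const unsigned char *in, size_t n,
                           unsigned char *out, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        size_t run = 1;
        while (i + run < n && in[i + run] == in[i] && run < 255)
            run++;
        if (o + 2 > cap)
            return 0;
        out[o++] = (unsigned char)run;
        out[o++] = in[i];
        i += run;
    }
    return o;
}

enum { CTX_RAW = 0, CTX_PACKED = 1 };

/* Writes a 1-byte marker followed by either the compressed or the raw
 * parameters, whichever is shorter.
 * Returns the total size written, or 0 if cap is too small. */
size_t ctx_pack(const unsigned char *in, size_t n,
                unsigned char *out, size_t cap)
{
    if (cap < n + 1)
        return 0;

    size_t c = rle_compress(in, n, out + 1, cap - 1);
    if (c != 0 && c < n) {
        out[0] = CTX_PACKED;
        return c + 1;
    }

    /* Compression did not pay off: store the input unchanged. */
    out[0] = CTX_RAW;
    memcpy(out + 1, in, n);
    return n + 1;
}
```

The decoder only has to look at the first byte to know whether to decompress. The result would still need Base64 (or similar) encoding before it goes into the URL.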
Comments (1)
You can always roll your own compression. If you simply apply some Huffman coding, the result will usually be smaller (but Base64-encoding it afterwards makes it grow again by about a third, so the net effect may not be optimal).
I'm using a custom compression strategy on an embedded project I work on: I first apply lzjb (a Lempel-Ziv derivative with a really tight implementation; the source code comes from OpenSolaris) and then Huffman-code the compressed result.
The lzjb algorithm doesn't perform too well on very short inputs, though (around 16 bytes; in that case I leave the data uncompressed).
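To get a feel for how much Huffman coding could save on a given parameter string, here is a rough sketch that computes only the size of the Huffman-coded payload in bits. `huffman_payload_bits` is a hypothetical helper of mine; it builds the code from the input's own byte frequencies and ignores the cost of storing the code table, which is a fair assumption only if encoder and decoder share a fixed table. It is meant for experimenting on a desktop rather than as production code for a small device.

```c
#include <stddef.h>

/* Returns the number of bits a Huffman code built from the byte
 * frequencies of s[0..n) needs for the payload alone. The code table
 * itself is NOT counted (assumption: it is fixed and known to both sides). */
size_t huffman_payload_bits(const unsigned char *s, size_t n)
{
    size_t weight[511] = {0};  /* 256 leaves + up to 255 internal nodes */
    int parent[511];
    int alive[511] = {0};      /* nodes not yet merged into a subtree */
    int count = 256, live = 0;
    size_t bits = 0;

    for (int i = 0; i < 511; i++)
        parent[i] = -1;
    for (size_t i = 0; i < n; i++)
        weight[s[i]]++;
    for (int i = 0; i < 256; i++)
        if (weight[i]) { alive[i] = 1; live++; }

    if (live == 1)
        return n;              /* one distinct symbol: 1 bit per byte */

    /* Repeatedly merge the two lightest live nodes. */
    while (live > 1) {
        int a = -1, b = -1;    /* a = lightest, b = second lightest */
        for (int i = 0; i < count; i++) {
            if (!alive[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) { b = a; a = i; }
            else if (b < 0 || weight[i] < weight[b]) { b = i; }
        }
        weight[count] = weight[a] + weight[b];
        parent[a] = parent[b] = count;
        alive[a] = alive[b] = 0;
        alive[count] = 1;
        count++;
        live--;
    }

    /* Code length of a symbol = depth of its leaf in the tree. */
    for (int i = 0; i < 256; i++) {
        if (!weight[i])
            continue;
        size_t depth = 0;
        for (int p = parent[i]; p >= 0; p = parent[p])
            depth++;
        bits += weight[i] * depth;
    }
    return bits;
}
```

For the 7-byte input `aaaabbc` this gives 10 bits, i.e. 2 bytes after rounding up, which Base64 would then turn into 3 characters. Real parameter strings have a much flatter byte distribution, so the gain there is likely smaller.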