Compressing large number arrays
I've got a JavaScript application that sends a large amount of numerical data over the wire. This data is then stored in a database. I am having size issues (too much bandwidth, and the database is getting too big), so I am now ready to sacrifice some performance for compression.
I was thinking of implementing base 62 equivalents of number.toString(62) and parseInt(compressed, 62), since the built-in versions only accept a radix up to 36. This would certainly reduce the size of the data, but before I go ahead I thought I would put it to the folks here, as I know there must be some outside-the-box solution I have not considered.
The basic specs are:
- Compress large number arrays into strings for JSONP transfer (so I think UTF is out)
- Be relatively fast. I'm not expecting the same performance as I have now, but I don't want gzip compression either.
Any ideas would be greatly appreciated.
Thanks
Guido Tapia
Comments (3)
Another way of doing this might be to encode the data as binary types, such as signed or unsigned ints, and decode it manually as shown at http://snippets.dzone.com/posts/show/685. This would require server-side code to create the binary data.
You could then apply Huffman compression or something simpler like RLE (see http://rosettacode.org/wiki/Run-length_encoding#JavaScript for an implementation, though it may have some issues in IE without modification) to compress the data further.
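To make the RLE suggestion concrete, here is a minimal sketch of run-length encoding for a number array. It is not the Rosetta Code implementation; the function names `rleEncode` and `rleDecode` are made up for illustration, and the approach only pays off if the data actually contains long runs of repeated values.

```javascript
// Collapse runs of equal values into [value, runLength] pairs.
function rleEncode(values) {
  const runs = [];
  for (const v of values) {
    const last = runs[runs.length - 1];
    if (last && last[0] === v) {
      last[1] += 1; // same value as the previous one: extend the current run
    } else {
      runs.push([v, 1]); // new value: start a new run
    }
  }
  return runs;
}

// Expand [value, runLength] pairs back into the original array.
function rleDecode(runs) {
  const values = [];
  for (const [v, count] of runs) {
    for (let i = 0; i < count; i++) values.push(v);
  }
  return values;
}
```

For example, `rleEncode([5, 5, 5, 2, 2, 9])` gives `[[5, 3], [2, 2], [9, 1]]`, and `rleDecode` reverses it. On data with few repeats the output is larger than the input, so it is worth measuring on real samples first.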
EDIT:
Alternatively, you could convert the numbers themselves to a base (radix) in the unencoded URI character range (see http://en.wikipedia.org/wiki/Percent-encoding), which should work well if many of the numbers are longer than 2 digits. I converted the code at http://code.activestate.com/recipes/111286-numeric-base-converter-that-accepts-arbitrary-digi/ from Python to do this.
It currently doesn't handle floats, but it could be done fairly easily:
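The converted code itself is not reproduced here. What follows is only a minimal sketch of the same idea, not the answerer's port: the names `DIGITS`, `toBase`, and `fromBase` are hypothetical, and the default digit set is assumed to be the 62 alphanumerics. RFC 3986 also leaves `-`, `_`, `.`, and `~` unencoded, so a longer digit string could be passed in if no separator character is needed. Like the original, it handles non-negative integers only.

```javascript
// Assumed digit set: the 62 alphanumerics, which never need percent-encoding.
const DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Convert a non-negative integer to a string in base digits.length.
function toBase(n, digits = DIGITS) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative integer");
  }
  const base = digits.length;
  let out = "";
  do {
    out = digits[n % base] + out; // prepend the least significant digit
    n = Math.floor(n / base);
  } while (n > 0);
  return out;
}

// Convert a string produced by toBase back to a number.
function fromBase(str, digits = DIGITS) {
  const base = digits.length;
  let n = 0;
  for (const ch of str) {
    const d = digits.indexOf(ch);
    if (d === -1) throw new RangeError("invalid digit: " + ch);
    n = n * base + d;
  }
  return n;
}
```

With the default digit set, `toBase(61)` is `"z"`, `toBase(62)` is `"10"`, and `fromBase(toBase(n))` returns `n` for any non-negative safe integer.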
Options
I'm now toying with the idea of encoding the length of the number into the number itself. I still have not perfected this algorithm but will post it once done. But roughly this is what I am currently trying to achieve:
Boundaries:
So now, given my max allowed number, I know that an encoded number in base 62 will be at most 2 characters long. So any encoded number is either 1 or 2 characters in length. Awesome. So now I'm going to make the number odd or even depending on whether it's 1 or 2 characters (remember, I can handle loss of precision). This removes the need for separators.
Now I'm seeing about 70%-80% compression with this. It's very buggy at the moment, but I'm excited about it, hence this post to encourage discussion around this methodology.
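One way the odd/even idea could be made to work is sketched below. This is a guess at the scheme, not the poster's actual algorithm, and it rests on some assumptions: values are integers from 0 to 3843 (62 * 62 - 1, the largest value that fits in two base 62 digits), the low digit is written first, and an error of 1 is acceptable. Because 62 is even, the parity of a number equals the parity of its low digit, so the first character tells the decoder whether a second one follows. The names `ALPHABET`, `encodeAll`, and `decodeAll` are hypothetical.

```javascript
const ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const BASE = ALPHABET.length; // 62
const MAX = BASE * BASE - 1;  // 3843, assumed upper bound

// Encode each number as 1 or 2 characters with no separators.
function encodeAll(numbers) {
  let out = "";
  for (const n of numbers) {
    if (!Number.isInteger(n) || n < 0 || n > MAX) {
      throw new RangeError("value out of range: " + n);
    }
    if (n < BASE) {
      // One character: clear the low bit so the digit is even.
      out += ALPHABET[n & ~1];
    } else {
      // Two characters: set the low bit so the first (low) digit is odd.
      const v = n | 1;
      out += ALPHABET[v % BASE] + ALPHABET[Math.floor(v / BASE)];
    }
  }
  return out;
}

// Decode by checking the parity of the first digit of each number.
function decodeAll(str) {
  const numbers = [];
  let i = 0;
  while (i < str.length) {
    const low = ALPHABET.indexOf(str[i++]);
    if (low === -1) throw new RangeError("invalid character in input");
    if (low % 2 === 0) {
      numbers.push(low); // even digit: a complete one-character value
    } else {
      if (i >= str.length) throw new RangeError("truncated input");
      const high = ALPHABET.indexOf(str[i++]);
      if (high === -1) throw new RangeError("invalid character in input");
      numbers.push(high * BASE + low);
    }
  }
  return numbers;
}
```

For instance, `encodeAll([7, 100, 3843, 60])` yields `"6d1zzy"`, which decodes to `[6, 101, 3843, 60]`: odd values below 62 are rounded down by one and even values from 62 upward are rounded up by one. That is the precision loss traded for dropping the separators.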