大数数组压缩

发布于 2024-08-28 05:49:07 字数 372 浏览 4 评论 0原文

我有一个 JavaScript 应用程序,可以通过网络发送大量数字数据。然后该数据被存储在数据库中。我遇到大小问题(带宽太多,数据库太大)。我现在准备牺牲一些性能来进行压缩。

我正在考虑实现一个基数 62 number.toString(62) 和 parseInt(compressed, 62)。这肯定会减少数据的大小,但在我继续这样做之前,我想我会把它告诉这里的人们,因为我知道一定有一些我没有考虑过的开箱即用的解决方案。

基本规格是: - 将大量数组压缩为字符串以进行 JSONP 传输(所以我认为 UTF 已经过时了) - 相对较快,看起来我并不期望与现在相同的性能,但我也不想要 gzip 压缩。

任何想法将不胜感激。

谢谢吉

多·塔皮亚

I've got a javascript application that sends a large amount of numerical data down the wire. This data is then stored in a database. I am having size issues (too much bandwidth, database getting too big). I am now ready to sacrifice some performance for compression.

I was thinking of implementing a base 62 number.toString(62) and parseInt(compressed, 62). This would certainly reduce the size of the data but before I go ahead and do this I thought I would put it to the folks here as I know there must be some outside the box solution I have not considered.

The basic specs are:
- Compress large number arrays into strings for JSONP transfer (So I think UTF is out)
- Be relatively fast, look I'm not expecting same performance as I have now but I also don't want gzip compression either.

Any ideas would be greatly appreciated.

Thanks

Guido Tapia

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

渔村楼浪 2024-09-04 05:49:07

另一种方法可能是编码为二进制类型,例如有符号/无符号整数,并手动解码,如 http://snippets.dzone.com/posts/show/685 这需要服务器端代码来创建二进制数据。

然后,您可以进行霍夫曼压缩或类似的 RLE 压缩(请参阅 http://rosettacode.org/wiki/Run -length_encoding#JavaScript 用于实现,尽管在不修改的情况下在 IE 中可能会出现一些问题)以进一步压缩数据。

编辑
或者,您可以将数字本身转换为未编码 URI 字符范围中的基数(基数)(请参阅 http ://en.wikipedia.org/wiki/Percent-encoding)如果许多数字大于 2 位数字,它应该可以很好地工作。我在 http://code 转换了代码来自 python 的 .activestate.com/recipes/111286-numeric-base-converter-that-accepts-任意-digi/ 来执行此操作。

它目前不处理浮动,但可以相当容易地完成:

function get_map(s) {
    d = {}
    for (var i=0; i<s.length; i++) {
        d[s.charAt(i)] = i}
    d.length = s.length
    d._s = s
    return d}

var separate_with = '~';
var encodable = get_map('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.'); // - is reserved for negatives obviously :-P
var base10 = get_map('0123456789')

// UNCOMMENT ME for length/speed testing in a wider base!
// You may wish to experiment with the ranges for a happy medium between bandwidth and DB space :-P
/*var encodable = ''
for (var i=1; i<128; i++) {
    encodable += String.fromCharCode(i)
}
encodable = get_map(encodable)*/

function baseconvert(number, fromdigits, todigits) {
    var number = String(number)

    if (number.charAt(0) == '-') {
        number = number.slice(1, number.length)
        neg=1}
    else {
        neg=0}

    // make an integer out of the number
    var x = 0
    for (var i=0; i<number.length; i++) {
        var digit = number.charAt(i)
        x = x*fromdigits.length + fromdigits[digit]
    }

    // create the result in base 'todigits.length'
    res = ""
    while (x>0) {
        remainder = x % todigits.length
        res = todigits._s.charAt(remainder) + res
        x = parseInt(x/todigits.length)
    }

    if (neg) res = "-"+res
    return res
}

function encodeNums(L) {
    var r = []
    for (var i=0; i<L.length; i++) {
         r.push(baseconvert(L[i], base10, encodable))
    }
    return r.join(separate_with)
}

function decodeNums(s) {
    var r = []
    var s = s.split(separate_with)
    for (var i=0; i<s.length; i++) {
         r.push(parseInt(baseconvert(s[i], encodable, base10)))
    }
    return r
}

var test = [5, 654645, 24324, 652124, 65, 65289543, 65278432, 643175874158, 652754327543]
alert(encodeNums(test))
alert(decodeNums(encodeNums(test)))

Another way of doing this might be to encode to binary types such as signed/unsigned ints, and manually decode as at http://snippets.dzone.com/posts/show/685 which would require server side code to create the binary data.

You could then huffman compression or something similar like RLE (see http://rosettacode.org/wiki/Run-length_encoding#JavaScript for an implementation, though it may have some issues in IE without modifying) to compress the data further.

EDIT:
Alternatively, you could convert the numbers themselves to a base (radix) in the unencoded URI character range (see http://en.wikipedia.org/wiki/Percent-encoding) which should work well if many of the numbers are larger than 2 digits. I converted the code at http://code.activestate.com/recipes/111286-numeric-base-converter-that-accepts-arbitrary-digi/ from python to do this.

It currently doesn't handle floats, but it could be done fairly easily:

function get_map(s) {
    d = {}
    for (var i=0; i<s.length; i++) {
        d[s.charAt(i)] = i}
    d.length = s.length
    d._s = s
    return d}

var separate_with = '~';
var encodable = get_map('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.'); // - is reserved for negatives obviously :-P
var base10 = get_map('0123456789')

// UNCOMMENT ME for length/speed testing in a wider base!
// You may wish to experiment with the ranges for a happy medium between bandwidth and DB space :-P
/*var encodable = ''
for (var i=1; i<128; i++) {
    encodable += String.fromCharCode(i)
}
encodable = get_map(encodable)*/

function baseconvert(number, fromdigits, todigits) {
    var number = String(number)

    if (number.charAt(0) == '-') {
        number = number.slice(1, number.length)
        neg=1}
    else {
        neg=0}

    // make an integer out of the number
    var x = 0
    for (var i=0; i<number.length; i++) {
        var digit = number.charAt(i)
        x = x*fromdigits.length + fromdigits[digit]
    }

    // create the result in base 'todigits.length'
    res = ""
    while (x>0) {
        remainder = x % todigits.length
        res = todigits._s.charAt(remainder) + res
        x = parseInt(x/todigits.length)
    }

    if (neg) res = "-"+res
    return res
}

function encodeNums(L) {
    var r = []
    for (var i=0; i<L.length; i++) {
         r.push(baseconvert(L[i], base10, encodable))
    }
    return r.join(separate_with)
}

function decodeNums(s) {
    var r = []
    var s = s.split(separate_with)
    for (var i=0; i<s.length; i++) {
         r.push(parseInt(baseconvert(s[i], encodable, base10)))
    }
    return r
}

var test = [5, 654645, 24324, 652124, 65, 65289543, 65278432, 643175874158, 652754327543]
alert(encodeNums(test))
alert(decodeNums(encodeNums(test)))
漫雪独思 2024-09-04 05:49:07

选项

  • 使用 js 库(请参阅 Josh 的回答),但注意脚本超时
  • 使用某种也执行 zip 功能的 activex 或 silverlight 上传程序
  • 使用 java 插件
  • 使用基于 flash 的压缩上传程序

Options

  • Use a js library (see answer from Josh) but watch for script timeouts
  • Use some kind of activex or silverlight uploader that also does zip
  • Use a java plug-in
  • Use a flash based compressing uploader
永言不败 2024-09-04 05:49:07

我现在正在考虑将数字的长度编码为数字本身。我还没有完善这个算法,但一旦完成就会发布它。但大致这就是我目前正在努力实现的目标:

边界:

  • 允许精度损失 (+- 3)
  • 最大数量为 3200
  • 最小为 0 (所以没有负数)
  • 无小数 - 浮点数

所以现在给出我知道的最大允许数量以 62 为基数的编码数字的最大长度为 2。因此任何编码数字的长度都是 1 或 2 个字符。惊人的。所以现在我将根据它是 1 个还是 2 个字符将数字设为奇数或偶数(记住我可以处理精度损失)。这消除了对分隔符的需要。

现在我看到大约 70%-80% 的压缩,目前它有很多错误,但我对此感到兴奋,所以这篇文章鼓励围绕这种方法进行讨论。

I'm now toying with the idea of encoding the length of the number into the number itself. I still have not perfected this algorithm but will post it once done. But roughly this is what I am currently trying to achieve:

Boundaries:

  • Loss of precision is allowed (+- 3)
  • Max number will be 3200
  • Min is 0 (so no negatives)
  • No decimal - floats

So now given my max allowed number I know that the length of the encoded digit in base 62 will have a max length of 2. So any encoded number is either 1 or 2 characters in length. Awesome. So now I'm going to make the number odd or even depending if its 1 or 2 characters (remember I can handle loss of precission). This removes the need for separators.

Now I'm seeing about 70%-80% compression with this, its very buggy at the moment but I'm excited about it, so the post to encourage discussion around this methodology.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文