当前位置：文江博客话题详情

仅适用于数字的压缩算法

发布于 2024-07-19 11:00:50 字数 207 浏览 4 评论 0原文

我要压缩位置数据（纬度，经度，日期，时间）。所有数字均采用固定格式。其中 2 个（纬度、经度）采用十进制格式。另外2个是整数。

现在这些数字是固定格式的字符串。

以固定格式压缩数字的算法有哪些？仅数字压缩（如果有的话）比字符串压缩更好吗？我应该直接压缩字符串而不将其转换为数字然后再压缩吗？

提前致谢。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

三生池水覆流年 2024-07-26 11:00:50

这是需要一点理论帮助的地方之一。您需要考虑几件事：

测量的分辨率是多少：0.1° 还是 0.001°？ 1秒还是1微秒？
测量结果是相关联的并且按某种顺序排列，还是随机地组合在一起？

举例来说，分辨率为 0.01°。您知道您的值范围从 -180° 到 +180°，或 35900 个不同的值。 Lg(35900) ≈ 16 所以你需要 16 位； -90°–+90° 为 14 位。显然，如果您将这种值存储为浮点型，则可以立即将数据压缩一半。

与日期时间类似，范围是多少；你必须有多少位？

现在，如果数据按某种顺序排列（例如，在单艘船上顺序采集的样本），那么您需要的只是一个起始值和一个增量；这可以产生很大的变化。当船舶以 30 节的速度行驶时，位置变化不能超过每小时约 0.03 度或每秒约 0.0000083 度。这些增量将是非常小的值，因此您可以将它们存储在很少的位中。

关键是，您可以做很多事情，但您必须比我们更了解数据才能提出建议。

更新：哦，等等，定点字符串？！

好的，这（相对）容易。首先，是的，您想要将字符串转换为某种二进制表示形式。只是组成一个数据项，您可能可以

040.00105.0020090518212100Z

将其转换为

|  4000            | short int, 16 bits |  
| 10500            | short int, 16 bits |  
| 20090518212100Z  | 64 bits            |

96 位，12 字节与 26 字节。

This is one of these places where a little theory is helpful. You need to think about several things:

what is the resolution of your measurements: 0.1° or 0.001°? 1 second or one microsecond?
are the measurements associated and in some order, or tossed together randomly?

Let's say, just for example, that the resolution is 0.01°. Them you know that your values range from -180° to +180°, or 35900 different values. Lg(35900) ≈ 16 so you need 16 bits; 14 bits for -90°–+90°. Clearly, if you're storing this kind of value as floating-point, you can compress the data by half immediately.

Similarly with date time, what's the range; how many bits must you have?

Now, if the data is in some order (like, samples taken sequentially aboard a single ship) then all you need is a start value and a delta; that can make a big difference. With a ship traveling at 30 knots, the position can't change any more that about 0.03 degrees an hour or about 0.0000083 degrees a second. Those deltas are going to be very small values, so you can store them in a very few bits.

The point is that there are a number of things you can do, but you have to know more about the data than we do to make a recommendation.

Update: Oh, wait, fixed point strings?!

Okay, this is (relatively) easy. Just to start with, yes, you want to convert your strings into some binary representation. Just making up a data item, you might have

040.00105.0020090518212100Z

which you could convert to

|  4000            | short int, 16 bits |  
| 10500            | short int, 16 bits |  
| 20090518212100Z  | 64 bits            |

So that's 96 bits, 12 bytes versus 26 bytes.

回复收藏 0 原文

秋意浓 2024-07-26 11:00:50

压缩通常作用于字节流。当流的字节值（例如文本或存储为文本的数字）不均匀分布时，您可以实现的压缩比会更高，因为使用更少的位来存储更频繁出现的字节（在霍夫曼压缩）。

通常，您所讨论的数据将简单地存储为二进制数（而不是文本），这通常具有空间和检索效率。

我建议您查看数据压缩书籍

回复收藏 0 原文

尬尬 2024-07-26 11:00:50

您要压缩什么类型的数据？它是如何分布的？是否以任何方式订购？所有这些因素都会影响它的压缩效果，并且可能允许您将数据转换为更容易压缩的数据，或者直接转换为更小的数据。

数据压缩对于“随机”数据效果不佳。如果您的数据在较小范围内，您很可能能够利用它。

事实上，您应该简单地尝试运行任何常见算法并查看数据是否“压缩得足够”。如果不是，并且您对数据的了解超出了压缩算法“直觉”的范围，那么您应该利用该信息。

一个例子是，您的数据不仅仅是纬度和经度，而且还假设它们彼此“接近”。然后你可以存储“原点”纬度和经度，其余的可以是微分的。也许这些差异足够小，可以容纳在一个有符号的字节中。

这只是一个简单的例子，说明了你可以利用数据知识来完成一些事情，而不是一些通用算法可能无法弄清楚的事情。

回复收藏 0 原文

你列表最软的妹 2024-07-26 11:00:50

这取决于您要如何处理数据以及您需要多少精度。

纬度/经度传统上以度、分和秒为单位，其中 60 秒为分，60 分钟为度，1 度纬度名义上等于 60 海里 (nmi)。 1 分钟相当于 1 海里，1 秒相当于 100 英尺多一点。

纬度从 -90 度到 +90 度。将纬度表示为整数秒，范围为 -324000..+324000，或大约 20 位。经度从 -180 到 +180，因此以同样的方式表示经度还需要 1 位。

因此，您可以用 41 位表示完整的纬度/经度位置，精确到 +/- 50 英尺。

显然，如果您不需要那么高的精度，您可以减少位数。

请注意，传统的单精度 32 位浮点数使用大约 24 位尾数，因此如果您只是将纬度/经度（以秒为单位）转换为浮点数，则结果会下降到大约 +/- 6 英尺。对于这种事情，很难击败两个单精度浮点数。

回复收藏 0 原文

紅太極 2024-07-26 11:00:50

根据可用的角色，你可以很容易地制作一些东西。

例如，如果输入仅为数字 (0..9)，这里有一个在 Kotlin 中对它们进行编码和解码的解决方案（Java 上的类似方法）：

fun encodeDigitsOnlyString(stringWithDigitsOnly: String): ByteArray {
    //we couple each 2 digits together into a single byte.
    //For the last digit, if it has no digit to pair with, it's paired with something that's not a digit
    val result = ArrayList<Byte>()
    val length = stringWithDigitsOnly.length
    var lastDigit: Byte? = null
    for (i in 0 until length) {
        val char = stringWithDigitsOnly[i]
        val digitAsByte = char.toString().toInt().toByte()
        if (lastDigit == null) {
            if (i == length - 1) {
                //last digit
                val newByte = (digitAsByte + 0xf0).toByte()
                result.add(newByte)
            } else {
                //more to go
                lastDigit = digitAsByte
            }
        } else {
            val newByte = (digitAsByte + lastDigit.toInt().shl(4)).toByte()
            result.add(newByte)
            lastDigit = null
        }
    }
    return result.toByteArray()
}

fun decodeByteArrayToDigitsOnlyString(encodedDigitsOnlyByteArray: ByteArray): String {
    val sb = StringBuilder(encodedDigitsOnlyByteArray.size * 2)
    for (byte in encodedDigitsOnlyByteArray) {
        val hex = Integer.toHexString(byte.toInt()).takeLast(2).padStart(2, '0')
        if (hex[0].isLetter())
            sb.append(hex.last())
        else
            sb.append(hex)
    }
    return sb.toString()
}

示例用法：

val inputString="12345"
val byteArray=encodeDigitsOnlyString(inputString) //produces a byte array of size 3
val outputString=decodeByteArrayToDigitsOnlyString(byteArray) //should be the same as the input

Depending on the available characters, you could make something quite easily.

For example, if the input is only digits (0..9), here's a solution that will encode and decode them, in Kotlin (similar thing on Java) :

fun encodeDigitsOnlyString(stringWithDigitsOnly: String): ByteArray {
    //we couple each 2 digits together into a single byte.
    //For the last digit, if it has no digit to pair with, it's paired with something that's not a digit
    val result = ArrayList<Byte>()
    val length = stringWithDigitsOnly.length
    var lastDigit: Byte? = null
    for (i in 0 until length) {
        val char = stringWithDigitsOnly[i]
        val digitAsByte = char.toString().toInt().toByte()
        if (lastDigit == null) {
            if (i == length - 1) {
                //last digit
                val newByte = (digitAsByte + 0xf0).toByte()
                result.add(newByte)
            } else {
                //more to go
                lastDigit = digitAsByte
            }
        } else {
            val newByte = (digitAsByte + lastDigit.toInt().shl(4)).toByte()
            result.add(newByte)
            lastDigit = null
        }
    }
    return result.toByteArray()
}

fun decodeByteArrayToDigitsOnlyString(encodedDigitsOnlyByteArray: ByteArray): String {
    val sb = StringBuilder(encodedDigitsOnlyByteArray.size * 2)
    for (byte in encodedDigitsOnlyByteArray) {
        val hex = Integer.toHexString(byte.toInt()).takeLast(2).padStart(2, '0')
        if (hex[0].isLetter())
            sb.append(hex.last())
        else
            sb.append(hex)
    }
    return sb.toString()
}

Example usage:

val inputString="12345"
val byteArray=encodeDigitsOnlyString(inputString) //produces a byte array of size 3
val outputString=decodeByteArrayToDigitsOnlyString(byteArray) //should be the same as the input

回复收藏 0 原文

~没有更多了~