Compressing floating point numbers with a specified range and precision
In my application I'm going to use floating point values to store geographical coordinates (latitude and longitude).
I know that the integer parts of these values will be in the ranges [-90, 90] and [-180, 180] respectively. I also have a requirement to enforce a fixed precision on these values (for now it is 0.00001, but it may change later).
After studying the single-precision floating point type (float), I can see that it is just a little too small to hold my values. That's because 180 * 10^5 is greater than 2^24 (the size of float's significand) but less than 2^25.
So I have to use double. The problem is that I'm going to store huge numbers of these values, so I don't want to waste bytes storing unnecessary precision.
So how can I perform some sort of compression when converting my double value (with a fixed integer-part range and a specified precision X) to a byte array in Java? For example, with the precision from my example (0.00001) I would end up with 5 bytes for each value.
I'm looking for a lightweight algorithm or solution that doesn't involve heavy computation.
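As a quick sanity check of the float reasoning above, here is a minimal sketch that counts how many neighbouring 0.00001 steps collapse to the same float value. The class and method names (FloatResolutionCheck, countCollisions) are made up for illustration and the index ranges are arbitrary.

```java
// Sketch: checks whether float can tell apart neighbouring 0.00001 steps.
// FloatResolutionCheck and countCollisions are hypothetical names.
class FloatResolutionCheck {

    // Counts indices i in [from, to) for which i * 0.00001 and (i + 1) * 0.00001
    // round to the very same float value.
    static int countCollisions(int from, int to) {
        int collisions = 0;
        for (int i = from; i < to; i++) {
            float a = (float) (i / 100000.0);
            float b = (float) ((i + 1) / 100000.0);
            if (a == b) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Spacing between adjacent floats just below 180 degrees.
        System.out.println("float spacing near 180: " + Math.ulp(180.0f));
        // Steps just below 180 degrees versus steps just above 1 degree.
        System.out.println("collisions near 180: " + countCollisions(17_999_000, 18_000_000));
        System.out.println("collisions near 1:   " + countCollisions(100_000, 101_000));
    }
}
```

Near 180 the spacing between adjacent floats is wider than 0.00001, so some steps are lost there, while small values are still represented with room to spare.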
Comments (3)
To store a number x to a fixed precision of (for instance) 0.00001, just store the integer closest to 100000 * x. (By the way, this requires 26 bits, not 25, because you need to store negative numbers too.)
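The idea of storing the integer closest to 100000 * x could look roughly like the sketch below. The names (FixedPointCodec, SCALE, encode, decode) are illustrative assumptions, and the scale is hard-coded to the 0.00001 precision from the question.

```java
// Sketch of fixed-point storage: a coordinate becomes the nearest integer
// multiple of 0.00001. All names here are hypothetical.
class FixedPointCodec {

    // 1 / 0.00001; change this if the required precision changes.
    static final double SCALE = 100_000.0;

    // Rounds to the nearest 0.00001 step. For |degrees| <= 180 the result
    // has magnitude at most 18,000,000, which fits in an int.
    static int encode(double degrees) {
        return (int) Math.round(degrees * SCALE);
    }

    static double decode(int stored) {
        return stored / SCALE;
    }

    public static void main(String[] args) {
        int stored = encode(-179.99999);
        System.out.println(stored + " -> " + decode(stored));
        // Bits needed for the magnitude; one more bit is needed for the sign.
        int magnitudeBits = 32 - Integer.numberOfLeadingZeros(18_000_000);
        System.out.println("magnitude bits: " + magnitudeBits);
    }
}
```

A plain int (4 bytes) already holds the 26 bits with some slack, so no bit packing is needed unless those last 6 bits per value matter.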
As TonyK said in his answer, use an int to store the numbers.
To compress the numbers further, use locality: geo coordinates are often "clumped" (say, the outline of a city block). Use a fixed reference point (full 2x26 bits resolution) and then store each offset from the previous coordinate as a byte (which gives you +/-0.00127). Alternatively, use a short, which gives you +/-0.32767.
Just be sure to hide the compression/decompression in a class which only offers double as its outside API, so you can adjust the precision and the compression algorithm at any time.
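A minimal sketch of the offset idea, working on values already scaled to ints (one axis at a time), might look like this. The escape-byte scheme for offsets that do not fit in a byte, the length prefix, and all names (DeltaCodec, ESCAPE) are my own assumptions and not part of the answer above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch: delta-encodes scaled coordinates. Small offsets take 1 byte,
// anything else takes an escape byte plus the full 4-byte value.
class DeltaCodec {

    // Hypothetical marker meaning "a full int follows".
    static final byte ESCAPE = Byte.MIN_VALUE;

    static byte[] encode(int[] values) {
        // Worst case: 4-byte count plus 5 bytes per value.
        ByteBuffer buf = ByteBuffer.allocate(4 + 5 * values.length);
        buf.putInt(values.length);
        int previous = 0;
        for (int i = 0; i < values.length; i++) {
            long delta = (long) values[i] - previous;
            if (i > 0 && delta >= -127 && delta <= 127) {
                buf.put((byte) delta);             // offset from the previous value
            } else {
                buf.put(ESCAPE).putInt(values[i]); // reference point or large jump
            }
            previous = values[i];
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    static int[] decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int[] values = new int[buf.getInt()];
        int previous = 0;
        for (int i = 0; i < values.length; i++) {
            byte b = buf.get();
            values[i] = (b == ESCAPE) ? buf.getInt() : previous + b;
            previous = values[i];
        }
        return values;
    }

    public static void main(String[] args) {
        int[] latitudes = {4_885_837, 4_885_900, 4_885_890, 4_900_000};
        byte[] packed = encode(latitudes);
        System.out.println(packed.length + " bytes, round trip ok: "
                + Arrays.equals(latitudes, decode(packed)));
    }
}
```

How much this saves depends entirely on how clumped the data really is; widely scattered points would cost 5 bytes each with this particular scheme.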
Considering your use case, I would nonetheless use double and compress the values directly.
The reason is that strong compressors, such as 7-Zip, are extremely good at handling "structured" data, which an array of double is (each item is 8 bytes, which is very regular and predictable).
Any other optimisation you come up with "by hand" is likely to be inferior or to offer negligible advantage, while costing you time and adding risk.
Note that you can still apply the "trick" of converting the double into an int before compression, but I'm really unsure whether it would bring you any tangible benefit, while on the other hand it would seriously reduce your ability to cope with unforeseen ranges of figures in the future.
[Edit] Depending on the source data, if the bits below the precision level are "noisy", it can be useful for the compression ratio to remove them, either by rounding the value or even by directly applying a mask to the lowest bits (I guess this last method will not please purists, but at least you can directly select your precision level this way, while keeping the full range of possible values available).
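The bit-mask variant could be sketched as follows. The class name MantissaMask and the figure of 27 cleared bits are assumptions of mine: 27 is a back-of-the-envelope value chosen so that the truncation error stays below half of 0.00001 for magnitudes up to 180, and it would need to be recomputed for a different precision or range.

```java
// Sketch: zeroes the lowest mantissa bits of a double so that the
// "noise" below the required precision becomes a run of zero bits.
class MantissaMask {

    // Clears the lowest `bits` bits of the 52-bit mantissa (truncation toward zero).
    static double truncate(double value, int bits) {
        if (bits < 0 || bits > 52) {
            throw new IllegalArgumentException("bits must be in [0, 52]");
        }
        long mask = -1L << bits; // ones everywhere except the lowest `bits` bits
        return Double.longBitsToDouble(Double.doubleToRawLongBits(value) & mask);
    }

    public static void main(String[] args) {
        double original = 48.858370123456;
        double masked = truncate(original, 27);
        System.out.println(original + " -> " + masked);
        System.out.println(Long.toBinaryString(Double.doubleToRawLongBits(masked)));
    }
}
```

The values stay ordinary doubles, so nothing else in the code has to change; only the compressor sees the difference.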
So, to summarize, I'd suggest direct LZMA compression of your array of double.
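The JDK does not ship an LZMA codec, so the sketch below swaps in java.util.zip.Deflater (DEFLATE, not LZMA) purely to show the shape of "compress the double array directly"; a third-party LZMA library would slot in at the same place. The class name DoubleArrayCompressor and the choice to pass the element count separately are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch: serialises a double[] to raw bytes and compresses it with DEFLATE
// as a stand-in for LZMA.
class DoubleArrayCompressor {

    static byte[] compress(double[] values) {
        ByteBuffer raw = ByteBuffer.allocate(8 * values.length);
        for (double v : values) {
            raw.putDouble(v);
        }
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(raw.array());
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // `count` is the number of doubles originally compressed.
    static double[] decompress(byte[] data, int count) {
        byte[] raw = new byte[8 * count];
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        int offset = 0;
        try {
            while (offset < raw.length && !inflater.finished()) {
                int n = inflater.inflate(raw, offset, raw.length - offset);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // input ran out before all values were restored
                }
                offset += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
        if (offset != raw.length) {
            throw new IllegalArgumentException("input too short for " + count + " values");
        }
        double[] values = new double[count];
        ByteBuffer.wrap(raw).asDoubleBuffer().get(values);
        return values;
    }

    public static void main(String[] args) {
        double[] coords = new double[10_000];
        for (int i = 0; i < coords.length; i++) {
            coords[i] = Math.round((48.85 + i * 0.00003) * 100000.0) / 100000.0;
        }
        byte[] packed = compress(coords);
        System.out.println(8 * coords.length + " raw bytes -> " + packed.length + " compressed bytes");
    }
}
```

The actual ratio depends heavily on the data, so it is worth measuring on a real sample before deciding between this and the integer-based approaches.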