What is the most efficient way to store latitude and longitude data?
This question comes from a homework assignment I was given. You can base your storage system on one of the following three formats:
DD MM SS.S
DD MM.MMM
DD.DDDDD
You want to maximize the amount of data you can store by using as few bytes as possible.
My solution is based on the first format. I used 3 bytes for latitude: 8 bits for the DD (-90 to 90), 6 bits for the MM (0-59), and 10 bits for the SS.S (0-59.9). I then used 25 bits for the longitude: 9 bits for the DDD (-180 to 180), 6 bits for the MM, and 10 bits for the SS.S. This solution doesn't fit nicely on a byte boundary, but I figured the next reading can be stored immediately following the previous one, so 8 readings would use only 49 bytes.
I'm curious what methods others can come up with. Is there a more efficient method of storing this data? As a note, I considered an offset-based storage, but the problem gave no indication of how much the values may change between readings, so I'm assuming any change is possible.
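The layout described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are made up, degrees are stored with an offset (+90 and +180) so that every field is non-negative, and seconds are held as integer tenths. How a coordinate between -1 and 0 degrees should carry its sign is left open, as the question does not specify it.

```python
# Field widths from the question: DD, MM, SS.S (tenths of a second).
LAT_WIDTHS = (8, 6, 10)   # 24 bits for latitude
LON_WIDTHS = (9, 6, 10)   # 25 bits for longitude
READING_BITS = sum(LAT_WIDTHS) + sum(LON_WIDTHS)  # 49


def pack_fields(values, widths):
    """Concatenate non-negative integers, most significant field first."""
    packed = 0
    for value, width in zip(values, widths):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        packed = (packed << width) | value
    return packed


def unpack_fields(packed, widths):
    """Inverse of pack_fields: split an integer back into its fields."""
    values = []
    for width in reversed(widths):
        values.append(packed & ((1 << width) - 1))
        packed >>= width
    return tuple(reversed(values))


def pack_reading(lat, lon):
    """lat and lon are (degrees, minutes, tenths_of_seconds) tuples."""
    lat_fields = (lat[0] + 90, lat[1], lat[2])    # assumed offset encoding
    lon_fields = (lon[0] + 180, lon[1], lon[2])   # assumed offset encoding
    return pack_fields(lat_fields + lon_fields, LAT_WIDTHS + LON_WIDTHS)


def unpack_reading(packed):
    f = unpack_fields(packed, LAT_WIDTHS + LON_WIDTHS)
    return (f[0] - 90, f[1], f[2]), (f[3] - 180, f[4], f[5])


def pack_stream(readings):
    """Store readings back to back, padding only the final byte."""
    bits = 0
    for lat, lon in readings:
        bits = (bits << READING_BITS) | pack_reading(lat, lon)
    total_bits = READING_BITS * len(readings)
    nbytes = (total_bits + 7) // 8
    return (bits << (nbytes * 8 - total_bits)).to_bytes(nbytes, "big")
```

Eight readings occupy 8 * 49 = 392 bits, so `pack_stream` returns exactly 49 bytes for them.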
Comments (3)
Your suggested method is not optimal. You are using 10 bits (1024 possible values) to store a value in the range (0..599). This is a waste of space.
If you use 3 bytes for latitude, you should map the range [0, 2^24-1] to the range [-90, 90]. Each of the 2^24 values then represents 180/2^24 degrees, which is about 0.039 seconds.
If you only need 0.1 second accuracy, 23 bits for latitude and 24 bits for longitude are enough (each step is then about 0.077 seconds). That's 47 bits in total instead of your 49 bits, with better accuracy.
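A minimal sketch of this range mapping, assuming coordinates arrive as decimal degrees. The function names and the choice to round to the nearest step are illustrative, not part of the original answer.

```python
LAT_BITS = 23
LON_BITS = 24


def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside [{lo}, {hi}]")
    steps = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * steps)


def dequantize(code, lo, hi, bits):
    """Map an integer code back to the centre of its step."""
    steps = (1 << bits) - 1
    return lo + code / steps * (hi - lo)


def encode(lat, lon):
    """Pack one (lat, lon) pair in decimal degrees into a 47-bit integer."""
    lat_code = quantize(lat, -90.0, 90.0, LAT_BITS)
    lon_code = quantize(lon, -180.0, 180.0, LON_BITS)
    return (lat_code << LON_BITS) | lon_code


def decode(code):
    """Recover the approximate (lat, lon) pair from a 47-bit integer."""
    lat_code = code >> LON_BITS
    lon_code = code & ((1 << LON_BITS) - 1)
    return (dequantize(lat_code, -90.0, 90.0, LAT_BITS),
            dequantize(lon_code, -180.0, 180.0, LON_BITS))
```

Because values are rounded to the nearest step, the round-trip error is at most half a step, roughly 0.039 seconds in each coordinate.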
Can we do even better?
The exact number of bits needed for 0.1 second accuracy is log2(180*60*60*10 * 360*60*60*10) < 46.256. This means that you can use 46256 bits (5782 bytes) to store 1000 (lat,lon) pairs, but the mathematics involved will require dealing with very large integers.
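One way to get at those fractional bits is to treat each pair as a digit in a mixed-radix number and store the whole batch as one big integer. The sketch below assumes each coordinate has already been converted to an integer index in 0.1 second steps (latitude index in [0, 6480000), longitude index in [0, 12960000)); the names and the handling of the range endpoints are assumptions.

```python
LAT_STEPS = 180 * 60 * 60 * 10   # latitude indices, 0.1 second apart
LON_STEPS = 360 * 60 * 60 * 10   # longitude indices, 0.1 second apart
PAIR_STATES = LAT_STEPS * LON_STEPS


def pack_pairs(pairs):
    """Pack (lat_index, lon_index) pairs into the fewest whole bytes."""
    pairs = list(pairs)
    big = 0
    for lat_i, lon_i in pairs:
        if not (0 <= lat_i < LAT_STEPS and 0 <= lon_i < LON_STEPS):
            raise ValueError("index out of range")
        # Each pair is one base-PAIR_STATES digit of the big integer.
        big = big * PAIR_STATES + lat_i * LON_STEPS + lon_i
    # Bits needed to hold the largest possible value for this many pairs.
    nbits = (PAIR_STATES ** len(pairs) - 1).bit_length()
    return big.to_bytes((nbits + 7) // 8, "big")


def unpack_pairs(data, count):
    """Inverse of pack_pairs; the number of pairs must be known."""
    big = int.from_bytes(data, "big")
    pairs = []
    for _ in range(count):
        big, state = divmod(big, PAIR_STATES)
        pairs.append(divmod(state, LON_STEPS))
    pairs.reverse()
    return pairs
```

Python's built-in integers are arbitrary precision, so the very large integers are not a problem here. In a language with fixed-width integers, a big-number library would be needed.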
Can we do even better?
It depends. If your data set has concentrations, you can store only some points plus relative distances from those points, using fewer bits. Clustering algorithms should be used to find those points.
Sticking to existing technology:
If you used half-precision floating point numbers to store only the DD.DDDDD data, you could be a lot more space-efficient, but you'd have to accept the limits of that format (a 5-bit exponent with a bias of 15 and a 10-bit stored significand), which means: the coordinates stored might not be exact, but at an offset from the original value.
This is due to the way floating point numbers are stored. Essentially, a normalized significand is multiplied by a power of two given by the exponent to produce the number, instead of a single value being stored directly (as with the integers you used to calculate the numbers for your solution).
The next larger commonly used floating point format uses 32 bits (the type "float" in many programming languages), which is still efficient, but larger than your custom format.
If, however, you designed your own custom floating point type as well, and you gradually added more bits, your results would become more exact and it could STILL be more efficient than the solution you first found. Just play around with the number of bits used for the significand and the exponent, and find out how close your fp approximations come to the desired result in degrees!
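To see how large that offset can get, the following sketch rounds a coordinate through IEEE 754 half precision using the `e` format code of Python's standard `struct` module. The helper name and the sample coordinates are illustrative.

```python
import struct


def to_half(value):
    """Round to the nearest half-precision float, returned as a Python float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


lon = 151.20694
lat = -33.8568

print(to_half(lon))  # 151.25
print(to_half(lat))  # -33.84375

# Error expressed in seconds of arc.
print(abs(to_half(lon) - lon) * 3600)
print(abs(to_half(lat) - lat) * 3600)
```

For magnitudes between 128 and 256 degrees, adjacent half-precision values are 0.125 degrees apart, so the error can reach several minutes of arc. That is far coarser than the 0.1 second target in the question.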
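A small helper makes this kind of experiment easy. It is a sketch only: `round_to_significand` is a hypothetical name, it models just the significand width (exponent range, sign bit and storage layout are ignored), and it relies on `math.frexp` and `math.ldexp` from the standard library.

```python
import math


def round_to_significand(value, bits):
    """Round value to `bits` significant binary digits (leading bit included)."""
    if value == 0:
        return 0.0
    # value == mantissa * 2**exponent with 0.5 <= abs(mantissa) < 1
    mantissa, exponent = math.frexp(value)
    scale = 1 << bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


lon = 151.20694
for bits in (11, 16, 20, 24):
    approx = round_to_significand(lon, bits)
    error_seconds = abs(approx - lon) * 3600
    print(bits, approx, error_seconds)
```

With 11 bits this reproduces half-precision rounding for values in the normal range. The largest errors occur at the largest magnitudes, so it is worth testing longitudes near 180 degrees when comparing a candidate format against a fixed-width integer layout.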
Well, if this is for a large number of readings, then you may try a differential approach. Start with an absolute location, and then save incremental changes, which should ideally require fewer bits, depending on the nature of the changes. This is effectively compressing the stream. But somehow I don't think that's what this homework is about.
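A rough sketch of the differential idea, assuming readings are already integers in units of 0.1 second. The zigzag mapping and the variable-length byte encoding are techniques brought in here for illustration; the comment itself does not prescribe how the deltas are stored.

```python
def zigzag(n):
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z):
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def write_varint(value, out):
    """Append value as 7-bit groups, low group first, high bit = 'more follows'."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def read_varint(data, pos):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_track(points):
    """points: list of (lat, lon) integers; the first delta is taken from (0, 0)."""
    out = bytearray()
    prev_lat = prev_lon = 0
    for lat, lon in points:
        write_varint(zigzag(lat - prev_lat), out)
        write_varint(zigzag(lon - prev_lon), out)
        prev_lat, prev_lon = lat, lon
    return bytes(out)


def decode_track(data):
    points, pos, lat, lon = [], 0, 0, 0
    while pos < len(data):
        delta, pos = read_varint(data, pos)
        lat += unzigzag(delta)
        delta, pos = read_varint(data, pos)
        lon += unzigzag(delta)
        points.append((lat, lon))
    return points
```

When consecutive readings are close together, each delta fits in one byte. If readings jump arbitrarily, as the question assumes may happen, this scheme can end up larger than the fixed 49-bit layout.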