点数据的紧凑表示和传递
我有一个点数据数组,点的值表示为 x 坐标和 y 坐标。
这些点可能在 500 到 2000 点或更多的范围内。
数据表示运动路径,其范围可以从简单到非常复杂,并且其中也可以有尖点。
我可以将此数据表示为一个样条线或一组样条线或一些其他具有非常严格压缩的格式吗?
我曾尝试将它们表示为贝塞尔曲线的集合,但最多只能节省 40%。 例如,如果我有一个包含 500 个点的数组,则会给出 500 个 x 和 500 个 y 值,因此我有 1000 个数据块。 我从中得到了大约 100 个二次贝塞尔曲线。 每个贝塞尔曲线表示为controlx、controly、anchorx、anchory。 这给了我 100 x 4 = 400 条数据。 所以输入 = 1000 件,输出 = 400 件。
我想进一步加强这一点,有什么建议吗?
I have an array of point data, the values of points are represented as x co-ordinate and y co-ordinate.
These points could be in the range of 500 upto 2000 points or more.
The data represents a motion path which could range from the simple to very complex and can also have cusps in it.
Can I represent this data as one spline or a collection of splines or some other format with very tight compression.
I have tried representing them as a collection of beziers but at best I am getting a saving of 40 %.
For instance if I have an array of 500 points , that gives me 500 x and 500 y values so I have 1000 data pieces.
I around 100 quadratic beziers from this. each bezier is represented as controlx, controly, anchorx, anchory.
which gives me 100 x 4 = 400 pcs of data.
So input = 1000pcs , output = 400pcs.
I would like to further tighen this, any suggestions?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
就其本质而言,样条曲线是一种近似。 您可以减少使用的样条线数量以达到更高的压缩比。
您还可以通过使用某种编码方案来实现无损压缩。 我只是在打字时使用之前答案中的范围示例(x 为 1000,y 为 400)来弥补这一点,
要正确解码序列,您需要一些前缀来标识编码类型。 假设我们使用 110 表示全点,10 表示位移,0 表示短位移。
位布局将如下所示,
除非您的序列完全随机,否则您可以使用此方案轻松实现高压缩比。
让我们用一个简短的例子来看看它是如何工作的。
3 个点:A(500, 400)、B(550, 380)、C(545, 381)
假设每个坐标使用 2 个字节。 在不压缩的情况下对其进行编码将需要 16 个字节。
要使用压缩方案对序列进行编码,
A 是第一个点,因此将使用完整坐标。 3字节。
B 相对于 A 的位移是 (50, -20),可以编码为位移。 2 字节。
C相对于B的位移为(-5, 1),符合短位移1字节的范围。
因此,您可以从 16 个字节中节省 10 个字节。 实际压缩率完全取决于数据模式。 它最适合形成移动路径的点。 如果点是随机的,则只能节省 25%。
By its nature, spline is an approximation. You can reduce the number of splines you use to reach a higher compression ratio.
You can also achieve lossless compression by using some kind of encoding scheme. I am just making this up as I am typing, using the range example in previous answer (1000 for x and 400 for y),
To decode the sequence properly, you would need some prefix to identify the type of encoding. Let's say we use 110 for full point, 10 for displacement and 0 for short displacement.
The bit layout will look like this,
Unless your sequence is totally random, you can easily achieve high compression ratio with this scheme.
Let's see how it works using a short example.
3 points: A(500, 400), B(550, 380), C(545, 381)
Let's say you were using 2 byte for each coordinate. It will take 16 bytes to encode this without compression.
To encode the sequence using the compression scheme,
A is first point so full coordinate will be used. 3 bytes.
B's displacement from A is (50, -20) and can be encoded as displacement. 2 bytes.
C's displacement from B is (-5, 1) and it fits the range of short displacement 1 byte.
So you save 10 bytes out of 16 bytes. Real compression ratio is totally depending on the data pattern. It works best on points forming a moving path. If the points are random, only 25% saving can be achieved.
例如,如果您使用 32 位整数作为点坐标,并且存在范围限制,
例如 x: 0..1000, y:0..400,您可以将 (x, y) 打包为单个 32 位变量。
这样您就可以实现另外 50% 的压缩。
If for example you use 32-bit integers for point coords and there is range limit,
like x: 0..1000, y:0..400, you can pack (x, y) into a single 32-bit variable.
That way you achieve another 50% compression.
您可以对尝试编码的数字进行频率分析,并使用不同的位长度来表示它们,当然这里我模糊地描述了 霍夫曼编码
You could do a frequency analysis of the numbers you are trying to encode and use varying bit lengths to represent them, of course here I am vaguely describing Huffman coding
首先,仅在数据中保留实际需要的足够小数点。 删除这些会降低您的准确性,但这是经过计算的损失。 为此,请尝试将数字转换为字符串,找到点的位置,然后从末尾剪切掉那么多字符。 在我看来,这可能比数学处理得更快。 最后您可以将其转换回数字。
其次,尝试存储与最后一个数字相关的数据(“相对值”)。 基本上是用这个数字减去最后一个数字。 然后,为了“解压缩”它,您可以保留一个累加器变量并将它们相加。
Firstly, only keep enough decimal points in your data that you actually need. Removing these would reduce your accuracy, but its a calculated loss. To do that, try converting your number to a string, locating the dot's position, and cutting of those many characters from the end. That could process faster than math, IMO. Lastly you can convert it back to a number.
Secondly, try storing your data relative to the last number ("relative values"). Basically subtract the last number from this one. Then later to "decompress" it you can keep an accumulator variable and add them up.