Does SPATIAL geometry index performance depend on the size and density of the geometries?
Spatial Indexes
Given a spatial index, is the usefulness of the index, that is to say its overall performance, only as good as the geometries it contains?
For example, if I were to insert a million geometries into a table so that they are densely located relative to one another, would the index perform better than it would for identically shaped geometries whose relative locations are significantly more sparse?
Question 1
For example, take these two geometry shapes.
Situation 1
LINESTRING(0 0,1 1,2 2)
LINESTRING(1 1,2 2,3 3)
Geometrically they are identical, but their coordinates are offset by one unit. Imagine this was repeated one million times.
Now take this situation,
Situation 2
LINESTRING(0 0,1 1,2 2)
LINESTRING(1000000 1000000,1000001 1000001,1000002 1000002)
LINESTRING(2000000 2000000,2000001 2000001,2000002 2000002)
LINESTRING(3000000 3000000,3000001 3000001,3000002 3000002)
In the above example:
- the lines' dimensions are identical to those in Situation 1,
- the lines have the same number of points,
- the lines have identical sizes.
However,
- the difference is that the lines are massively further apart.
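To make the comparison concrete, here is a minimal Python sketch that computes the minimum bounding rectangle (MBR) of a LINESTRING from its WKT text. The helper names (`parse_linestring`, `mbr`, `size`) are made up for illustration, and the far-away line is assumed to be the same unit-step diagonal as the first one, as the list above describes.

```python
import re

def parse_linestring(wkt):
    """Parse 'LINESTRING(x y,x y,...)' into a list of (x, y) float tuples."""
    body = re.fullmatch(r"\s*LINESTRING\s*\((.*)\)\s*", wkt).group(1)
    return [tuple(float(v) for v in pair.split()) for pair in body.split(",")]

def mbr(points):
    """Return the bounding box as (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def size(box):
    """Return (width, height) of a bounding box."""
    return (box[2] - box[0], box[3] - box[1])

near = mbr(parse_linestring("LINESTRING(0 0,1 1,2 2)"))
far = mbr(parse_linestring("LINESTRING(1000000 1000000,1000001 1000001,1000002 1000002)"))

# Both boxes have the same width and height; only their positions differ.
print(size(near), size(far))
print(far[0] - near[0])  # horizontal offset between the two boxes
```

So the two situations differ only in where the boxes sit, not in how big each box is.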
Why is this important to me?
I ask because I want to know whether I should remove as much precision from my input geometries as I can, and reduce their density and their closeness to each other as far as my application allows without losing accuracy.
Question 2
This question is similar to the first, but instead of asking about how close one geometry is to another, it asks whether the shapes themselves should be reduced to the smallest possible shape that still describes what the application requires.
For example, suppose I were to use a SPATIAL index on a geometry column to hold data about dates.
If I wanted to store a date range of two dates, I could use a datetime data type in MySQL. However, what if I wanted to use a geometry type, so that I convey the date range by taking each individual date and converting it with unix_timestamp()?
For example:
Date("1st January 2011") to Timestamp = 1293861600
Date("31st January 2011") to Timestamp = 1296453600
Now, I could create a LINESTRING based on these two integers.
LINESTRING(1293861600 0,1296453600 1)
If my application is actually only concerned with days, and the number of seconds isn't important for date ranges at all, should I refactor my geometries so that they are reduced to the smallest possible size that still fulfils what they need to do?
So instead of "1293861600", I would use "1293861600" / (3600 * 24), which happens to be "14975.25".
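The arithmetic can be sketched in Python as follows. The function names are hypothetical, and truncating to whole days with floor division is an assumption: it counts days in UTC, whereas the sample timestamps fall at 06:00 UTC, which is where the 0.25 remainder comes from.

```python
SECONDS_PER_DAY = 3600 * 24

def to_day_number(unix_ts):
    """Whole days since the Unix epoch (UTC), discarding the time of day."""
    return unix_ts // SECONDS_PER_DAY

def day_range_wkt(start_ts, end_ts):
    """Build the LINESTRING from the question, using day numbers for x."""
    return f"LINESTRING({to_day_number(start_ts)} 0,{to_day_number(end_ts)} 1)"

print(1293861600 / SECONDS_PER_DAY)             # 14975.25
print(day_range_wkt(1293861600, 1296453600))    # LINESTRING(14975 0,15005 1)
```

This only shows how the reduced coordinates would be produced. Whether the smaller numbers change index performance is the question being asked.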
Can someone help fill in these gaps?
Comments (1)
When inserting a new entry, the engine chooses the MBR (minimum bounding rectangle) which would be minimally extended. By "minimally extended", the engine can mean either "area extension" or "perimeter extension", the former being the default in MySQL.
This means that as long as your nodes have non-zero area, their absolute sizes do not matter: the larger MBRs remain larger and the smaller ones remain smaller, and ultimately all nodes will end up in the same MBRs.
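A rough way to see why absolute size drops out is a Python sketch of the "least area extension" rule from Guttman's R-tree description. This is not MySQL's actual code, and `choose_node` and the other helper names are illustrative assumptions. Scaling every coordinate by a factor k multiplies every enlargement by k squared, so the node that wins stays the same.

```python
def area(box):
    """Area of a box given as (xmin, ymin, xmax, ymax)."""
    return (box[2] - box[0]) * (box[3] - box[1])

def union(a, b):
    """Smallest box covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(node_box, entry_box):
    """Extra area the node's MBR needs in order to cover the new entry."""
    return area(union(node_box, entry_box)) - area(node_box)

def choose_node(node_boxes, entry_box):
    """Index of the node needing the least enlargement (ties: smaller area)."""
    return min(
        range(len(node_boxes)),
        key=lambda i: (enlargement(node_boxes[i], entry_box), area(node_boxes[i])),
    )

def scale(box, k):
    """Multiply every coordinate of a box by k."""
    return tuple(k * v for v in box)

nodes = [(0, 0, 4, 4), (10, 10, 12, 12)]
entry = (5, 5, 6, 6)

small = choose_node(nodes, entry)
large = choose_node([scale(n, 1000) for n in nodes], scale(entry, 1000))
print(small, large)  # the same node index is chosen at both scales
```

The sketch assumes boxes with non-zero area, matching the condition stated above. With zero-area boxes many enlargements tie and the comparison is less informative.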
As for the density, the MBRs are recalculated on page splits, and there is a high chance that all points too far away from the main cluster will be moved away on the first split to their own MBR. It would be large, but within a few iterations it would become a parent to all the outlying points. This will decrease the search time for the outlying points and will increase the search time for the cluster points by one page seek.
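The split behaviour can be illustrated with a simplified Python sketch based on the seed-picking step of Guttman's quadratic split. It is an assumption that this resembles what the engine does. The sketch also ignores the minimum fill requirement of a real R-tree and assigns the remaining entries in input order. All names are hypothetical.

```python
from itertools import combinations

def area(box):
    """Area of a box given as (xmin, ymin, xmax, ymax)."""
    return (box[2] - box[0]) * (box[3] - box[1])

def union(a, b):
    """Smallest box covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def pick_seeds(boxes):
    """Pair of entries whose combined box wastes the most area."""
    def waste(pair):
        i, j = pair
        return area(union(boxes[i], boxes[j])) - area(boxes[i]) - area(boxes[j])
    return max(combinations(range(len(boxes)), 2), key=waste)

def split(boxes):
    """Split entries into two groups, each growing from one seed."""
    s1, s2 = pick_seeds(boxes)
    groups = ([s1], [s2])
    mbrs = [boxes[s1], boxes[s2]]
    for i, box in enumerate(boxes):
        if i in (s1, s2):
            continue
        # Put the entry in the group whose MBR grows the least.
        grow = [area(union(m, box)) - area(m) for m in mbrs]
        g = 0 if grow[0] <= grow[1] else 1
        groups[g].append(i)
        mbrs[g] = union(mbrs[g], box)
    return groups, mbrs

# Four clustered boxes near the origin and one far-away outlier.
page = [(0, 0, 2, 2), (1, 1, 3, 3), (2, 2, 4, 4), (3, 3, 5, 5),
        (1000000, 1000000, 1000002, 1000002)]
groups, mbrs = split(page)
print(groups)  # ([0, 1, 2, 3], [4])
print(mbrs)
```

In this toy page the outlier becomes a seed and ends up alone in its own MBR, while the cluster keeps a tight one. A real implementation would also force a minimum number of entries into each group.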