Performance question ...
I have a database of houses that have geolocation data (longitude & latitude).
What I want to do is find the best way to store the location data in MySQL (v5.0.24a) using the InnoDB database engine, so that I can run a lot of queries that return all the home records between x1 and x2 latitude and y1 and y2 longitude.
Right now, my database schema is
---------------------
Homes
---------------------
geolat - Float (10,6)
geolng - Float (10,6)
---------------------
And my query is:
SELECT ...
WHERE geolat BETWEEN x1 AND x2
AND geolng BETWEEN y1 AND y2
- Is what I described above the best way to store the latitude and longitude data in MySQL, using Float (10,6) and separate longitude/latitude columns? If not, what is? Float, Decimal and even Spatial exist as data types.
- Is this the best way to perform the SQL from a performance standpoint? If not, what is?
- Does using a different MySQL database engine make sense?
UPDATE: Still Unanswered
I have 3 different answers below. One person says to use Float. One person says to use INT. One person says to use Spatial.
So I used the MySQL "EXPLAIN" statement to measure the SQL execution speed. It appears that there is absolutely no difference in SQL execution (result set fetching) between INT and FLOAT as the longitude and latitude data type.
It also appears that using the "BETWEEN" statement is SIGNIFICANTLY faster than using the ">" or "<" SQL statements. It's nearly 3x faster to use "BETWEEN" than to use the ">" and "<" statements.
With that being said, I am still uncertain what the performance impact of using Spatial would be, since it's unclear to me whether it's supported by the version of MySQL I'm running (v5.0.24) ... or how to enable it if it is.
Any help would be greatly appreciated.
Comments (9)
float(10,6) is just fine.
Any other convoluted storage schemes will require more translation in and out, and floating-point math is plenty fast.
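As a rough check on how much precision a FLOAT column keeps, the sketch below round-trips a coordinate through a 4-byte IEEE 754 single, which is the storage format MySQL uses for FLOAT. The sample longitude and the 111 km per degree conversion are illustrative assumptions, not values from this thread.

```python
import struct


def as_float32(value):
    """Round-trip a Python float through a 4-byte IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Illustrative longitude (an assumption, not taken from the question).
lng = -122.419416
stored = as_float32(lng)

# Absolute rounding error in degrees after storing as single precision.
error_deg = abs(stored - lng)

# Roughly 111 km per degree of latitude; used only for a ballpark figure.
error_m = error_deg * 111_000

print(f"stored={stored:.6f} error={error_deg:.2e} deg (~{error_m:.2f} m)")
```

For magnitudes between 64 and 128 degrees the spacing between adjacent single-precision values is 2^-17 degrees, so the sixth decimal place is not fully preserved, though the error stays well under a metre.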
I know you're asking about MySQL, but if spatial data is important to your business, you might want to reconsider. PostgreSQL + PostGIS are also free software, and they have a great reputation for managing spatial and geographic data efficiently. Many people use PostgreSQL only because of PostGIS.
I don't know much about the MySQL spatial system though, so perhaps it works well enough for your use-case.
The problem with using any other data type than "spatial" here is that your kind of "rectangular selection" can (usually, this depends on how bright your DBMS is - and MySQL certainly isn't generally the brightest) only be optimised in one single dimension.
The system can pick either the longitude index or the latitude index, and use that to reduce the set of rows to inspect. But after it has done that, there is a choice between: (a) fetching all the rows found, scanning over them and testing the "other dimension", or (b) running a similar process on the "other dimension" and then matching the two result sets to see which rows appear in both. This latter option may not be implemented as such in your particular DBMS engine.
Spatial indexes sort of do the latter "automatically", so I think it's safe to say that a spatial index will give the best performance in any case, but it may also be the case that it doesn't significantly outperform the other solutions, and that it's just not worth the bother. This depends on all sorts of things like the volume of and the distribution in your actual data etc. etc.
It is certainly true that float (tree) indexes are by necessity slower than integer indexes, because of the longer time it usually takes to execute '>' on floats than it does on integers. But I would be surprised if this effect were actually noticeable.
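Option (a) can be sketched outside the database as follows. This is only an illustration of the idea, assuming a sorted Python list stands in for a latitude index; the function name `homes_in_box` and the sample coordinates are made up for the example and say nothing about how MySQL actually executes the query.

```python
from bisect import bisect_left, bisect_right


def homes_in_box(homes_by_lat, lat_keys, x1, x2, y1, y2):
    """Range-scan the latitude 'index', then filter on longitude."""
    # Step 1: the sorted latitude keys play the role of a B-tree index.
    lo = bisect_left(lat_keys, x1)
    hi = bisect_right(lat_keys, x2)
    candidates = homes_by_lat[lo:hi]
    # Step 2: every candidate row is still checked on the other dimension.
    return [(lat, lng) for lat, lng in candidates if y1 <= lng <= y2]


# Hypothetical sample data as (latitude, longitude) pairs.
homes = [
    (37.77, -122.42),
    (34.05, -118.24),
    (40.71, -74.01),
    (37.80, -122.27),
    (37.34, -121.89),
]
homes_by_lat = sorted(homes)
lat_keys = [lat for lat, _ in homes_by_lat]

result = homes_in_box(homes_by_lat, lat_keys, 37.0, 38.0, -122.5, -122.0)
print(result)
```

The cost of step 2 grows with the number of rows in the latitude band, regardless of how narrow the longitude range is, which is the limitation described above.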
Google uses float(10,6) in their "Store locator" example. That's enough for me to go with that.
https://stackoverflow.com/a/5994082/1094271
Also, starting with MySQL 5.6.x, support for spatial extensions is much better, and comparable to PostGIS in features and performance.
I would store it as integers (int, 4 bytes) representing 1/1,000,000ths of a degree. That would give you a resolution of a few inches. I don't think there is any intrinsic spatial datatype in MySQL.
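A minimal sketch of that encoding, assuming simple rounding to the nearest microdegree. The helper names and the figure of about 111,320 metres per degree at the equator are assumptions added for illustration.

```python
MICRO = 1_000_000
INT32_MIN, INT32_MAX = -2 ** 31, 2 ** 31 - 1

# Approximate length of one degree at the equator, in metres (assumption).
METERS_PER_DEGREE = 111_320


def to_microdegrees(deg):
    """Convert decimal degrees to an integer count of 1/1,000,000 degrees."""
    return round(deg * MICRO)


def from_microdegrees(value):
    """Convert an integer microdegree value back to decimal degrees."""
    return value / MICRO


# The full longitude range fits comfortably in a signed 4-byte integer.
extremes = [to_microdegrees(-180.0), to_microdegrees(180.0)]
fits_int32 = all(INT32_MIN <= v <= INT32_MAX for v in extremes)

# Size of one integer step on the ground, in inches.
resolution_inches = METERS_PER_DEGREE / MICRO / 0.0254

print(to_microdegrees(37.774929), fits_int32, round(resolution_inches, 1))
```

With this scheme the BETWEEN bounds in the query have to be converted the same way before they are compared against the stored columns.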
Where is latitude or longitude 5555.123456?
Don't you mean Float(9,6) instead?
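The arithmetic behind that remark can be checked by counting digits: the widest value to store is a longitude of 180.000000, which needs 9 digits in total with 6 after the decimal point. The helper below is a hypothetical illustration that uses Python's decimal module only to count digits.

```python
from decimal import Decimal


def digits_needed(literal):
    """Return (total digits M, decimal places D) for a decimal literal."""
    _sign, digits, exponent = Decimal(literal).as_tuple()
    return len(digits), -exponent


for literal in ("90.000000", "-180.000000", "5555.123456"):
    print(literal, digits_needed(literal))
```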
I have the exact same schema (float(10,6)) and query (selecting inside a rectangle) and I found that switching the db engine from innoDB to myisam doubled the speed for a "point in rectangle look-up" in a table with 780,000 records.
Additionally, I converted all lng/lat values to cartesian integers (x,y) and created a two-column index on the x,y and my speed went from ~27 ms to 1.3 ms for the same look-up.
It really depends on how you are using the data. But in a gross over-simplification of the facts, decimal is faster but less accurate in approximations. More info here:
http://msdn.microsoft.com/en-us/library/aa223970(SQL.80).aspx
Also, the standard for GPS coordinates is specified in ISO 6709:
http://en.wikipedia.org/wiki/ISO_6709
I know you have probably moved past this problem. I just wanted to add another approach to this question, in case someone is looking to store geolocation data.
You could encode the latitude and longitude information into a geohash. Geohashes are prefix-searchable to a required degree of precision, so it seems you can convert your query to a start and end prefix and do a prefix search with a LIKE query.
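A minimal sketch of the standard geohash encoding, to show why a shared prefix corresponds to a shared cell. The function names `geohash_encode` and `like_pattern` are made up for this example, and the column the pattern would be matched against is left out as it is not specified in the thread.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lng, precision=9):
    """Encode a point by alternately bisecting longitude and latitude."""
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lng = True  # geohash bits start with longitude
    while len(chars) < precision:
        rng, value = (lng_range, lng) if use_lng else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def like_pattern(lat, lng, precision):
    """Build a LIKE pattern matching every geohash in the same cell."""
    return geohash_encode(lat, lng, precision) + "%"


print(geohash_encode(42.6, -5.6, 5))
print(like_pattern(42.6, -5.6, 3))
```

One caveat with this approach: two points that are close together but sit on opposite sides of a cell boundary get different prefixes, so a single prefix search can miss nearby rows unless the neighbouring cells are queried as well.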