数据库/NoSQL - 检索以下数据的最低延迟方式

发布于 2024-09-01 06:25:58 字数 487 浏览 3 评论 0原文

我有一个房地产应用程序,“房屋”包含以下信息:

house:
- house_id 
- address
- city 
- state
- zip
- price
- sqft
- bedrooms
- bathrooms
- geo_latitude
- geo_longitude

我需要对地理坐标框中的所有房屋进行极快(低延迟)检索

类似于下面的 SQL(如果我要使用数据库):

SELECT * from houses 
WHERE latitude IS BETWEEN xxx AND yyy
AND longitude IS BETWEEN www AND zzz

问题:存储此信息的最快方法是什么,以便我可以根据纬度和速度执行最快的数据检索;经度? (例如数据库、NoSQL、memcache 等)?

I have a real estate application and a "house" contains the following information:

house:
- house_id 
- address
- city 
- state
- zip
- price
- sqft
- bedrooms
- bathrooms
- geo_latitude
- geo_longitude

I need to perform an EXTREMELY fast (low latency) retrieval of all homes within a geo-coordinate box.

Something like the SQL below (if I were to use a database):

SELECT * from houses 
WHERE latitude IS BETWEEN xxx AND yyy
AND longitude IS BETWEEN www AND zzz

Question: What would be the quickest way for me to store this information so that I can perform the fastest retrieval of data based on latitude & longitude? (e.g. database, NoSQL, memcache, etc)?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

奢华的一滴泪 2024-09-08 06:25:58

这是地理信息系统 (GIS) 应用程序的典型查询。其中许多问题可以通过使用四叉树或类似的空间索引来解决。所提到的平铺是这些通常最终实现的方式。

如果包含坐标的索引可以放入内存,并且 DBMS 具有不错的优化器,那么表扫描可以以相当低的开销提供距任何感兴趣点的笛卡尔距离。如果这太慢,则可以在进行全距离计算之前通过单独比较每个坐标轴来预先过滤查询。

This is a typical query for a Geographical Information System (GIS) application. Many of these are solved by using quad-tree, or similar spatial, indices. The tiling mentioned is how these often end up being implemented.

If an index containing the coordinates could fit into memory and the DBMS had a decent optimiser, then a table scan could provide a Cartesian distance from any point of interest with tolerably low overhead. If this is too slow, then the query could be pre-filtered by comparing each coordinate axis separately before doing the full distance calculation.

琴流音 2024-09-08 06:25:58

MongoDB 支持地理空间索引,但是有一些方法可以减少此类事情的计算时间。根据数据的排列方式,您可以将房屋放置在可识别的“图块”中,然后获取给定图块的所有房屋,并从该简化的数据集中,根据与您拥有的任何坐标的距离进行排序。

根据瓷砖的数量,您可以使用位掩码来查找可能靠近或重叠多个瓷砖的房屋。

ThereMongoDB supports geospatial indexes, but there are ways to reduce the computation time for things like this. Depending on how your data is arranged, you can place houses in identifiable 'tiles' and then fetch all houses for a given tile and, from that reduced dataset, sort based on distance from whatever coordinates you have.

Depending on how many tiles there are, you can use bitmasks to find houses that may be near or overlap multiple tiles.

请爱~陌生人 2024-09-08 06:25:58

我假设您执行的读取操作比写入操作多得多,并且您不需要将数据库分布在数十台计算机上。如果是这样,您应该选择读取优化的数据库,例如 sqlite(我个人的偏好)或 mysql,并准确使用您建议的 SQL 查询。

大多数(不是全部)NoSQL 数据库最终对于此类查询来说过于复杂,因为它们更擅长在索引中查找精确值而不是范围。

很高兴您正在寻找边界框而不是笛卡尔距离;后者对于 SQL 数据库来说更难优化(尽管您可以将其缩小到一个边界框,然后进行较慢的笛卡尔距离计算)。

I'm going to assume that you're doing lots more reads than writes, and you don't need to have your database distributed across dozens of machines. If so, you should go for a read-optimized database like sqlite (my personal preference) or mysql, and use exactly the SQL query you suggest.

Most (not all) NoSQL databases end up being overly complicated for queries of this sort, since they're better at looking up exact values in their indexes rather than ranges.

It's nice that you're looking for a bounding box instead of cartesian distance; the latter would be harder for a SQL database to optimize (although you could narrow it to a bounding box, then do the slower cartesian distance calculation).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文