Fastest way to find the location (zip, city, state) for a given latitude/longitude

Posted 2024-08-01 17:26:44

I need a free (open-source) solution that, given a lat/lng, can return the closest city/state or zip. MySQL is not an option; a small, lightweight database would be best if possible.

Update: no web services. With 50 million impressions a day, even the smallest add-on hurts, so adding a service request would kill response time. I would prefer not to add more than 200 milliseconds to the request.

I already have the data (lat/lon/zip/city/state) in CSV. The question is just how to store it and, more importantly, how to retrieve it as quickly as possible.

Comments (10)

鹊巢 2024-08-08 17:26:44

Brute force: pre-load all of your data into an array. Calculate the distance between your current point and each point in the array (there's a method to do this calculation that uses linear algebra instead of trig functions, but I don't recall what it is offhand) to find the closest point.

Please read this before down-voting: there are ways to speed up a brute force search like this, but I've found that they're usually not worth the trouble. Not only have I used this approach before to find nearest zip from latitude/longitude, I've used it in a Windows Mobile application (where the processing power is not exactly overwhelming) and still achieved sub-second search times. As long as you avoid the use of trig functions, this is not an expensive process.
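The answer does not name the linear-algebra method. One common option (an assumption here, not necessarily what the author had in mind) is to precompute every point once as a 3D unit vector; the nearest point on the sphere is then the one with the largest dot product, so the search loop uses only multiplications and additions. A minimal sketch in Python, where `BruteForceLocator` and the sample coordinates are hypothetical:

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

class BruteForceLocator:
    """Hypothetical brute-force nearest-point search over preloaded arrays."""

    def __init__(self, records):
        # records: iterable of (lat, lon, label); trig runs once per record here.
        self.labels = []
        self.vectors = []
        for lat, lon, label in records:
            self.labels.append(label)
            self.vectors.append(to_unit_vector(lat, lon))

    def nearest(self, lat, lon):
        qx, qy, qz = to_unit_vector(lat, lon)  # trig only for the query point
        best_label, best_dot = None, -2.0
        # Largest dot product = smallest angle = shortest great-circle distance.
        for label, (x, y, z) in zip(self.labels, self.vectors):
            dot = qx * x + qy * y + qz * z
            if dot > best_dot:
                best_label, best_dot = label, dot
        return best_label

locator = BruteForceLocator([
    (40.7128, -74.0060, "New York, NY"),
    (34.0522, -118.2437, "Los Angeles, CA"),
    (41.8781, -87.6298, "Chicago, IL"),
])
print(locator.nearest(40.73, -73.99))  # New York, NY
```

The trig cost moves to load time, which suits a long-running process that answers many queries.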

Update: you can speed up the search time by apportioning your zip data into sub-regions (quadrants, for example, like northwest, southeast etc.) and saving the region ID with each data point. In the search, then, you first determine what region your current location is in, and compare only to those data points.

To avoid boundary errors (like when your current location is near the edge of its region but is actually closest to a zip in the neighboring region), your regions should overlap to some extent. This means some of your zip records will be duplicated, so your overall dataset will be a bit larger.
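The overlapping-region idea can be sketched as a fixed lat/lon grid (an assumed choice; the answer only mentions quadrants) in which every record is also copied into the eight neighbouring cells. A query then compares only the points stored in its own cell. The class name, the 1-degree cell size, and the sample coordinates are hypothetical, and wraparound at ±180° longitude is not handled:

```python
import math
from collections import defaultdict

class OverlappingGrid:
    """Hypothetical grid index: each record is stored in its own cell and the
    8 surrounding cells, so neighbouring regions overlap by one cell."""

    def __init__(self, records, cell_deg=1.0):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)
        for lat, lon, label in records:
            row, col = self._cell(lat, lon)
            # Duplicate the record into the 3x3 block around its cell (the overlap).
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    self.cells[(row + dr, col + dc)].append((lat, lon, label))

    def _cell(self, lat, lon):
        return (math.floor(lat / self.cell_deg), math.floor(lon / self.cell_deg))

    def nearest(self, lat, lon):
        # Only records from the 3x3 block around the query's cell are compared.
        candidates = self.cells.get(self._cell(lat, lon), [])
        if not candidates:
            return None  # nothing nearby; a caller could fall back to a full scan
        k = math.cos(math.radians(lat))  # shrink longitude differences by latitude

        def approx_sq_dist(rec):
            return (rec[0] - lat) ** 2 + ((rec[1] - lon) * k) ** 2

        return min(candidates, key=approx_sq_dist)[2]

grid = OverlappingGrid([
    (40.7128, -74.0060, "New York, NY"),
    (40.7357, -74.1724, "Newark, NJ"),
    (39.9526, -75.1652, "Philadelphia, PA"),
])
print(grid.nearest(40.74, -74.17))  # Newark, NJ
```

The result is exact whenever the true nearest point falls inside the 3x3 block of cells around the query; the cell size trades memory (each record is stored nine times) against that search radius.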

九公里浅绿 2024-08-08 17:26:44

This is a very interesting question with a complex answer.

You mention a database of cities with lat/lon, but cities are not single points, and this can make a big difference in densely populated areas, where large parts of city A might be closer to the "center" of city B than to the center of city A. Take a big city surrounded by smaller suburbs: the outlying parts of the big city might be closer to the centers of the suburbs than to the center of the big city itself. Snapping to the nearest city center implies a map that is the Voronoi diagram of city center points, and such a map would not look anything like an actual map of urban areas.

If you want to know the city and state for a given lat/lon, you need to query a proper map and do point-in-polygon tests to find out which polygon it is in. This sounds computationally expensive, but it is actually not bad if you use a proper spatial index and are careful in your coding. I run a web site that sells API access to this and other geographical queries, and our underlying engine (written in Java) can return the containing or nearest city in the US with an average query time of 3e-4 seconds (more than 3,000 queries per second).

Even though we are selling it, I'm happy to explain how it works, since it would be way cheaper to buy it from us than to build it yourself, even with instructions. So here they are:

  • Find the map that you want. For US locations, the US Census offers extremely accurate maps at: http://www.census.gov/geo/www/tiger/tgrshp2010/tgrshp2010.html. I've not found global maps that are as good as the US census maps, but they may exist.
  • Find or write a parser for the ESRI shapefile format. I don't have a specific link for this, as it is highly language dependent, but there are numerous parsers, both free and commercial, available on the web. Just search for "shapefile parser" along with your programming language.
  • Load the map into memory. A digital map consists of a list of polygons, each represented by a list of lat/lon pairs, typically ordered counter-clockwise. Most maps allow for cut-outs (e.g., Lesotho in South Africa), which are simply listed as polygons whose lat/lon pairs are ordered clockwise. For performance and memory consumption reasons, you will want to use raw float arrays (avoid double precision, as it wastes memory, and use native arrays where possible to avoid boxing).
  • Next, you will need code to answer whether a given query point is contained in a given polygon. Here is an excellent discussion of the point-in-polygon problem: How can I determine whether a 2D Point is within a Polygon?
  • In my experience, the brute force technique suggested in another answer (checking every entity) does not work well on national or world maps. Instead, I strongly suggest a fast spatial index that returns a list of candidate polygons for a given lat/lon. There are a lot of options here. Many people would suggest tree-based indexes, but I tend to prefer grid indexes, as they are faster and modern servers tend to have a lot of memory. I wrote the only such index that I've worked with. I know they exist in GIS libraries, but I find most GIS code overly complex, slow, and hard to use. So, given a query lat/lon, you get a list of candidate polygons from the spatial index and use the point-in-polygon function to find which of the candidates contains the query point.
  • It is also important to handle cases where the query point is not contained by any polygon. In such a case, you will presumably want to find the nearest such polygon up to a specified maximum distance. To do this, you need to make sure that your spatial index can return a list of nearby polygons, and not just a list of candidate containing polygons. You will also need code to compute the distance between a query point and a lat/lon line segment (this is hard because lat/lon is not a Euclidean space). I've not found any good discussion of how to do this online, so I devised my own method. It works by creating a linearized space around the query point (which becomes (0, 0) in the new space) in which the relative longitude is re-scaled such that a degree of the modified longitude is the same distance as a degree of latitude (involves multiplying the relative longitude by the cosine of the latitude). In this linearized space you find the nearest point on the line segment using standard methods (see Shortest distance between a point and a line segment), and then convert that point back into lat/lon and use the Haversine formula to compute the distance between the two points (see Calculate distance between two latitude-longitude points? (Haversine formula)).
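For the point-in-polygon step, one of the standard methods discussed in the linked question is ray casting with the even-odd rule. A minimal sketch, assuming vertices are `(lat, lon)` pairs treated as planar coordinates (edges are straight lines in degree space, as shapefile data is usually interpreted) and that no polygon crosses the antimeridian; the function names are hypothetical:

```python
def point_in_ring(lat, lon, ring):
    """Ray casting (even-odd rule): cast a ray toward +longitude and count
    how many ring edges it crosses. ring is a list of (lat, lon) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        lat1, lon1 = ring[i]
        lat2, lon2 = ring[(i + 1) % n]
        # Does this edge straddle the line of constant latitude through the point?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that line (lat1 != lat2 here).
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside

def point_in_polygon(lat, lon, rings):
    """rings = outer ring plus optional hole rings (cut-outs). Under the
    even-odd rule a point inside a hole is toggled twice, so holes are
    handled without checking winding direction."""
    inside = False
    for ring in rings:
        if point_in_ring(lat, lon, ring):
            inside = not inside
    return inside

square = [(0, 0), (0, 10), (10, 10), (10, 0)]
hole = [(4, 4), (4, 6), (6, 6), (6, 4)]
print(point_in_polygon(2, 2, [square, hole]))  # True
print(point_in_polygon(5, 5, [square, hole]))  # False (inside the cut-out)
```

In practice you would call this only on the candidates returned by the spatial index, and probably store each ring as a flat float array as the answer suggests rather than as a list of tuples.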

And that's it. I built such a system on and off for about half a year. My estimate is that there are at least three man-months of serious coding in it, and that's for someone familiar with the subject matter (so beware if you are making a buy-or-build decision).
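The point-to-segment distance described in the last bullet can be sketched as follows. This follows the prose description (query point at the origin, relative longitude scaled by the cosine of the query latitude, standard projection onto the segment, then haversine), but it is not the author's actual code; the function names and the 6371 km mean Earth radius are assumptions, and the scaling breaks down at the poles, where the cosine is zero:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points (degrees), in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_to_segment_km(qlat, qlon, a, b):
    """Distance from the query point to segment a-b; a and b are (lat, lon)."""
    k = math.cos(math.radians(qlat))
    # Linearized space: the query point is (0, 0); relative longitude is scaled
    # by cos(latitude) so one unit is about one degree of latitude in both axes.
    ax, ay = (a[1] - qlon) * k, a[0] - qlat
    bx, by = (b[1] - qlon) * k, b[0] - qlat
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Project the origin onto the segment's line, then clamp to the endpoints.
        t = max(0.0, min(1.0, -(ax * dx + ay * dy) / seg_len_sq))
    px, py = ax + t * dx, ay + t * dy
    # Convert the nearest point back to lat/lon and measure with haversine.
    nearest_lat = qlat + py
    nearest_lon = qlon + px / k
    return haversine_km(qlat, qlon, nearest_lat, nearest_lon)

# One degree north of the middle of an equatorial segment: about 111 km.
print(round(distance_to_segment_km(1.0, 5.0, (0.0, 0.0), (0.0, 10.0)), 1))
```

For "nearest polygon" queries you would take the minimum of this distance over the edges of each nearby polygon returned by the spatial index.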

皓月长歌 2024-08-08 17:26:44

Use a kd-tree to speed up the nearest-neighbor search. There should be lots of free implementations available whatever your platform is.
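A minimal kd-tree sketch in plain Python follows. To avoid distortion from treating lat/lon as flat coordinates, it stores points as 3D unit vectors (an assumed design choice, not something the answer specifies): straight-line distance between unit vectors grows monotonically with great-circle distance, so the nearest vector is also the nearest place. The class name and sample cities are hypothetical:

```python
import math

def to_xyz(lat_deg, lon_deg):
    """Latitude/longitude in degrees -> point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))

class KDTree:
    """Minimal 3D kd-tree over unit vectors for nearest-place lookup."""

    def __init__(self, records):
        # records: list of (lat, lon, label)
        points = [(to_xyz(lat, lon), label) for lat, lon, label in records]
        self.root = self._build(points, depth=0)

    def _build(self, points, depth):
        if not points:
            return None
        axis = depth % 3  # cycle through x, y, z
        points.sort(key=lambda p: p[0][axis])
        mid = len(points) // 2  # median split keeps the tree balanced
        return {
            "point": points[mid],
            "axis": axis,
            "left": self._build(points[:mid], depth + 1),
            "right": self._build(points[mid + 1:], depth + 1),
        }

    def nearest(self, lat, lon):
        target = to_xyz(lat, lon)
        best = [None, float("inf")]  # [label, squared chord distance]

        def search(node):
            if node is None:
                return
            xyz, label = node["point"]
            d2 = sum((p - q) ** 2 for p, q in zip(xyz, target))
            if d2 < best[1]:
                best[0], best[1] = label, d2
            diff = target[node["axis"]] - xyz[node["axis"]]
            near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
            search(near)
            # Cross the splitting plane only if it is closer than the best match so far.
            if diff * diff < best[1]:
                search(far)

        search(self.root)
        return best[0]

tree = KDTree([
    (40.7128, -74.0060, "New York, NY"),
    (34.0522, -118.2437, "Los Angeles, CA"),
    (41.8781, -87.6298, "Chicago, IL"),
    (29.7604, -95.3698, "Houston, TX"),
    (47.6062, -122.3321, "Seattle, WA"),
])
print(tree.nearest(45.5152, -122.6784))  # Seattle, WA
```

A library implementation (for example one written in C) will be much faster than this pure-Python version; the sketch only shows the build and the pruned search.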

删除会话 2024-08-08 17:26:44

It's not open-source, but maybe you could use the Google Maps API:

Reverse Geocoding

一城柳絮吹成雪 2024-08-08 17:26:44

How far from your source location would you expect the closest city to be? 50 miles? 200 miles? 500 miles? If two cities are nearly equidistant, does it matter if your algorithm picks the exactly closer one? You can use this information to help speed your search.

If you can reasonably assume that the distance to the nearest city is small (~250 mi or so is probably close enough to be considered 'small'), and your distance calculation can be a bit 'fuzzy', then you can optimize the 'brute force' check by limiting your search space to +/- 5 degrees of latitude from the source (~70 miles per degree, so this gives you 350 or so miles to the north and south) and +/- 5 degrees of longitude (presuming you aren't searching for cities at the poles, this is anywhere from ~350 mi at the equator to ~100 mi in northern Canada). Adjust these ranges to whatever you feel is appropriate for your problem space.

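While trig functions will give you a precise distance, for small distances such as these the Pythagorean theorem is generally close enough for a 'best guess' answer, with x = 69.1 * (sourcelat - citylat) and y = 53.0 * (sourcelong - citylong). Here 69.1 is the number of miles per degree of latitude, and 53.0 is roughly the number of miles per degree of longitude near 40°N, so that factor shrinks toward the poles and grows toward the equator.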
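Putting the bounding box and the Pythagorean approximation together gives something like the sketch below. The 69.1 and 53.0 factors are taken from the answer (the longitude factor is only appropriate near 40°N); the function name, the `window_deg` parameter, and the sample coordinates are hypothetical:

```python
import math

MILES_PER_DEG_LAT = 69.1
MILES_PER_DEG_LON = 53.0  # from the answer; roughly right near 40°N only

def nearest_city_fuzzy(src_lat, src_lon, cities, window_deg=5.0):
    """cities: iterable of (lat, lon, name).
    Returns (name, approx_miles), or None if nothing lies inside the window."""
    best = None
    for lat, lon, name in cities:
        # Cheap rejection: skip anything outside the +/- window_deg box.
        if abs(lat - src_lat) > window_deg or abs(lon - src_lon) > window_deg:
            continue
        x = MILES_PER_DEG_LAT * (src_lat - lat)
        y = MILES_PER_DEG_LON * (src_lon - lon)
        dist = math.hypot(x, y)  # Pythagorean approximation, no trig
        if best is None or dist < best[1]:
            best = (name, dist)
    return best

cities = [
    (39.7392, -104.9903, "Denver, CO"),
    (40.0150, -105.2705, "Boulder, CO"),
    (38.8339, -104.8214, "Colorado Springs, CO"),
]
print(nearest_city_fuzzy(39.95, -105.2, cities)[0])  # Boulder, CO
```

If the input is kept sorted by latitude, the box test could also be turned into a binary search so that most rows are never visited at all.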

初熏 2024-08-08 17:26:44

You should check out geonames. They have an API that returns XML and/or JSON.
Also, you can download their database.

月朦胧 2024-08-08 17:26:44

Another thread recommends mod_geoip via MaxMind.
It runs at the Apache level, before it even gets to the PHP/.NET/Java.
Maxmind geolocation apis: Apache vs PHP

樱桃奶球 2024-08-08 17:26:44

If you have both the longitude and the latitude for each zip and for the current location, you could just calculate a radius and find the points within that circle. If you assume a boundary for each zip code range, you could speed up the search.

If you can use SQL Server 2008 (Standard or Express), you could use its spatial data types.

廻憶裏菂餘溫 2024-08-08 17:26:44

The Yahoo! Placemaker is a free web service that can do this. It can look up place names (“New York City”, “Buckingham Palace”) but it can also look up latitudes and longitudes by using the Geo microformat.

To use the service, you submit a POST request, and it returns XML:

A small command-line example (I’ve obscured my Yahoo! app ID; you’ll need to register your own):

$ curl -X POST -ddocumentContent='<div class="geo">GEO: <span class="latitude">37.386013</span>, <span class="longitude">-122.082932</span></div>' -ddocumentType='text/html' -dappid='your_yahoo_app_id' http://wherein.yahooapis.com/v1/document

This returns a very detailed XML document, part of which is:

<type>Town</type>
<name><![CDATA[Los Altos, CA, US]]></name>

It also contains the following data:

<type>Zip</type>
<name><![CDATA[94024, Los Altos, CA, US]]></name>

I have not used Placemaker very much, but I have used their Geocoding API and it is very fast. Couple this with a local memcached and users have no idea the data isn’t local.

猛虎独行 2024-08-08 17:26:44

Look at the geonames.org database for source data.

For a lightweight database, SQLite is a good choice.

geonames also offers a web service, but if you want to do it yourself without a web call (and it sounds as though you do), then you will need a local database. Then you just need the right trig calculations to work out the great-circle distance between a pair of lat/lng points (the haversine formula is the usual choice) and order the results by distance. You can also use a bounding box or radius if you want to limit the search area before doing the calculations.

If your local database can be SQL-based (which SQLite 3 is), then it all adds up to one SQL query that uses a bunch of trig calculations to compute a 'distance' column, and maybe also a similar WHERE clause to limit the search to a radius or bounding box. Once the distance column is computed in your query, it is easy to order by distance and add any other criteria you like. If you know Ruby/Rails and want to see a nice example of how this is done, look at the GeoKit Rails plugin source.
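A minimal sketch of that single query, using Python's built-in `sqlite3` module. Because SQLite builds do not always include trig functions, it registers a Python haversine function with `Connection.create_function`. The table layout, the `nearest_zip` helper, the 1-degree box, and the three illustrative rows (rounded coordinates) are assumptions, not part of the answer:

```python
import math
import sqlite3

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (3958.8 mi = mean Earth radius)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 3958.8 * math.asin(math.sqrt(a))

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as haversine(lat1, lon1, lat2, lon2).
conn.create_function("haversine", 4, haversine_miles)
conn.execute("CREATE TABLE zips (zip TEXT, city TEXT, state TEXT, lat REAL, lon REAL)")
# An index on lat lets SQLite narrow the latitude range of the bounding box.
conn.execute("CREATE INDEX zips_lat_lon ON zips (lat, lon)")
conn.executemany(
    "INSERT INTO zips VALUES (?, ?, ?, ?, ?)",
    [  # illustrative rows with rounded coordinates
        ("10001", "New York", "NY", 40.75, -74.00),
        ("11201", "Brooklyn", "NY", 40.69, -73.99),
        ("90210", "Beverly Hills", "CA", 34.09, -118.41),
    ],
)

def nearest_zip(lat, lon, box_deg=1.0):
    """Bounding box first, then rank the survivors by great-circle distance."""
    return conn.execute(
        """SELECT zip, city, state, haversine(?, ?, lat, lon) AS distance
           FROM zips
           WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?
           ORDER BY distance
           LIMIT 1""",
        (lat, lon, lat - box_deg, lat + box_deg, lon - box_deg, lon + box_deg),
    ).fetchone()

print(nearest_zip(40.70, -73.99)[:3])  # ('11201', 'Brooklyn', 'NY')
```

With real data you would load the CSV once into a file-backed database and keep the connection open across requests; `nearest_zip` returns `None` when no row falls inside the box, so a caller can retry with a larger `box_deg`.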
