One of the sites I work on is a social networking site of sorts, and the content would be greatly enhanced by using some sort of location service to recommend "friends" based on proximity. The site focuses on the US but has potential users worldwide.
I've considered creating an associative array or relational database with countries, states/provinces/territories, counties, and cities to provide a rough way to drill down to their relative proximity, but this can become extremely unwieldy and complicated very quickly.
I've also considered IP geolocation, but the results tend to be unreliable (some services show my company's IP as located some 600 miles to the north-east), and I would at least need some sort of fallback to look up, for instance, a ZIP/postal code.
Can you tell me a clearly defined way to do this sort of lookup effectively and locally, without using third-party APIs, preferably with at least some reference to where to gather the basic information from in the first place? I'm currently running PHP 5.3.2 and MySQL 5.1.44, if it makes any difference.
Thank you!
EDIT:
Added a bounty to try to get better ideas, or other ways of handling the problem, perhaps more efficiently. As it is, the load time due to the huge database size is insane. I figure I definitely need to improve my caching, but I'm trying to see if there's anything I should be doing with regard to improving my location system.
Comments (3)
This might be a bit obvious, but the most accurate way to know a user's location is to actually:
Ask the user where they are!
Once you have asked the user where they are, you can then use third-party applications to figure out distances.
If, as your question mentioned, you don't want to use any third-party application, you could download one of the geo databases and integrate it into your own service.
The source I use is Yahoo GeoPlanet.
You can download the entire GeoPlanet data file, which comes in TSV format. When I downloaded it, I just imported it into MySQL using mysqlimport.
http://developer.yahoo.com/geo/geoplanet/data/
It contains a record for every distinct geographic location in the world: a tonne of postcodes, districts, regions, countries, practically everything you would ever need.
In addition to that, it contains neighbours, so you can query for geographic regions that are close to a given place.
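A neighbour lookup of that kind could be sketched as below. This is only an illustration in Python, with an in-memory SQLite database standing in for MySQL. The table and column names (geoplanet_places, geoplanet_adjacencies, Place_WOE_ID, Neighbour_WOE_ID and so on) are assumptions based on the headers of the GeoPlanet data files, and the rows and IDs are invented placeholders.

```python
import sqlite3

# Assumed schema (column names taken from the GeoPlanet file headers, not verified here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE geoplanet_places (
        WOE_ID INTEGER PRIMARY KEY, ISO TEXT, Name TEXT,
        Language TEXT, PlaceType TEXT, Parent_ID INTEGER);
    CREATE TABLE geoplanet_adjacencies (
        Place_WOE_ID INTEGER, Place_ISO TEXT,
        Neighbour_WOE_ID INTEGER, Neighbour_ISO TEXT);
    -- Invented rows: these IDs are placeholders, not real WOEIDs.
    INSERT INTO geoplanet_places VALUES
        (101, 'US', 'Irvine', 'ENG', 'Town', 201),
        (102, 'US', 'Tustin', 'ENG', 'Town', 201),
        (103, 'US', 'Costa Mesa', 'ENG', 'Town', 201),
        (104, 'US', 'Fresno', 'ENG', 'Town', 202);
    INSERT INTO geoplanet_adjacencies VALUES
        (101, 'US', 102, 'US'),
        (101, 'US', 103, 'US');
""")


def neighbouring_places(conn, woe_id):
    """Return the names of places listed as adjacent to the given place."""
    rows = conn.execute(
        """SELECT p.Name
           FROM geoplanet_adjacencies AS a
           JOIN geoplanet_places AS p ON p.WOE_ID = a.Neighbour_WOE_ID
           WHERE a.Place_WOE_ID = ?
           ORDER BY p.Name""",
        (woe_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(neighbouring_places(conn, 101))
```

The same SELECT should carry over to MySQL more or less unchanged; an index on the adjacency table's place column would probably be needed at full data size.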
Unfortunately, simply asking where they are isn't quite good enough, and while GeoPlanet is a good option, and I have decided to use it, I didn't feel it was a complete answer. Yes, it works, but how? Aliases don't cover misspellings, and while most outsiders call San Francisco things like "San Fran" or "Frisco", locals use "The City", so aliases don't always work. I needed some level of exactitude.
Well, after some work, here's the approach I've used, which is a bit intensive, and may not be an option for everybody, but works for me:
First, grab a copy of the GeoPlanet database in TSV format from http://developer.yahoo.com/geo/geoplanet/data/ (105 MB zipped).
To import this into my MySQL database, I created the tables with columns named according to the Readme file in the zip. The geoplanet_places table was the only one given a primary key, on WOE_ID. This and geoplanet_adjacencies are really the only tables I need at the moment. For me, the import was done locally to my DB using:
I stripped the version number from the .tsv, and used the filename as the table name. Your experience may be significantly different, but I'm adding it for clarity. Import all the files you want.
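As a rough illustration of the import step, here is a sketch in Python that loads places rows from TSV into an in-memory SQLite table (standing in for MySQL). It is not the command used above. The header and column order (WOE_ID, ISO, Name, Language, PlaceType, Parent_ID) are assumed from the GeoPlanet places file, and the two sample rows are invented.

```python
import csv
import io
import sqlite3

# Invented sample in the assumed layout of the places file:
# a quoted header row, tab-separated fields, quoted strings.
SAMPLE_TSV = (
    '"WOE_ID"\t"ISO"\t"Name"\t"Language"\t"PlaceType"\t"Parent_ID"\n'
    '101\t"US"\t"Irvine"\t"ENG"\t"Town"\t201\n'
    '201\t"US"\t"Orange"\t"ENG"\t"County"\t301\n'
)


def import_places(conn, tsv_file):
    """Create geoplanet_places if needed and bulk-insert every data row."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS geoplanet_places (
               WOE_ID INTEGER PRIMARY KEY, ISO TEXT, Name TEXT,
               Language TEXT, PlaceType TEXT, Parent_ID INTEGER)"""
    )
    reader = csv.reader(tsv_file, delimiter="\t")
    next(reader)  # skip the header row
    rows = [tuple(row) for row in reader if row]
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO geoplanet_places VALUES (?, ?, ?, ?, ?, ?)", rows
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
imported = import_places(conn, io.StringIO(SAMPLE_TSV))
print(imported, "rows imported")
```

For the real file you would pass something like open(path, newline="", encoding="utf-8") instead of the StringIO sample; the path and encoding are assumptions to check against your download.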
I decided to have two options for people entering their profile data. You always have to select your country (from an option list, using ISO 3166 Alpha-2 codes as the value); after that, we either use the postal (ZIP/PIN) code to look up where you are or, for countries like Ireland lacking a national postal code system, let you enter your city and province name.
To search using country and postal code, I can do something like this:
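A minimal sketch of such a lookup, written in Python against an in-memory SQLite table rather than PHP and MySQL. It assumes postal codes are stored in geoplanet_places with PlaceType 'Zip' and the code itself in Name; the helper names find_postal_code and resolve, and the sample rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE geoplanet_places (
        WOE_ID INTEGER PRIMARY KEY, ISO TEXT, Name TEXT,
        Language TEXT, PlaceType TEXT, Parent_ID INTEGER);
    -- Invented rows: these IDs are placeholders, not real WOEIDs.
    INSERT INTO geoplanet_places VALUES
        (101, 'US', 'Irvine', 'ENG', 'Town', 201),
        (501, 'US', '92602', 'ENG', 'Zip', 101);
""")


def find_postal_code(conn, iso, postal_code):
    """Return (WOE_ID, Name, Parent_ID) rows matching a country and postal code."""
    return conn.execute(
        "SELECT WOE_ID, Name, Parent_ID FROM geoplanet_places "
        "WHERE ISO = ? AND PlaceType = 'Zip' AND Name = ?",
        (iso, postal_code.strip().upper()),
    ).fetchall()


def resolve(conn, iso, postal_code):
    """Classify the lookup into the three outcomes: none, one, or several matches."""
    matches = find_postal_code(conn, iso, postal_code)
    if not matches:
        return ("unknown", [])        # log it and investigate
    if len(matches) > 1:
        return ("ambiguous", matches)  # ask the user to pick one
    return ("found", matches)          # store the Parent_ID on the profile


print(resolve(conn, "US", "92602"))
```

On the full data set, an index covering ISO, PlaceType, and Name would likely make a noticeable difference to lookup time.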
I count the results. If there are none, the place is not known, and I assume a problem (an error is logged accordingly to confirm it is not a fluke). If there is more than one, the results are enumerated and a follow-up screen asks the user to confirm which location they reside in. Ideally, this should never happen with the postal code system, but it may occur when asking based on location. If there is only one, I store the Parent_ID to their profile as I continue to query back, passing the Parent_ID back in as a comparator to the WOE_ID, like so:
Where $pid is the previous Parent_ID. I'll use this later on when rendering the page to determine location, and Town/City is a low enough level to apply proximity checks on the adjacencies table. Trying to join the results was significantly slower than issuing multiple queries when I ran it in MySQL Workbench. I continue the queries until Parent_ID = "1", meaning that its parent is the world (it is a country).
I decided that when I'm searching using text entry for city, state/province, and country, I'll have to guarantee accurate entry by confirming with a Metaphone processor to determine their likely selection if it can't be found the first time. Unfortunately, some people either can't spell or the primary language of the site is not their primary language.
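To illustrate the "did you mean" step: Python's standard library has no Metaphone implementation, so this sketch swaps in difflib.get_close_matches, which compares spelling similarity rather than pronunciation. It is a stand-in for the idea, not the Metaphone approach itself, and the function name suggest_places and the candidate list are hypothetical.

```python
import difflib


def suggest_places(entered, known_names, limit=3, cutoff=0.6):
    """Return known place names that closely resemble what the user typed."""
    # Compare case-insensitively, but hand back the original spellings.
    by_folded = {name.casefold(): name for name in known_names}
    hits = difflib.get_close_matches(
        entered.casefold(), list(by_folded), n=limit, cutoff=cutoff
    )
    return [by_folded[hit] for hit in hits]


CANDIDATES = ["San Francisco", "San Diego", "Sacramento"]
print(suggest_places("San Fransisco", CANDIDATES))
```

In PHP, the built-in metaphone() function would be the natural tool for the phonetic variant; either way, the suggestions are meant to be confirmed by the user rather than applied silently.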
To display location, I start with the WOE_ID stored in their profile, get the name, then look up its parent. I comma-separate the names to get a result like Irvine, Orange, CA, USA. I can look up any one of these names to find other members in proximity using the adjacencies and places tables.
Again, this probably isn't the best way to go about it, and a geolocation result can change if, for instance, you're on a trip and using the hotel Wi-Fi; however, this method seems "close enough for government work", so I thought I'd share my solution, as worthless as it may be.
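The walk up the parent chain and the comma-separated display could look roughly like this. It is a sketch in Python with SQLite in place of MySQL, issuing one query per level as described above. The rows and IDs are invented, and the convention that a country's Parent_ID is 1 is taken from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE geoplanet_places (
        WOE_ID INTEGER PRIMARY KEY, ISO TEXT, Name TEXT,
        Language TEXT, PlaceType TEXT, Parent_ID INTEGER);
    -- Invented rows: these IDs are placeholders, not real WOEIDs.
    INSERT INTO geoplanet_places VALUES
        (101, 'US', 'Irvine', 'ENG', 'Town', 201),
        (201, 'US', 'Orange', 'ENG', 'County', 301),
        (301, 'US', 'California', 'ENG', 'State', 401),
        (401, 'US', 'United States', 'ENG', 'Country', 1);
""")


def ancestry(conn, woe_id, max_depth=20):
    """Collect place names from woe_id upward, stopping once the parent is 1 (the world)."""
    names = []
    current = woe_id
    for _ in range(max_depth):  # guard against malformed parent loops
        if current is None or current == 1:
            break
        row = conn.execute(
            "SELECT Name, Parent_ID FROM geoplanet_places WHERE WOE_ID = ?",
            (current,),
        ).fetchone()
        if row is None:
            break
        names.append(row[0])
        current = row[1]
    return names


display = ", ".join(ancestry(conn, 101))
print(display)
```

Because the chain rarely changes, caching the finished display string per WOE_ID is one way to avoid repeating these queries on every page render.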
This solution is generally more accurate and useful than matching only at the city level, but if you only have a user's address, it will require you to use a third-party service for geocoding when the user signs up. Hope it still helps.
1) Get the user's location. Use as much information as you can get:
2) You need to store the location's latitude and longitude along with the user. If you don't already have them from a sensor lookup or Geo IP database, you will need to do a geocode lookup on the address. You asked not to use a third-party service, but there really isn't a way around it (that's why the services exist; rolling your own is very complicated and expensive). See http://en.wikipedia.org/wiki/Geocoding#List_of_some_geocoding_systems for a list of geocoding services you can use.
3) You can now search users by calculating the distance from your search location to each user's location and putting a limit on it for the proximity. Example:
(reference: http://jehiah.cz/a/spatial-proximity-searching-using-latlongs)
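One common way to compute that distance is the haversine (great-circle) formula. The sketch below is in Python and is not the code from the referenced article; the function names, the 6371 km mean Earth radius, and the sample users with approximate coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def users_within(users, lat, lon, radius_km):
    """Return user names within radius_km of (lat, lon), nearest first."""
    hits = []
    for name, (user_lat, user_lon) in users.items():
        distance = haversine_km(lat, lon, user_lat, user_lon)
        if distance <= radius_km:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]


# Hypothetical users with approximate coordinates.
USERS = {
    "alice": (33.7455, -117.8677),  # around Santa Ana, CA
    "bob": (37.7749, -122.4194),    # around San Francisco, CA
}

# Search within 25 km of a point near Irvine, CA.
print(users_within(USERS, 33.6846, -117.8265, 25))
```

With many users, computing the distance for every row gets expensive; a cheap first pass that filters on a latitude/longitude bounding box (using indexed columns) before applying the exact formula is a frequent optimisation.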