How to classify users into different countries based on the Location field

Posted on 2024-08-01 17:24:01

Most web applications have a Location field, in which users may enter a location of their choice.

How would you classify users into different countries based on the location they entered?

For example, I used the users.xml file from the Stack Overflow data dump and extracted each user's name, reputation, and location:

['Jeff Atwood', '12853', 'El Cerrito, CA']
['Jarrod Dixon', '1114', 'Morganton, NC']
['Sneakers OToole', '200', 'Unknown']
['Greg Hurlman', '5327', 'Halfway between the boardwalk and Six Flags, NJ']
['Power-coder', '812', 'Burlington, Ontario, Canada']
['Chris Jester-Young', '16509', 'Durham, NC']
['Teifion', '7024', 'Wales']
['Grant', '3333', 'Georgia']
['TimM', '133', 'Alabama']
['Leon Bambrick', '2450', 'Australia']
['Coincoin', '3801', 'Montreal']
['Tom Grochowicz', '125', 'NJ']
['Rex M', '12822', 'US']
['Dillie-O', '7109', 'Prescott, AZ']
['Pete', '653', 'Reynoldsburg, OH']
['Nick Berardi', '9762', 'Phoenixville, PA']
['Kandis', '39', '']
['Shawn', '4248', 'philadelphia']
['Yaakov Ellis', '3651', 'Israel']
['redwards', '21', 'US']
['Dave Ward', '4831', 'Atlanta']
['Liron Yahdav', '527', 'San Rafael, CA']
['Geoff Dalgas', '648', 'Corvallis, OR']
['Kevin Dente', '1619', 'Oakland, CA']
['Tom', '3316', '']
['denny', '573', 'Winchester, VA']
['Karl Seguin', '4195', 'Ottawa']
['Bob', '4652', 'US']
['saniul', '2352', 'London, UK']
['saint_groceon', '1087', 'Houston, TX']
['Tim Boland', '192', 'Cincinnati Ohio']
['Darren Kopp', '5807', 'Woods Cross, UT']

using the following Python script:

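from xml.etree import ElementTree

# Parse the Stack Overflow data dump and take the root element
root = ElementTree.parse('SO Export/so-export-2009-05/users.xml').getroot()
# Attributes to read from each user row
items = ['DisplayName', 'Reputation', 'Location']

def loop1():
    # Print the selected attributes for the first 32 users
    for count, i in enumerate(root):
        det = [i.get(x) for x in items]
        print(det)
        if count > 30:
            break

loop1()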

What is the simplest way to classify people into different countries? Are there any ready-made lookup tables that would tell me that location X belongs to country Y?

The lookup table need not be totally accurate. Reasonably accurate answers can be obtained by querying the location string on Google or, better still, Wolfram Alpha.

Comments (2)

驱逐舰岛风号 2024-08-08 17:24:01

Force users to specify country, because you'll have to deal with ambiguities. This would be the right way.

If that's not possible, at least make your best guess in conjunction with their IP address.

For example, ['Grant', '3333', 'Georgia']

Is this Georgia, USA?
Or is this the Republic of Georgia?

If their IP address suggests somewhere in Central Asia or Eastern Europe, then chances are it's the Republic of Georgia. If it's North America, chances are pretty good they mean Georgia, USA.
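A minimal sketch of that idea, assuming a small hand-written lookup table and a continent hint that some IP lookup has already produced. The tables, the `guess_country` function, and the continent codes ("NA", "EU", "AS") are all hypothetical and only illustrate the disambiguation step; they are not part of any existing library.

```python
# Hypothetical, deliberately tiny lookup tables; a real one would be far larger.
UNAMBIGUOUS = {
    "us": "US",
    "uk": "GB",
    "wales": "GB",
    "canada": "CA",
    "australia": "AU",
    "israel": "IL",
}

# Names whose country depends on where the user appears to be.
# Inner keys are continent hints from an IP lookup (assumed format).
AMBIGUOUS = {
    "georgia": {"NA": "US", "EU": "GE", "AS": "GE"},
}


def guess_country(location, ip_continent=None):
    """Return an ISO 3166-1 alpha-2 code, or None when no guess is possible."""
    # Use the last comma-separated part: 'Burlington, Ontario, Canada' -> 'canada'
    key = location.split(",")[-1].strip().lower()
    if key in UNAMBIGUOUS:
        return UNAMBIGUOUS[key]
    if key in AMBIGUOUS:
        # Without a usable IP hint the name stays unresolved (None)
        return AMBIGUOUS[key].get(ip_continent)
    return None
```

With these tables, `guess_country("Georgia", "NA")` resolves to the US and `guess_country("Georgia", "EU")` to the Republic of Georgia, while `guess_country("Georgia")` with no hint returns None rather than guessing.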

Note that IP-address-to-country mapping isn't 100% accurate, and the database needs to be updated regularly. In my opinion, that is far too much trouble.

紫﹏色ふ单纯 2024-08-08 17:24:01

Your best bet is to use a geocoding API, for example through geopy (some examples).

The Google Geocoding API, for example, will return the country in the CountryNameCode field of the response.
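As a rough sketch of the geocoding approach, the snippet below uses geopy's Nominatim (OpenStreetMap) geocoder instead of the Google API mentioned above, so the country comes from the `country_code` entry of Nominatim's address details rather than from CountryNameCode. The helper names `country_code` and `nominatim_demo` and the user-agent string are made up for illustration, and the demo assumes geopy is installed and the network is reachable.

```python
def country_code(location_text, geocode):
    """Return an upper-case ISO country code for free text, or None.

    `geocode` is any callable shaped like geopy's Nominatim.geocode.
    """
    if not location_text.strip():
        return None  # empty Location fields cannot be geocoded
    result = geocode(location_text, addressdetails=True)
    if result is None:
        return None  # the geocoder found no match
    code = result.raw.get("address", {}).get("country_code")
    return code.upper() if code else None


def nominatim_demo():
    """Illustrative only: needs `pip install geopy` and network access."""
    from geopy.geocoders import Nominatim

    # The user_agent value is a placeholder; pick one that identifies your app.
    geolocator = Nominatim(user_agent="location-classifier-example")
    for text in ["El Cerrito, CA", "Burlington, Ontario, Canada", "Wales"]:
        print(text, "->", country_code(text, geolocator.geocode))
```

Passing the geocoder in as a callable keeps the classification logic independent of the service, so another geopy geocoder could be swapped in. The public Nominatim service limits request rates, so a full user dump would need throttling or caching.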

With just this one location field the number of false matches will probably be relatively high, but maybe it is good enough.

If you have server logs, you could also try to look up the user's IP address with an IP geocoder (more information and pointers on Wikipedia).
