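Parsing user input - City / State / Zipcode / Country
I'm looking for advice on parsing user input that may contain any of several combinations of city, state, zip code, and country.
A common example is how Google Maps handles this.
Some examples of input would be:
- "City, State, Country"
- "City, Country"
- "City, Zip Code, Country"
- "City, State, Zip Code"
- "Zip Code"
What would be an efficient and correct way to parse this input from a user?
If you are aware of any example implementations, please share :)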
Comments (4)
The first step would be to break the text into individual tokens, using spaces or commas as the delimiters. For scalability, you can then hand each token to a thread or server (if using a MapReduce-like architecture) to figure out what each token is. For instance,
Once you have the individual token results, you can glue the parts back together to get a full address. Where the result is ambiguous, you can ask the user what they really meant (as Google Maps does) and add that answer to a learned list.
The easiest way to add that support to an application, assuming you're not trying to build a map system, is to query Google or Yahoo and ask them to parse the data for you.
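As a rough illustration of the tokenize, classify, and reassemble idea, here is a minimal Python sketch. The lookup tables, the `classify` and `parse` names, and the choice to split on commas only (so that multi-word cities such as "San Francisco" stay intact) are all assumptions made for the example; a real system would query a proper gazetteer.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Tiny illustrative lookup tables (assumed data, not a real gazetteer).
STATES = {"ca", "california", "ny", "new york"}
COUNTRIES = {"us", "usa", "united states"}
CITIES = {"san francisco", "new york", "springfield"}

def classify(token):
    """Return every field type this token could plausibly be."""
    t = token.lower()
    kinds = []
    if re.fullmatch(r"\d{5}", t):
        kinds.append("zip")
    if t in STATES:
        kinds.append("state")
    if t in COUNTRIES:
        kinds.append("country")
    if t in CITIES:
        kinds.append("city")
    return kinds

def parse(text):
    """Split on commas, classify tokens in parallel, then reassemble."""
    tokens = [t.strip() for t in text.split(",") if t.strip()]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(classify, tokens))
    address, ambiguous = {}, []
    for token, kinds in zip(tokens, results):
        # Keep only the field types that have not been filled yet.
        free = [k for k in kinds if k not in address]
        if len(free) == 1:
            address[free[0]] = token
        else:
            # Unknown or multi-way match: ask the user what they meant.
            ambiguous.append(token)
    return address, ambiguous
```

Tokens that land in `ambiguous` (for example "New York", which is both a city and a state in this toy data) are the ones you would put back to the user, and their answers are what would feed the learned list.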
I am myself very fascinated with how Google handles that. I do not remember seeing anything similar anywhere else.
I believe you would split the input string into words, trying various delimiters (space, comma, semicolon, etc.). That gives you several candidate combinations. For each combination, you take each word and match it against a database of countries, cities, towns, and postal codes. Then you define a metric for evaluating the group match result of each combination. There should also be cross rules: for example, if the postal code does not match well, but the country, city, and town match well and together refer to a valid address, the metric still yields a high score.
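To make the "try several splits, score each against a database" idea concrete, here is a small Python sketch. The `KNOWN` records, the delimiter list, and the scoring weights (one point per matching field, half a point for the cross rule) are assumptions chosen for illustration, not a tuned metric.

```python
import re

# Hypothetical reference data: (city, state, zip, country) records.
KNOWN = [
    ("portland", "or", "97201", "us"),
    ("portland", "me", "04101", "us"),
]
FIELDS = ("city", "state", "zip", "country")
DELIMITERS = [r",", r";", r"\s+"]

def candidates(text):
    """Yield one token list per delimiter tried."""
    for delimiter in DELIMITERS:
        parts = [p.strip().lower() for p in re.split(delimiter, text) if p.strip()]
        if parts:
            yield parts

def score(parts, record):
    """One point per field of the record found among the tokens."""
    hits = {field for field, value in zip(FIELDS, record) if value in parts}
    points = float(len(hits))
    # Cross rule: city, state and country agree, so tolerate a poor zip match.
    if {"city", "state", "country"} <= hits and "zip" not in hits:
        points += 0.5
    return points

def best_match(text):
    """Return (score, record) for the best split/record pair found."""
    best = (0.0, None)
    for parts in candidates(text):
        for record in KNOWN:
            s = score(parts, record)
            if s > best[0]:
                best = (s, record)
    return best
```

With this toy data, an input whose zip code is slightly off still resolves to the Oregon record because the other three fields agree. The exhaustive loop over every record is the part that gets expensive with a real database, which is where indexing or precomputed lookups would come in.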
It is certainly difficult and not an evening coding exercise. It also requires substantial computational resources: shared hosting would probably buckle under just 10 requests, but a data center could serve it well.
I'm not sure whether there is an example implementation. Many geographical services are offered on a paid basis. Something as sophisticated as Google Maps would likely cost a fortune.
Correct me if I'm wrong.
I found a simple PHP implementation
Yahoo seems to have a web service that offers the functionality (sort of)
OpenStreetMap seems to offer the same search functionality on its homepage
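For the OpenStreetMap option, the homepage search is backed by the Nominatim service, which also exposes a public search endpoint. The sketch below only shows how such a request might be built and sent; the function names and the `User-Agent` string are made up for the example, and the public instance has a usage policy (identify your application, keep request rates low) that you should read before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"

def build_search_url(query, limit=1):
    """Build a free-form Nominatim search URL for the raw user input."""
    params = {
        "q": query,            # the unparsed text the user typed
        "format": "jsonv2",    # ask for JSON output
        "addressdetails": 1,   # include a structured address breakdown
        "limit": limit,
    }
    return NOMINATIM_SEARCH + "?" + urlencode(params)

def geocode(query):
    """Send the request and return the structured address of the top hit."""
    request = Request(
        build_search_url(query),
        # Placeholder identifier; use one that names your own application.
        headers={"User-Agent": "address-parser-example"},
    )
    with urlopen(request, timeout=10) as response:
        results = json.load(response)
    return results[0].get("address") if results else None
```

Calling `geocode("Berlin, Germany")` would hand the whole parsing problem to the service and give back its own breakdown of the address, so the field names you get are whatever the service returns rather than ones you control.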
Assuming you're only dealing with those four fields (city, zip, state, country), every field except city has a finite set of values, and even city is finite if you have a big enough city list. So just split the input on commas, then check each piece against each field's list.
Assuming we're talking about US addresses:
- Zip code: check for that first.
- State (California or CA): check that next.
- Country: this depends on how encompassing you want to be (US, United States, USA).
As far as efficiency goes, it might make sense to check a handful of 'standard' formats first, like Dan suggests.
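One way to check a handful of standard formats first is a short list of regular expressions tried in order, falling back to the slower per-field list lookup only when none of them matches. The patterns below are an assumption for US-style input (two-letter state codes, five-digit zip codes, three spellings of the country) and mirror the formats listed in the question; `parse_standard` is a name invented for this sketch.

```python
import re

# Building blocks for US-style input (assumed shapes, deliberately strict).
CITY = r"(?P<city>[A-Za-z .'-]+?)"
STATE = r"(?P<state>[A-Za-z]{2})"
ZIP = r"(?P<zip>\d{5})"
COUNTRY = r"(?P<country>US|USA|United States)"
SEP = r"\s*,\s*"

# Tried in order; the most specific layouts come first.
STANDARD_FORMATS = [re.compile(p, re.IGNORECASE) for p in (
    rf"{CITY}{SEP}{STATE}{SEP}{ZIP}",      # City, State, Zip
    rf"{CITY}{SEP}{STATE}{SEP}{COUNTRY}",  # City, State, Country
    rf"{CITY}{SEP}{ZIP}{SEP}{COUNTRY}",    # City, Zip, Country
    rf"{CITY}{SEP}{COUNTRY}",              # City, Country
    rf"{ZIP}",                             # Zip only
)]

def parse_standard(text):
    """Return a dict of fields if the input fits a standard format, else None."""
    cleaned = text.strip()
    for pattern in STANDARD_FORMATS:
        match = pattern.fullmatch(cleaned)
        if match:
            return match.groupdict()
    return None
```

A `None` result is the signal to fall back to checking each comma-separated piece against the zip, state, and country lists in that order. Note that the patterns only check shape: a two-letter token is accepted as a state without confirming it is a real one, so a list lookup is still worth doing afterwards.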