Parsing user input - City / State / Zip Code / Country

Posted 2024-07-25 17:15:25

I'm looking for advice on parsing input from a user in multiple combinations of City / State / Zip Code / Country.

A common example would be what Google Maps does.

Some examples of input would be:

  • "City, State, Country"
  • "City, Country"
  • "City, Zip Code, Country"
  • "City, State, Zip Code"
  • "Zip Code"

What would be an efficient and correct way to parse this input from a user?

If you are aware of any example implementations please share :)

Comments (4)

阳光下的泡沫是彩色的 2024-08-01 17:15:25

The first step would be to break up the text into individual tokens using spaces or commas as the delimiting characters. For scalability, you can then hand each token to a thread or server (if using a MapReduce-like architecture) to figure out what each token is. For instance:

  • If we have numbers in the pattern, then it's probably a zip code.
  • Is the item in the list of known states?
  • Countries are also fairly easy to handle, like states: there's a limited number of them.
  • What order are the tokens in compared to the common ways of writing an address? Most input will probably follow the local post office custom for address formats.

Once you have the individual token results, you can glue the parts back together to get a full address. Where a token is ambiguous, you can ask the user what they really meant (like Google Maps does) and add that answer to a learned list.
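To make the token idea concrete, here is a minimal sketch in Python. The lookup tables and the function names (`tokenize`, `classify`, `classify_all`) are made up for illustration, and the tables are tiny stand-ins for real state and country lists. Note that it splits on commas only, rather than on spaces as well, so that multi-word names such as "San Francisco" stay in one token.

```python
# Tiny illustrative lookup tables (assumptions); a real application
# would load complete lists of states and countries.
KNOWN_STATES = {"CA", "NY", "TX", "CALIFORNIA", "NEW YORK", "TEXAS"}
KNOWN_COUNTRIES = {"US", "USA", "UNITED STATES", "CANADA", "FRANCE"}


def tokenize(text):
    """Split on commas and drop empty pieces."""
    return [part.strip() for part in text.split(",") if part.strip()]


def classify(token):
    """Return a best-guess label for a single token."""
    upper = token.upper()
    if any(ch.isdigit() for ch in token):
        return "zip"        # digits suggest a postal code
    if upper in KNOWN_STATES:
        return "state"
    if upper in KNOWN_COUNTRIES:
        return "country"
    return "unknown"        # probably a city, or something to ask the user about


def classify_all(text):
    """Classify every token independently, keeping the original order."""
    return [(token, classify(token)) for token in tokenize(text)]


print(classify_all("San Francisco, CA, 94105"))
```

Because each token is classified on its own, the calls to `classify` could be farmed out to separate workers, and any token that comes back as `"unknown"` is a candidate for prompting the user.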

The easiest way to add that support to an application, assuming you're not trying to build a map system, is to query Google or Yahoo and ask them to parse the data for you.

枯叶蝶 2024-08-01 17:15:25

I am myself very fascinated with how Google handles that. I do not remember seeing anything similar anywhere else.

I believe you split the input string into words using various delimiters (space, comma, semicolon, etc.), which gives you several candidate combinations. For each combination, you take each word and match it against a database of countries, cities, towns and postal codes. Then you define a metric for scoring the overall match of each combination. There should also be cross rules: for example, if the postal code does not match well, but the country, city and town match well and together refer to a valid address, then the metric still yields a high score.
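A rough sketch of that scoring idea in Python might look like the following. Everything here is an assumption for illustration: the tiny in-memory sets stand in for a real gazetteer database, the delimiter list is arbitrary, and the metric (fraction of matched tokens plus a bonus when a city appears together with a state or country) is just one possible choice, not how Google does it.

```python
import re

# Stand-ins for real database tables (assumptions for illustration).
CITIES = {"new york", "paris", "san francisco"}
STATES = {"ny", "ca", "new york", "california"}
COUNTRIES = {"us", "usa", "france"}
ZIP_RE = re.compile(r"^\d{5}$")

# Each pattern produces one candidate way of cutting up the input.
DELIMITER_PATTERNS = [r",", r";", r"\s+", r"[,;\s]+"]


def candidate_splits(text):
    """Return the distinct token tuples produced by each delimiter pattern."""
    seen = []
    for pattern in DELIMITER_PATTERNS:
        tokens = tuple(t.strip().lower() for t in re.split(pattern, text) if t.strip())
        if tokens and tokens not in seen:
            seen.append(tokens)
    return seen


def kind_of(token):
    if ZIP_RE.match(token):
        return "zip"
    if token in CITIES:
        return "city"
    if token in STATES:
        return "state"
    if token in COUNTRIES:
        return "country"
    return None


def score(tokens):
    """Fraction of recognised tokens, plus a cross-rule bonus."""
    kinds = [kind_of(t) for t in tokens]
    matched = sum(1 for k in kinds if k is not None)
    total = matched / len(tokens)
    # Cross rule: a city together with a state or country is a strong signal.
    if "city" in kinds and ("state" in kinds or "country" in kinds):
        total += 0.5
    return total


def best_split(text):
    """Pick the candidate tokenisation with the highest score."""
    return max(candidate_splits(text), key=score)


print(best_split("San Francisco, CA, USA"))
print(best_split("Paris France"))
```

With real data, the exact-match lookups would be replaced by fuzzy matching, which is where most of the computational cost comes from.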

It is certainly difficult and not an evening coding exercise. It also requires significant computational resources: a shared host would probably buckle under just 10 requests, but a data center could serve it well.

Not sure if there is an example implementation. Many geographical services are offered on a paid basis. Something as sophisticated as Google Maps would likely cost a fortune.

Correct me if I'm wrong.

满身野味 2024-08-01 17:15:25

I found a simple PHP implementation

Yahoo seems to have a web service that offers the functionality (sort of)

OpenStreetMap seems to offer the same search functionality on its homepage

假装不在乎 2024-08-01 17:15:25

Assuming you're only dealing with those four fields (City, Zip, State, Country), there are finite values for all fields except City, and even City is finite if you have a big enough city list. So just split the input on commas, then check each piece against each field's list.

Assuming we're talking about US addresses:

  • Zip is most obvious, so check for that first.
  • State has 50x2 options (California or CA), check that next.
  • Country has ~190x2 options, depending on how encompassing you want to be (US, United States, USA).
  • Whatever is left over is probably your City.
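Here is a minimal Python sketch of that elimination order, assuming US-style input. The alias tables are deliberately tiny and the names (`parse_location`, `STATE_ALIASES`, `COUNTRY_ALIASES`) are hypothetical. One extra assumption beyond the list above: each pass scans the pieces from right to left, so that in "New York, NY, USA" the state match lands on "NY" rather than on the city.

```python
import re

ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")  # US ZIP or ZIP+4

# Illustrative alias tables (assumptions); real ones would cover
# all states and countries, mapping every spelling to one code.
STATE_ALIASES = {"ca": "CA", "california": "CA", "ny": "NY", "new york": "NY"}
COUNTRY_ALIASES = {"us": "US", "usa": "US", "united states": "US"}


def parse_location(text):
    fields = [part.strip() for part in text.split(",") if part.strip()]
    result = {"city": None, "state": None, "zip": None, "country": None}

    def take(matcher):
        """Remove and return the right-most field the matcher recognises."""
        for i in range(len(fields) - 1, -1, -1):
            value = matcher(fields[i])
            if value is not None:
                del fields[i]
                return value
        return None

    # Check in order of how unambiguous each field type is.
    result["zip"] = take(lambda f: f if ZIP_RE.match(f) else None)
    result["state"] = take(lambda f: STATE_ALIASES.get(f.lower()))
    result["country"] = take(lambda f: COUNTRY_ALIASES.get(f.lower()))

    # Whatever is left over is probably the city.
    if fields:
        result["city"] = fields[0]
    return result


print(parse_location("New York, NY, USA"))
print(parse_location("San Francisco, California, 94105"))
```

A lone "New York" would still be read as a state by this sketch, which is the kind of case where asking the user or checking a city list first would help.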

As far as efficiency goes, it might make sense to check a handful of 'standard' formats first, like Dan suggests.
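As a sketch of what checking 'standard' formats first could look like in Python: try a short list of regular expressions for common US layouts and only fall back to the slower field-by-field lookup when none of them match. The patterns and the name `match_standard_format` are assumptions for illustration, not a complete set of formats.

```python
import re

# A few common US layouts (assumed; extend as needed).
STANDARD_FORMATS = [
    # "City, ST 12345" or "City, ST, 12345"
    re.compile(r"^(?P<city>[^,]+),\s*(?P<state>[A-Za-z]{2}),?\s+(?P<zip>\d{5})$"),
    # "City, ST"
    re.compile(r"^(?P<city>[^,]+),\s*(?P<state>[A-Za-z]{2})$"),
    # "12345"
    re.compile(r"^(?P<zip>\d{5})$"),
]


def match_standard_format(text):
    """Return the named groups of the first matching pattern, or None."""
    cleaned = text.strip()
    for pattern in STANDARD_FORMATS:
        m = pattern.match(cleaned)
        if m:
            return m.groupdict()
    return None  # caller falls back to the slower list-based parsing


print(match_standard_format("Austin, TX 78701"))
print(match_standard_format("78701"))
```

The two-letter group only checks the shape of the state code, so a match would still need validating against the real state list.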
