Parsing a ZIP code from a US address using Java
The question is how to detect five consecutive digits in a string, and thereby find a US postal (ZIP) code.
Side note: I'd like to use the code with GWT, so there are limitations on regex and third-party libraries. Otherwise I would just use net.sourceforge.jgeocoder.
Answers (7)
If you're going to use a Regex, this should work for strictly formatted ZIPs:
^\d{5}([-+]?\d{4})?$
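As a rough illustration, the pattern above could be applied through `String.matches`, which is (to my understanding) one of the regex-backed `String` methods that GWT emulates. The class and method names here are hypothetical, and this is only a sketch of a strict format check, not an address validator.

```java
// Hypothetical helper; class and method names are illustrative only.
final class StrictZip {
    // Same pattern as above: five digits, optionally followed by
    // an optional '-' or '+' separator and four more digits.
    private static final String ZIP_PATTERN = "^\\d{5}([-+]?\\d{4})?$";

    static boolean isStrictZip(String candidate) {
        // String.matches tests the whole string against the pattern.
        return candidate != null && candidate.matches(ZIP_PATTERN);
    }
}
```

For example, `StrictZip.isStrictZip("12345-6789")` is accepted, while `"1234"` is rejected because it has only four digits.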
But there's still a problem. Some applications, Microsoft Excel for example, try to interpret 5-digit ZIPs as integers. This means that ZIPs with leading zeros, such as those in New England and Puerto Rico, often run into problems. As such, you may also want to consider looking for 3-digit and 4-digit values.
The "first" ZIP Code in the USA is 00501 and is the IRS. (Perhaps we shouldn't allow that one to verify!) When interpreted as an integer, it's 501. Now we've got a problem.
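If a ZIP has already been turned into an integer, the leading zeros can be restored by left-padding to five digits. The sketch below does the padding by hand rather than with `String.format`, which may not be available under GWT; the class and method names are hypothetical.

```java
// Hypothetical helper; names are illustrative only.
final class ZipPadding {
    // Rebuilds a 5-digit ZIP string from an integer that lost its leading zeros.
    static String fromInt(int value) {
        if (value < 0 || value > 99999) {
            throw new IllegalArgumentException("not a 5-digit ZIP value: " + value);
        }
        String digits = Integer.toString(value);
        StringBuilder padded = new StringBuilder();
        // Prepend one '0' for every missing position.
        for (int i = digits.length(); i < 5; i++) {
            padded.append('0');
        }
        return padded.append(digits).toString();
    }
}
```

With this, `ZipPadding.fromInt(501)` gives back `"00501"`.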
This is important to know because, unlike credit cards, which have a mod 10 checksum, addresses are not self-validating. This means that you can't know if an address is formatted and standardized properly without some kind of external authority.
And once you've gone as far as needing to standardize an address via an external authority, you can have the address verified and confirmed as well.
I should mention that I'm the founder of SmartyStreets. We have a web-based address verification service where you can submit your addresses to us in a list or programmatically, and we'll clean them up, standardize them, and verify them.
\\d{5}
as a regex will, I believe, be a starting point. Code:
Done from my mobile, so forgive spelling and syntax.
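A minimal sketch of how a pattern like `\d{5}` might be used to search a string, assuming plain Java with `java.util.regex` (which, as far as I know, is not part of GWT's client-side emulation, so this would suit server-side code). The `\b` word boundaries are an addition of mine so that a longer digit run is not matched in part; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class FiveDigitFinder {
    // \b keeps the match from starting or ending inside a longer digit run.
    private static final Pattern FIVE_DIGITS = Pattern.compile("\\b\\d{5}\\b");

    // Returns every standalone run of exactly five digits, in order of appearance.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = FIVE_DIGITS.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }
}
```

Note that a 5-digit house number is matched as well, so a caller would still have to decide which hit is the ZIP (the last one is a reasonable guess).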
What worked for me is:
It's very simple to express as a regular expression: "^\d{5}"
Just have a look at how regular expressions are used in Java: http://www.regular-expressions.info/java.html
With a regular expression.
Since a ZIP code should be at the end of an address, the pattern can be anchored to the end of the string.
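One way this idea could look, assuming the address string really does end with the ZIP (optionally in ZIP+4 form with a hyphen) and that `java.util.regex` is available. The pattern and all names below are my own illustration, not the answerer's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class TrailingZip {
    // Five digits, an optional "-dddd" suffix, then only whitespace up to the end.
    private static final Pattern ZIP_AT_END =
            Pattern.compile("\\b(\\d{5})(?:-\\d{4})?\\s*$");

    // Returns the 5-digit part, or null when the address does not end in a ZIP.
    static String extract(String address) {
        Matcher m = ZIP_AT_END.matcher(address);
        return m.find() ? m.group(1) : null;
    }
}
```

An address with a trailing country name, such as "..., State 55555, USA", would not match this pattern and would need the country stripped first.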
There are two forms of ZIP code in the U.S.A.: a 5-digit number (called a ZIP code) and a 9-digit number (called a ZIP+4). Here is an algorithm to parse any valid U.S. ZIP code:
Assumption: The starting point is a String containing a ZIP code (or ZIP+4) candidate.
Modified for 5-digit-only ZIP:
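A minimal sketch of both checks, written without regex so it should also be usable under GWT. It assumes the candidate string has already been isolated and trimmed, and that a ZIP+4 uses a hyphen between the two parts; the class and method names are hypothetical and this is one possible reading of the approach, not the original code.

```java
// Hypothetical helper; names are illustrative only.
final class ZipCandidate {
    // Accepts exactly five ASCII digits, e.g. "02134".
    static boolean isFiveDigitZip(String s) {
        return s != null && s.length() == 5 && allDigits(s, 0, 5);
    }

    // Accepts "12345" or "12345-6789".
    static boolean isZipOrZipPlus4(String s) {
        if (isFiveDigitZip(s)) {
            return true;
        }
        return s != null && s.length() == 10
                && allDigits(s, 0, 5) && s.charAt(5) == '-' && allDigits(s, 6, 10);
    }

    // True when every character in [from, to) is an ASCII digit.
    private static boolean allDigits(String s, int from, int to) {
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }
}
```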
Here's what I did to parse a ZIP code from an address string and compare it to an array of ZIP codes. The format of the address string is:
1234 Easy St, City, State 55555, USA. It will also handle ZIP+4 codes such as 55555-5555.
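A sketch of one way such a lookup could work, assuming the comma-separated format shown above with the "State 55555" piece second from the end. It relies only on `String.split` and `String.matches`; all names are hypothetical and this is not the answerer's original code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; names are illustrative only.
final class AddressZipLookup {
    // Pulls the 5-digit ZIP out of "street, city, state zip, country"; null if absent.
    static String zipFromAddress(String address) {
        String[] parts = address.split(",");
        if (parts.length < 2) {
            return null;
        }
        // The "State 55555" piece is assumed to be second from the end.
        String stateAndZip = parts[parts.length - 2].trim();
        String zip = stateAndZip.substring(stateAndZip.lastIndexOf(' ') + 1);
        // Drop a "-5555" suffix so ZIP+4 values compare on the first five digits.
        int dash = zip.indexOf('-');
        if (dash >= 0) {
            zip = zip.substring(0, dash);
        }
        return zip.matches("\\d{5}") ? zip : null;
    }

    // True when the address yields a ZIP that appears in knownZips.
    static boolean isKnownZip(String address, String[] knownZips) {
        Set<String> known = new HashSet<>(Arrays.asList(knownZips));
        String zip = zipFromAddress(address);
        return zip != null && known.contains(zip);
    }
}
```

Comparing on the 5-digit prefix is a design choice made here for simplicity; a list of ZIP+4 values would need the suffix kept.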