Parsing a ZIP code from a US address using Java

Posted 2024-11-28 11:58:00

The question is how do you detect 5 digits following each other in string. Ergo finding US postal code.

Side note: I'd like to use the code with GWT so there are limitations on regex and third party libraries. Otherwise I would just use net.sourceforge.jgeocoder.

Comments (7)

苦笑流年记忆 2024-12-05 11:58:00

If you're going to use a Regex, this should work for strictly formatted ZIPs:
^\d{5}([-+]?\d{4})?$

  • 12345
  • 123456789
  • 12345-6789
  • 12345+6789
  • 12345-67ND (yes, you read that right, sometimes the last two can be ND; note that the pattern above only accepts digits, so as written it will not match this form)
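
As a rough illustration of the pattern above in Java, here is a minimal sketch. It relies only on String.matches(), which already anchors at both ends, so the ^ and $ are dropped. The class name, method names, and the ZIP_OR_ND variant are assumptions made for this example, not part of the original answer.

```java
final class StrictZip {
    // Same pattern as in the answer; matches() must consume the whole string.
    static final String ZIP = "\\d{5}([-+]?\\d{4})?";

    // Hypothetical extension that also accepts "ND" as the last two characters.
    static final String ZIP_OR_ND = "\\d{5}([-+]?(\\d{4}|\\d{2}ND))?";

    static boolean isStrictZip(String s) {
        return s != null && s.matches(ZIP);
    }

    static boolean isStrictZipOrNd(String s) {
        return s != null && s.matches(ZIP_OR_ND);
    }
}
```

With this sketch, isStrictZip accepts the first four examples in the list and rejects "12345-67ND", while isStrictZipOrNd accepts all five.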

But there's still a problem. Some applications, Microsoft Excel for example, interpret 5-digit ZIPs as integers. ZIPs that start with zeros, such as those in New England and Puerto Rico, then lose their leading digits. As such, you may also want to consider looking for 3-digit and 4-digit values.

The "first" ZIP Code in the USA is 00501 and is the IRS. (Perhaps we shouldn't allow that one to verify!) When interpreted as an integer, it's 501. Now we've got a problem.
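
To make the leading-zero issue concrete, here is a minimal sketch that pads a 3- or 4-digit value back to five digits. The class and method names are made up for illustration, and the sketch assumes the short value really is a ZIP that lost its zeros, which the code itself cannot verify.

```java
final class ZipPadding {
    // Returns a 5-character ZIP, or null if the input is not 3 to 5 digits.
    static String restoreLeadingZeros(String digits) {
        if (digits == null || !digits.matches("\\d{3,5}")) {
            return null;
        }
        StringBuilder sb = new StringBuilder(digits);
        while (sb.length() < 5) {
            sb.insert(0, '0'); // put back one dropped leading zero at a time
        }
        return sb.toString();
    }
}
```

For example, restoreLeadingZeros("501") gives "00501". The padding is done by hand rather than with String.format so the sketch stays close to plain string operations.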

This is important to know because, unlike credit cards which have a mod 10 checksum, addresses are not self validating. This means that you can't know if an address is formatted and standardized properly without some kind of external authority.

And once you've gone as far as needing to standardize an address via an external authority, you can have the address verified and confirmed as well.

I should mention that I'm the founder of SmartyStreets. We have a web-based address verification service where you can submit your addresses to us in a list or programmatically, and we'll clean them up, standardize them, and verify them.

烂柯人 2024-12-05 11:58:00

\\d{5} as a regex I believe will be a starting point

Code:

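// split() returns the text around the matches, not the digits themselves.
// A limit of -1 keeps trailing empty strings, so tokens.length > 1 means a 5-digit run exists.
String[] tokens = string.split("\\d{5}", -1);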

Done from my mobile so forgive spelling and syntax

不知所踪 2024-12-05 11:58:00

What worked for me is:

(\d{5}(?=\s|$))|(\d{5}-\d{4}(?=\s|$))
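
For reference, a minimal sketch of how this pattern might be applied with java.util.regex on a plain JVM. The class and method names are assumptions for the example, and java.util.regex may not be usable under GWT, so treat this as an illustration of the pattern's behaviour only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadZip {
    // The pattern from the answer, escaped for a Java string literal.
    private static final Pattern ZIP =
            Pattern.compile("(\\d{5}(?=\\s|$))|(\\d{5}-\\d{4}(?=\\s|$))");

    // Returns the first ZIP or ZIP+4 found in the text, or null if there is none.
    static String firstZip(String text) {
        Matcher m = ZIP.matcher(text);
        return m.find() ? m.group() : null;
    }
}
```

Because the lookahead requires whitespace or the end of the string, a ZIP followed directly by a comma (as in "State 55555, USA") is not matched by this pattern.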
迷离° 2024-12-05 11:58:00

It's very simple to express as a regular expression: "^\d{5}" (the ^ anchors the match to the start of the string).

Just have a look at how to do regular expression matching in Java: http://www.regular-expressions.info/java.html

挖个坑埋了你 2024-12-05 11:58:00

With a regular expression.

\d{5}

Since a zip should be at the end of an address

\d{5}$
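
A minimal sketch of the end-of-address idea using only String.matches() and substring(). The class and method names are hypothetical, and the sketch assumes a single-line address whose ZIP, if present, is the last thing in the string.

```java
final class TrailingZip {
    // Returns the last five characters if the trimmed address ends in exactly
    // five digits (not preceded by another digit), otherwise null.
    static String trailingZip(String address) {
        if (address == null) {
            return null;
        }
        String s = address.trim();
        // matches() anchors at both ends, so this plays the role of \d{5}$.
        if (s.matches(".*\\D\\d{5}|\\d{5}")) {
            return s.substring(s.length() - 5);
        }
        return null;
    }
}
```

The extra \D guards against picking the tail of a longer digit run, which a bare \d{5}$ would accept.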
白况 2024-12-05 11:58:00

There are two forms of ZIP in the U.S.A.: a 5-digit number (called a zip code) and a 9-digit number (called a zip +4). Here is an algorithm to parse any valid U.S. zip code:
Assumption: The starting point is a String containing a zip code (or zip+4) candidate.

  1. Iterate through the input string and extract all digits to a second string that I will call the "zipString". Note: zip +4 is often written "12345-1234". This will remove the dash. This may be overly accepting for your purposes because it will also work if the input string is "1a2b3c4d-------5x". This looseness is generally fine for me because it ignores simple and ignorable input errors (like "1 2345" as the zip code).
  2. If the "zipString" is 5 characters long, that is the zip code.
  3. If the "zipString" is 9 characters long, the first 5 characters are the zip code and the last 4 characters are the +4 portion of a zip +4.
  4. If the "zipString" is neither 5 nor 9 characters long, the input is not valid.
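
The four steps above could be sketched roughly as follows. The class name, method name, and the choice of a String array as the return type are assumptions for this example. No regex is involved, only charAt() and StringBuilder.

```java
final class DigitZipParser {
    // Returns {zip5}, {zip5, plus4}, or null when the input is not valid.
    static String[] parse(String input) {
        if (input == null) {
            return null;
        }
        // Step 1: copy every digit into zipString, dropping dashes, spaces, etc.
        StringBuilder zipString = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= '0' && c <= '9') {
                zipString.append(c);
            }
        }
        // Step 2: five digits is a plain zip code.
        if (zipString.length() == 5) {
            return new String[] { zipString.toString() };
        }
        // Step 3: nine digits is a zip code followed by its +4 portion.
        if (zipString.length() == 9) {
            return new String[] { zipString.substring(0, 5), zipString.substring(5) };
        }
        // Step 4: anything else is not valid.
        return null;
    }
}
```

As the answer notes, this is deliberately loose: parse("1 2345") and parse("1a2b3c4d-------5x") both yield "12345".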

Modified for 5 digit only zip:

  1. Iterate through the input string and extract all digits to a second string that I will call the "zipString". I prefer this to regular expressions because it ignores simple and ignorable input errors (like "1 2345" as the zip code).
  2. If the "zipString" is 5 characters long, that is the zip code.
  3. If the "zipString" is not 5 characters long, the input is not valid.
过度放纵 2024-12-05 11:58:00

Here's what I did to parse a zipcode from an address string and compare it to an array of zipcodes. The format of the address string is:
1234 Easy St, City, State 55555, USA. It will also handle zips 55555-5555

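import java.util.regex.Matcher;
import java.util.regex.Pattern;
import android.util.Log; // assumption: Log.d and TAG come from the Android context of this code

// Group 1 captures only the 5-digit part, so "55555-5555" can still be parsed as an int.
private static final Pattern pattern = Pattern.compile("(\\d{5})(?:[-\\s]\\d{4})?");
private static int []zipcodes = {<your array of zips>};

public static boolean isInServiceArea(String address) {

    Matcher matcher = pattern.matcher(address);
    int zipcode = 0;
    // Uses the first 5-digit run found; a 5-digit street number would be picked up instead.
    if (matcher.find()) {
        // Parse group 1, not group 0: the full match may contain "-" or a space,
        // which would make Integer.parseInt throw NumberFormatException.
        // Note that an int drops leading zeros (00501 becomes 501).
        zipcode = Integer.parseInt(matcher.group(1));
        Log.d(TAG, "zipcode: " + zipcode);
    }

    // Linear scan of the service-area zips.
    for (int code : zipcodes) {
        if (code == zipcode) {
            return true;
        }
    }
    return false;
}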
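
A variation on the same idea, sketched with the ZIPs kept as strings so that leading zeros survive (an int literal such as 00501 would also be read as octal in Java). The class name and the sample ZIP set are made up for illustration, and taking the last match rather than the first is an assumption that the ZIP sits near the end of the address.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ServiceArea {
    private static final Pattern ZIP = Pattern.compile("(\\d{5})(?:[-\\s]\\d{4})?");

    // Hypothetical service-area ZIPs, stored as strings to keep leading zeros.
    private static final Set<String> ZIPS =
            new HashSet<>(Arrays.asList("00501", "55555"));

    static boolean isInServiceArea(String address) {
        Matcher m = ZIP.matcher(address);
        String zip = null;
        while (m.find()) {
            zip = m.group(1); // keep the last candidate; earlier ones may be street numbers
        }
        return zip != null && ZIPS.contains(zip);
    }
}
```

With the sample set, "1234 Easy St, City, State 55555, USA" is in the service area, while "55555 Main St, City, ST 12345" is not, because the trailing 12345 is taken as the ZIP.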