Remove the street number from a street address

Posted 2024-07-25 17:29:03

Using Ruby (newb) and regex, I'm trying to strip the street number from a street address. The easy cases aren't giving me trouble, but I need some help with:

'6223 1/2 S FIGUEROA ST' ==> 'S FIGUEROA ST'

Thanks for the help!!

UPDATE(s):

'6223 1/2 2ND ST' ==> '2ND ST'

and from @pesto
'221B Baker Street' ==> 'Baker Street'


Answers (7)

阳光下慵懒的猫 2024-08-01 17:29:03

This will strip anything at the front of the string until it hits a letter:

street_name = address.gsub(/^[^a-zA-Z]*/, '')

If it's possible to have something like "221B Baker Street", then you have to use something more complex. This should work:

street_name = address.gsub(/^((\d[a-zA-Z])|[^a-zA-Z])*/, '')
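A quick way to check both patterns above against the three addresses in the question is a small Ruby script like the one below. It uses `sub` instead of `gsub`, which gives the same result here because each pattern is anchored at the start of a single-line string. The third pattern, `LEADING_HOUSE_NUMBER`, is a hypothetical variant that is not part of the answer; it assumes a house number is a whole number with at most one trailing letter (`221B`), optionally followed by a fraction (`1/2`).

```ruby
# The two patterns from the answer above.
LEADING_NON_LETTERS = /^[^a-zA-Z]*/                # drop everything before the first letter
LEADING_WITH_SUFFIX = /^((\d[a-zA-Z])|[^a-zA-Z])*/ # also drop digit+letter pairs such as "1B"

# Hypothetical variant, not from the answer: drop leading tokens that are whole
# numbers with at most one trailing letter ("221B") or fractions ("1/2").
LEADING_HOUSE_NUMBER = %r{\A(?:(?:\d+[a-zA-Z]?|\d+/\d+)\s+)*}

addresses = ['6223 1/2 S FIGUEROA ST', '6223 1/2 2ND ST', '221B Baker Street']

addresses.each do |address|
  # sub is enough: every pattern is anchored at the start of a single-line string.
  results = [LEADING_NON_LETTERS, LEADING_WITH_SUFFIX, LEADING_HOUSE_NUMBER].map do |pattern|
    address.sub(pattern, '')
  end
  puts "#{address.inspect} => #{results.inspect}"
end
```

For `6223 1/2 2ND ST` the two original patterns leave `ND ST` and `D ST`, because the street name itself starts with a digit. The hypothetical variant keeps `2ND ST`, but it still cannot tell a numbered street such as `45 ST` apart from a house number.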
雪花飘飘的天空 2024-08-01 17:29:03

Group matching:

.*\d\s(.*)

If you need to also take into account apartment numbers:

.*\d.*?\s(.*)

Which would take care of 123A Street Name

That should strip the numbers at the front (and the space) so long as there are no other numbers in the string. Just capture the first group (.*)
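In Ruby, capturing the first group can be done with `String#[]`, passing the regex and the group index. The sketch below is an illustration, not code from the answer; the helper name `street_name` is hypothetical. It runs both patterns over the question's addresses plus the `123A Street Name` case.

```ruby
# Patterns from the answer above; the street name is capture group 1.
DIGIT_THEN_SPACE  = /.*\d\s(.*)/    # greedy .* backs up to the LAST digit followed by whitespace
DIGIT_WITH_SUFFIX = /.*\d.*?\s(.*)/ # last digit, then the shortest run (e.g. "A" in "123A") up to whitespace

# Hypothetical helper: String#[] with a regexp and a group index returns that
# capture, or nil when the pattern does not match.
def street_name(address, pattern)
  address[pattern, 1]
end

['6223 1/2 S FIGUEROA ST', '6223 1/2 2ND ST', '221B Baker Street', '123A Street Name'].each do |address|
  a = street_name(address, DIGIT_THEN_SPACE)
  b = street_name(address, DIGIT_WITH_SUFFIX)
  puts "#{address.inspect}: #{a.inspect}, #{b.inspect}"
end
```

The first pattern keeps `2ND ST` but returns `nil` for `221B Baker Street`, since no digit there is followed by whitespace. The second handles `221B` and `123A`, but reduces `6223 1/2 2ND ST` to `ST`, which is the "no other numbers in the string" caveat in action.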

帅哥哥的热头脑 2024-08-01 17:29:03

There's another stackoverflow set of answers:
Parse usable Street Address, City, State, Zip from a string

I think the Google/Yahoo geocoder approach is best, but it depends on how many addresses you're dealing with and how often; otherwise, the accepted answer there is probably the best option.

坐在坟头思考人生 2024-08-01 17:29:03

Can street names be numbers as well? E.g.

1234 45TH ST

or even

1234 45 ST

You could deal with the first case above, but the second is difficult.

I would split the address on spaces, skip any leading components that do not contain a letter and then join the remainder. I do not know Ruby, but here is a Perl example which also highlights the problem with my approach:

#!/usr/bin/perl

use strict;
use warnings;

my @addrs = (
    '6223 1/2 S FIGUEROA ST',
    '1234 45TH ST',
    '1234 45 ST',
);

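for my $addr ( @addrs ) {
    # Split on runs of whitespace (awk-style split).
    my @parts = split ' ', $addr;

    # Discard leading parts until one contains a letter, then print it and the rest.
    while ( @parts ) {
        my $part = shift @parts;
        if ( $part =~ /[A-Z]/ ) {
            print join(' ', $part, @parts), "\n";
            last;
        }
    }
}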

C:\Temp> skip
S FIGUEROA ST
45TH ST
ST
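Since the question asks for Ruby, here is a rough port of the same idea. It is a sketch that mirrors the Perl loop's uppercase-only `/[A-Z]/` test; the method name `strip_leading_numbers` is hypothetical.

```ruby
# Split on whitespace, drop leading tokens that contain no uppercase letter,
# and rejoin what is left (same logic as the Perl loop above).
def strip_leading_numbers(address)
  address.split.drop_while { |part| part !~ /[A-Z]/ }.join(' ')
end

['6223 1/2 S FIGUEROA ST', '6223 1/2 2ND ST', '1234 45TH ST', '1234 45 ST', '221B Baker Street'].each do |address|
  puts strip_leading_numbers(address)
end
```

This handles `2ND ST`, but it shares the Perl version's weak spots: `1234 45 ST` becomes `ST`, and `221B Baker Street` comes back unchanged because the token `221B` contains a letter. Unlike the Perl loop, it returns an empty string, rather than printing nothing, when no token contains a letter.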
老子叫无熙 2024-08-01 17:29:03

Ouch! Parsing an address by itself can be extremely nasty unless you're working with standardized addresses. The reason for this is that the "primary number", often called the house number, can appear at various locations within the string, for example:

  1. RR 2 Box 15 (RR can also be Rural Route, HC, HCR, etc.)
  2. PO Box 17
  3. 12B-7A
  4. NW95E235
  5. etc.

It's not a trivial undertaking. Depending on the needs of your application, your best bet for getting accurate information is to use an address verification web service. A handful of providers offer this capability.

In the interest of full disclosure, I'm the founder of SmartyStreets. We have an address verification web service API that will validate and standardize your address to make sure it's real and allow you to get the primary/house number portion. You're more than welcome to contact me personally with questions.

痴情 2024-08-01 17:29:03

/[^\d]+$/ will also match the same thing, except without using a capture group.
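A small Ruby check of that pattern is sketched below; the constant name is an illustrative choice. `String#[]` with just a regexp returns the whole match, so no capture group is needed. The match begins right after the last digit, so it can carry a leading space that has to be stripped.

```ruby
# The pattern from the answer above: the non-digit characters after the last digit.
TRAILING_NON_DIGITS = /[^\d]+$/

['6223 1/2 S FIGUEROA ST', '6223 1/2 2ND ST', '221B Baker Street'].each do |address|
  # String#[] with a regexp returns the whole match, or nil if there is none.
  match = address[TRAILING_NON_DIGITS]
  # Strip the space that follows the house number.
  puts match.to_s.strip.inspect
end
```

After stripping, this gives `S FIGUEROA ST` for the first address, matching the group-matching pattern `.*\d\s(.*)`, but it returns `ND ST` and `B Baker Street` for the other two, so it is only equivalent when neither the street name nor the house number contains a digit-letter mix.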

雪若未夕 2024-08-01 17:29:03

For future reference, a great tool to help with regex is http://www.rubular.com/
