Searching for a street address in a string - Python or Ruby

Posted on 2024-10-09 12:49:33

Hey,
I was wondering how I can find a street address in a string in Python/Ruby?

Perhaps with a regex?

Also, it's going to be in the following (US) format:

420 Fanboy Lane, Cupertino CA

Thanks!


Comments (6)

逆蝶 2024-10-16 12:49:33

Maybe you want to have a look at pypostal. pypostal is the official Python binding for libpostal.

Using Mike Bethany's example addresses, I made this little example:

from postal.parser import parse_address

addresses = [
    "420 Fanboy Lane, Cupertino CA 12345",
    "1829 William Tell Oveture, by Gioachino Rossini 88421",
    "114801 Western East Avenue Apt. B32, Funky Township CA 12345",
    "1 Infinite Loop, Cupertino CA 12345-1234",
    "420 time!",
]

# Print the labelled components of each address, separated by a line of asterisks.
for address in addresses:
    print(parse_address(address))
    print("*" * 60)

>     [('420', 'house_number'), ('fanboy lane', 'road'), ('cupertino', 'city'), ('ca', 'state'), ('12345', 'postcode')]
>     ************************************************************
>     [('1829', 'house_number'), ('william tell', 'road'), ('oveture by gioachino', 'house'), ('rossini', 'road'), ('88421', 'postcode')]
>     ************************************************************
>     [('114801', 'house_number'), ('western east avenue apt.', 'road'), ('b32', 'postcode'), ('funky', 'road'), ('township', 'city'), ('ca', 'state'), ('12345', 'postcode')]
>     ************************************************************
>     [('1', 'house_number'), ('infinite loop', 'road'), ('cupertino', 'city'), ('ca', 'state'), ('12345-1234', 'postcode')]
>     ************************************************************
>     [('420', 'house_number'), ('time !', 'house')]
>     ************************************************************
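
As the output above shows, parse_address returns a list of (value, label) pairs, and a label can occur more than once (the second result has two road entries). Grouping the pairs by label may make the result easier to work with. The following is only a sketch: group_by_label is a hypothetical helper, not part of pypostal, and the sample list is copied from the output above so the snippet runs without libpostal installed.

```python
from collections import defaultdict


def group_by_label(parsed):
    """Group (value, label) pairs by label, keeping repeated labels in order."""
    grouped = defaultdict(list)
    for value, label in parsed:
        grouped[label].append(value)
    return dict(grouped)


# Sample copied from the output above; in real use this would be
# the return value of parse_address(text).
sample = [
    ("420", "house_number"),
    ("fanboy lane", "road"),
    ("cupertino", "city"),
    ("ca", "state"),
    ("12345", "postcode"),
]

components = group_by_label(sample)
print(components["road"])
print(components.get("unit"))  # None when a label is absent
```

Each label maps to a list rather than a single string, so a parse that emits the same label twice does not silently lose a value.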
梦断已成空 2024-10-16 12:49:33

Using your example, this is what I came up with in Ruby (I edited it to include a ZIP code and an optional ZIP+4 suffix):

# Number, street, comma, city, two-letter state, 5-digit ZIP with optional +4.
regex = /^[0-9]* (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?$/
addresses = ["420 Fanboy Lane, Cupertino CA 12345"]
addresses << "1829 William Tell Oveture, by Gioachino Rossini 88421"
addresses << "114801 Western East Avenue Apt. B32, Funky Township CA 12345"
addresses << "1 Infinite Loop, Cupertino CA 12345-1234"
addresses << "420 time!"

addresses.each do |address|
  print address
  if address.match(regex)
    puts " is an address"
  else
    puts " is not an address"
  end
end

# Outputs:
> 420 Fanboy Lane, Cupertino CA 12345 is an address
> 1829 William Tell Oveture, by Gioachino Rossini 88421 is not an address
> 114801 Western East Avenue Apt. B32, Funky Township CA 12345 is an address
> 1 Infinite Loop, Cupertino CA 12345-1234 is an address
> 420 time! is not an address
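
Since the question also asks about Python, here is a rough Python port of the Ruby check above. It is a sketch, not a tested address validator: ADDRESS_RE and looks_like_address are names made up for this example, and re.fullmatch stands in for the ^ and $ anchors, which is equivalent for single-line input.

```python
import re

# Same pattern as the Ruby answer, minus the ^ and $ anchors,
# because re.fullmatch already requires the whole string to match.
ADDRESS_RE = re.compile(r"[0-9]* (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?")


def looks_like_address(text):
    """Return True if the whole string has the number/street/city/state/ZIP shape."""
    return ADDRESS_RE.fullmatch(text) is not None


addresses = [
    "420 Fanboy Lane, Cupertino CA 12345",
    "1829 William Tell Oveture, by Gioachino Rossini 88421",
    "114801 Western East Avenue Apt. B32, Funky Township CA 12345",
    "1 Infinite Loop, Cupertino CA 12345-1234",
    "420 time!",
]

for address in addresses:
    verdict = "is an address" if looks_like_address(address) else "is not an address"
    print(address, verdict)
```

Like the Ruby version, this only checks the shape of the string; it says nothing about whether the address exists.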
我的痛♀有谁懂 2024-10-16 12:49:33

Here's what I used:

(\d{1,10}( \w+){1,10}( ( \w+){1,10})?( \w+){1,10}[,.](( \w+){1,10}(,)? [A-Z]{2}( [0-9]{5})?)?)

It's not perfect and doesn't handle edge cases, but it works for most conventionally typed addresses and partial addresses.

It finds addresses in text such as:

Hi! I'm at 12567 Some St. Fairfax, VA. Come get me!

some text 12567 Some St. is my home

something else 123 My Street Drive, Fairfax VA 22033

Hope this helps someone.
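
For anyone who wants to try the pattern above from Python, here is a minimal sketch. The regex is copied unchanged from this answer; find_address is a hypothetical helper name, and re.search is just one reasonable way to pull the first hit out of free text.

```python
import re

# Pattern copied from the answer above, split over two lines for readability.
ADDRESS_RE = re.compile(
    r"(\d{1,10}( \w+){1,10}( ( \w+){1,10})?( \w+){1,10}[,.]"
    r"(( \w+){1,10}(,)? [A-Z]{2}( [0-9]{5})?)?)"
)


def find_address(text):
    """Return the first address-like substring, or None if nothing matches."""
    match = ADDRESS_RE.search(text)
    return match.group(0) if match else None


samples = [
    "Hi! I'm at 12567 Some St. Fairfax, VA. Come get me!",
    "some text 12567 Some St. is my home",
    "something else 123 My Street Drive, Fairfax VA 22033",
]

for sample in samples:
    print(find_address(sample))
```

Using group(0) returns the whole match; the numbered groups only hold the last repetition of each sub-pattern, so they are not very useful here.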

北方。的韩爷 2024-10-16 12:49:33
\d{1,4}( \w+){1,3},( \w+){1,3} [A-Z]{2}

Not fully tested, but it should work. Just use it with your favorite function from re (e.g. re.findall). Assumptions:

  1. A house number is 1 to 4 digits long
  2. 1-3 words follow the house number, all separated by spaces
  3. The city name is 1-3 words (it needs to match Cupertino, Los Angeles, and San Luis Obispo)
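
One thing to watch when using this pattern with re.findall: when a pattern contains capturing groups, findall returns the groups rather than the whole match, and a repeated group only keeps its last repetition. The sketch below compares the pattern as written with a variant that uses non-capturing groups, (?: ...); the sample sentence is made up for illustration.

```python
import re

text = (
    "Ship it to 420 Fanboy Lane, Cupertino CA "
    "or 1 Infinite Loop, San Luis Obispo CA."
)

# Pattern as written: findall yields one tuple of captured groups per match.
capturing = re.compile(r"\d{1,4}( \w+){1,3},( \w+){1,3} [A-Z]{2}")
group_tuples = capturing.findall(text)
print(group_tuples)

# Same pattern with non-capturing groups: findall yields the full matches.
non_capturing = re.compile(r"\d{1,4}(?: \w+){1,3},(?: \w+){1,3} [A-Z]{2}")
full_matches = non_capturing.findall(text)
print(full_matches)
```

Alternatively, keep the capturing groups and iterate with re.finditer, calling group(0) on each match object.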
披肩女神 2024-10-16 12:49:33

Okay, based on the very helpful responses from Mike Bethany and Rafe Kettler (thanks!), I found that this regex works in both Python and Ruby:
/[0-9]{1,4} (.*), (.*) [a-zA-Z]{2} [0-9]{5}/

Ruby code (prints 12 Argonaut Lane, Lexington MA 02478):

myregex = /[0-9]{1,4} (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?/

print "We're Having a pizza party at 12 Argonaut Lane, Lexington MA 02478 Come join the party!".match(myregex)

Python code. It doesn't work quite the same: the pattern must not be wrapped in slashes, and re.findall returns the captured groups rather than the whole match, so this uses search instead:

import re
# No surrounding slashes: in Python they would be matched literally.
myregex = re.compile(r'[0-9]{1,4} (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?')
match = myregex.search("We're Having a pizza party at 12 Argonaut Lane, Lexington MA 02478 Come join the party!")
# group(0) is the whole matched address.
print(match.group(0))
那小子欠揍 2024-10-16 12:49:33

As stated, addresses are very free-form. Rather than the regex approach, how about a service that provides accurate, standardized address data? I work for SmartyStreets, where we provide an API that does this very thing. One simple GET request and your address is parsed for you. Try this Python sample (you'll need to start a trial):

https://github.com/smartystreets/smartystreets-python-sdk/blob/master/examples/us_street_single_address_example.py
