
Searching for a street address in a string - Python or Ruby

Posted 2024-10-09 12:49:33

Hey, I was wondering how I can find a street address in a string in Python/Ruby?

Perhaps with a regex?

Also, it will be in the following format (US):

420 Fanboy Lane, Cupertino CA

Thanks!


逆蝶 2024-10-16 12:49:33

You may want to have a look at pypostal, the official Python bindings to libpostal.

Using the examples from Mike Bethany's answer, I put together this small example:

# Requires libpostal and the pypostal bindings to be installed.
from postal.parser import parse_address

addresses = [
    "420 Fanboy Lane, Cupertino CA 12345",
    "1829 William Tell Oveture, by Gioachino Rossini 88421",
    "114801 Western East Avenue Apt. B32, Funky Township CA 12345",
    "1 Infinite Loop, Cupertino CA 12345-1234",
    "420 time!",
]

for address in addresses:
    # parse_address returns a list of (value, label) tuples.
    print(parse_address(address))
    print("*" * 60)

>     [('420', 'house_number'), ('fanboy lane', 'road'), ('cupertino', 'city'), ('ca', 'state'), ('12345', 'postcode')]
>     ************************************************************
>     [('1829', 'house_number'), ('william tell', 'road'), ('oveture by gioachino', 'house'), ('rossini', 'road'), ('88421', 'postcode')]
>     ************************************************************
>     [('114801', 'house_number'), ('western east avenue apt.', 'road'), ('b32', 'postcode'), ('funky', 'road'), ('township', 'city'), ('ca', 'state'), ('12345', 'postcode')]
>     ************************************************************
>     [('1', 'house_number'), ('infinite loop', 'road'), ('cupertino', 'city'), ('ca', 'state'), ('12345-1234', 'postcode')]
>     ************************************************************
>     [('420', 'house_number'), ('time !', 'house')]
>     ************************************************************
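
Since the parser returns (value, label) pairs, a small amount of post-processing can turn them into something easier to query. The following is only a sketch: `group_components` and `looks_like_street_address` are hypothetical helper names, not part of pypostal, and the "has a house number and a road" rule is an assumed heuristic. The sample data is copied from the output above so the snippet runs without libpostal installed.

```python
from collections import defaultdict


def group_components(parsed):
    """Group (value, label) pairs by label, keeping repeated labels in order."""
    grouped = defaultdict(list)
    for value, label in parsed:
        grouped[label].append(value)
    return dict(grouped)


def looks_like_street_address(parsed):
    """Assumed heuristic: require at least a house number and a road."""
    labels = {label for _, label in parsed}
    return {"house_number", "road"} <= labels


# Parser output copied from the results above (hypothetical variable names).
fanboy = [
    ("420", "house_number"),
    ("fanboy lane", "road"),
    ("cupertino", "city"),
    ("ca", "state"),
    ("12345", "postcode"),
]
party = [("420", "house_number"), ("time !", "house")]

print(group_components(fanboy))
print(looks_like_street_address(fanboy))
print(looks_like_street_address(party))
```

Note that this heuristic would still accept the "William Tell" line, because the parser labels part of it as a road; a stricter check (for example, also requiring a city or state) may be needed in practice.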



梦断已成空 2024-10-16 12:49:33

Using your example, this is what I came up with in Ruby (I edited it to include the ZIP code and an optional +4 ZIP):

# ^ and $ anchor at line boundaries in Ruby; use \A and \z to anchor the whole string.
regex = /^[0-9]* (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?$/
addresses = [
  "420 Fanboy Lane, Cupertino CA 12345",
  "1829 William Tell Oveture, by Gioachino Rossini 88421",
  "114801 Western East Avenue Apt. B32, Funky Township CA 12345",
  "1 Infinite Loop, Cupertino CA 12345-1234",
  "420 time!",
]

addresses.each do |address|
  print address
  if address.match(regex)
    puts " is an address"
  else
    puts " is not an address"
  end
end

# Outputs:
> 420 Fanboy Lane, Cupertino CA 12345 is an address  
> 1829 William Tell Oveture, by Gioachino Rossini 88421 is not an address  
> 114801 Western East Avenue Apt. B32, Funky Township CA 12345 is an address  
> 1 Infinite Loop, Cupertino CA 12345-1234 is an address  
> 420 time! is not an address
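
For the Python side of the question, the same pattern can be used almost unchanged with the `re` module. This is a minimal sketch of a port, not the answerer's own code; `ADDRESS_RE` and `is_address` are names chosen here for illustration.

```python
import re

# Same pattern as the Ruby version: number, street, comma, city,
# two-letter state, five-digit ZIP with an optional +4 suffix.
ADDRESS_RE = re.compile(r"^[0-9]* (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?$")

addresses = [
    "420 Fanboy Lane, Cupertino CA 12345",
    "1829 William Tell Oveture, by Gioachino Rossini 88421",
    "114801 Western East Avenue Apt. B32, Funky Township CA 12345",
    "1 Infinite Loop, Cupertino CA 12345-1234",
    "420 time!",
]


def is_address(text):
    """Return True when the whole string fits the pattern."""
    return ADDRESS_RE.match(text) is not None


results = [is_address(address) for address in addresses]
for address, ok in zip(addresses, results):
    print(address, "is an address" if ok else "is not an address")
```

Because the pattern is anchored at both ends, it validates a string that is supposed to be an address; it will not find an address embedded in a longer sentence.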



我的痛♀有谁懂 2024-10-16 12:49:33

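Here's what I used:

(\d{1,10}( \w+){1,10}( ( \w+){1,10})?( \w+){1,10}[,.](( \w+){1,10}(,)? [A-Z]{2}( [0-9]{5})?)?)

It's not perfect and doesn't handle edge cases, but it works for most conventionally typed addresses and partial addresses.

It finds addresses in text such as:

Hi! I'm at 12567 Some St. Fairfax, VA. Come get me!
some text 12567 Some St. is my home
something else 123 My Street Drive, Fairfax VA 22033

Hope this helps someone.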
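
To try this pattern from Python, something like the sketch below should work. It uses `re.finditer` and `group(0)` because the pattern contains many capturing groups, and `re.findall` would return tuples of those groups instead of the matched text. `PATTERN` and `find_addresses` are illustrative names, not from the answer.

```python
import re

# The pattern from the answer, split across two raw strings for readability.
PATTERN = re.compile(
    r"(\d{1,10}( \w+){1,10}( ( \w+){1,10})?( \w+){1,10}[,.]"
    r"(( \w+){1,10}(,)? [A-Z]{2}( [0-9]{5})?)?)"
)


def find_addresses(text):
    """Return every full match of the pattern found in text."""
    return [m.group(0) for m in PATTERN.finditer(text)]


samples = [
    "Hi! I'm at 12567 Some St. Fairfax, VA. Come get me!",
    "some text 12567 Some St. is my home",
    "something else 123 My Street Drive, Fairfax VA 22033",
]

found = [find_addresses(sample) for sample in samples]
for sample, hits in zip(samples, found):
    print(sample, "->", hits)
```

The second sample shows the partial-address behaviour: only the number and street are returned because no city and state follow.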



北方。的韩爷 2024-10-16 12:49:33

\d{1,4}( \w+){1,3},( \w+){1,3} [A-Z]{2}

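Not fully tested, but it should work. Just use it with your favorite function from re (e.g. re.findall, though with capturing groups that returns the groups rather than the whole match). Assumptions:

A house number is 1 to 4 digits long
1-3 words follow the house number, all separated by spaces
The city name is 1-3 words (needs to match Cupertino, Los Angeles, and San Luis Obispo)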
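
One detail worth checking when using this pattern with `re.findall`: the parentheses are capturing groups, so `findall` returns the last repetition of each group rather than the full address. A possible workaround, sketched below with an invented sample sentence, is to switch to non-capturing groups `(?:...)`.

```python
import re

# The pattern as posted, with capturing groups.
CAPTURING = r"\d{1,4}( \w+){1,3},( \w+){1,3} [A-Z]{2}"
# Same pattern with non-capturing groups, so findall returns whole matches.
NON_CAPTURING = r"\d{1,4}(?: \w+){1,3},(?: \w+){1,3} [A-Z]{2}"

# Hypothetical input text for illustration.
text = (
    "Meet me at 420 Fanboy Lane, Cupertino CA "
    "or at 1 Infinite Loop, San Luis Obispo CA."
)

groups_only = re.findall(CAPTURING, text)
whole_matches = re.findall(NON_CAPTURING, text)

print(groups_only)    # tuples holding only the last word of street and city
print(whole_matches)  # the full matched addresses
```

Using `re.finditer` with `match.group(0)` on the original pattern is an equivalent alternative.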



披肩女神 2024-10-16 12:49:33

Okay, based on the very helpful responses from Mike Bethany and Rafe Kettler (thanks!),
I found that this regex works in both Python and Ruby:
/[0-9]{1,4} (.*), (.*) [a-zA-Z]{2} [0-9]{5}/

Ruby code - results in 12 Argonaut Lane, Lexington MA 02478

myregex = /[0-9]{1,4} (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?/

print "We're Having a pizza party at 12 Argonaut Lane, Lexington MA 02478 Come join the party!".match(myregex)

Python code - doesn't work quite the same, but this is the base code.

import re
# Python has no /.../ regex literals; the slashes would be matched literally.
myregex = re.compile(r'[0-9]{1,4} (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?')
match = myregex.search("We're Having a pizza party at 12 Argonaut Lane, Lexington MA 02478 Come join the party!")
print(match.group(0))  # whole match; findall would return only the groups
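
If the individual parts of the address are needed rather than the whole match, named groups are one option. The sketch below reuses the same pattern shape; the group names (`number`, `street`, `city`, `state`, `zip`) are assumptions added here for illustration.

```python
import re

# Same structure as the pattern above, with named groups for each part.
ADDRESS_RE = re.compile(
    r"(?P<number>[0-9]{1,4}) (?P<street>.*), (?P<city>.*) "
    r"(?P<state>[a-zA-Z]{2}) (?P<zip>[0-9]{5}(?:-[0-9]{4})?)"
)

sentence = (
    "We're Having a pizza party at 12 Argonaut Lane, "
    "Lexington MA 02478 Come join the party!"
)

match = ADDRESS_RE.search(sentence)
# groupdict() maps each group name to the text it captured.
fields = match.groupdict() if match else {}
print(fields)
```

Because `.*` is greedy, text containing more than one comma or more than one ZIP-like number can pull extra words into `street` or `city`, so this is best treated as a starting point.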

