Extracting international street addresses/phone numbers from free-form text
Hey, folks. I'm looking for some regular expressions to help grab street addresses and phone numbers from free-form text (a la Gmail).
Given some text: "John, I went to the store today, and it was awesome! Did you hear that they moved to 500 Green St.? ... Give me a call at +14252425424 when you get a chance."
I'd like to be able to pull out:
500 Green St. (recognized as a street address)
+14252425424 (recognized as a phone number)
What makes this problem easier is that I don't care about parsing the text that gets pulled out. That is, I don't care that Green is the name of the road or that 425 is the area code. I just want to grab strings that "look like" addresses or telephone numbers.
Unfortunately, this needs to work internationally, as best as possible.
Anyone have any leads? Thanks!
Comments (3)
Phone numbers are easy as long as you have a list of all country codes and number formats. For street addresses I have no idea; the only advice I can give you is to validate each set of words at addressdoctor.com.
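To illustrate the phone-number half, here is a minimal Python sketch that only finds strings shaped like phone numbers. It does not check them against real country codes or national formats, which would need the list mentioned above. The names `PHONE_CANDIDATE` and `find_phone_candidates` and the 7 to 15 digit range are assumptions made for illustration (15 digits is the E.164 maximum); they are not from the question.

```python
import re

# Hypothetical pattern: an optional "+", then 7 to 15 digits, where each
# digit may be preceded by a single space, dot, or hyphen.
# The lookbehind stops a match from starting in the middle of a word or
# number; the lookahead rejects runs longer than 15 digits.
PHONE_CANDIDATE = re.compile(r"(?<![\w+])\+?\d(?:[ .-]?\d){6,14}(?!\d)")


def find_phone_candidates(text):
    """Return substrings that look like phone numbers, in order of appearance."""
    return [m.group(0) for m in PHONE_CANDIDATE.finditer(text)]


sample = ("Did you hear that they moved to 500 Green St.? ... "
          "Give me a call at +14252425424 when you get a chance.")
print(find_phone_candidates(sample))  # ['+14252425424']
```

Because the pattern is this loose, it will also pick up other long digit runs such as order numbers, so treat the results as candidates to be filtered rather than confirmed phone numbers.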
You can give RecogniContact (-> address-parser.com) a try; it recognizes both postal addresses and phone numbers.
Take a look at Chapter 7 of Dive Into Python. It touches both phone numbers and street addresses. I believe you can use this as a starting point. The international part seems tough. I suggest you build a first draft, try it on several locales, iterate and improve.
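As one possible first draft in that spirit, the Python sketch below grabs strings that "look like" English-style street addresses: a house number, one to three capitalized words, and a street suffix. The suffix list, the name `ADDRESS_CANDIDATE`, and the helper `find_address_candidates` are assumptions for illustration and are not taken from Dive Into Python. Other locales would need their own patterns, since many countries put the house number after the street name.

```python
import re

# Hypothetical, deliberately short suffix list; longer forms come first so
# "Street" is tried before "St". Extend it per locale.
STREET_SUFFIXES = ["Street", "St", "Avenue", "Ave", "Road", "Rd",
                   "Boulevard", "Blvd", "Lane", "Ln", "Drive", "Dr"]

# House number (1 to 5 digits), 1 to 3 capitalized words, a suffix, and an
# optional trailing period (kept so that "St." comes back with its dot).
ADDRESS_CANDIDATE = re.compile(
    r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+){1,3}(?:%s)\b\.?" % "|".join(STREET_SUFFIXES)
)


def find_address_candidates(text):
    """Return substrings that look like English-style street addresses."""
    return [m.group(0) for m in ADDRESS_CANDIDATE.finditer(text)]


sample = ("Did you hear that they moved to 500 Green St.? ... "
          "Give me a call at +14252425424 when you get a chance.")
print(find_address_candidates(sample))  # ['500 Green St.']
```

Running a draft like this over sample text from several locales and noting what it misses is a cheap way to start the iterate-and-improve loop.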