Parsing addresses with a regular expression
I have to create a loop and, with a regexp, populate any of these 4 variables:
$address, $street, $town, $lot
The loop will be fed a string that may have info in it, like the lines below:
'123 any street, mytown'
or 'Lot 4 another road, thattown'
or 'Lot 2 96 other road, her town'
or 'this ave, this town'
or 'yourtown'
Since anything after a comma is the $town, I thought of
(.*), (.*)
Then the first capture could be checked with (Lot \d*) (.*), (.*)
If the first capture starts with a number, then it's the address (if it is a word with white space, it's the $street).
If it is one word, it's just the $town.
Take a look at Geo::StreetAddress::US if these are U.S. addresses.
Even if they are not, the source of this module should give you an idea of what is involved in parsing free-form street addresses.
Here is a script that handles the addresses you posted (updated; an earlier version combined the lot and number into one string):
Output:
I'd suggest you don't try to do all of this in a single regexp as it will be hard to verify its correctness.
First, I'd split at the comma. Whatever comes after the comma is the $town, and if there is no comma, the whole string is the $town.
Then I'd check if there is any lot information and extract it from the string.
Then I'd look for street/avenue number and name.
Divide and conquer :)
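The steps above could be sketched roughly as follows. This is an illustration in Python rather than Perl (the regexes carry over unchanged), and it rests on a few assumptions that are not stated in the question: $address is taken to mean the street number, $lot keeps the whole "Lot N" text, and the function name parse_address is made up for the example.

```python
import re


def parse_address(line):
    """Split a free-form line into lot, address (street number), street and town."""
    result = {"lot": None, "address": None, "street": None, "town": None}

    # Step 1: split at the first comma. With no comma, the whole string is the town.
    head, sep, tail = line.partition(",")
    if not sep:
        result["town"] = line.strip() or None
        return result
    result["town"] = tail.strip() or None
    rest = head.strip()

    # Step 2: pull off a leading "Lot <number>" if there is one.
    lot = re.match(r"(Lot\s+\d+)\b\s*", rest, flags=re.IGNORECASE)
    if lot:
        result["lot"] = lot.group(1)
        rest = rest[lot.end():]

    # Step 3: a leading number is treated as the street number (assumption);
    # whatever remains is the street name.
    number = re.match(r"(\d+)\s+", rest)
    if number:
        result["address"] = number.group(1)
        rest = rest[number.end():]
    result["street"] = rest.strip() or None
    return result


if __name__ == "__main__":
    for sample in ("123 any street, mytown",
                   "Lot 2 96 other road, her town",
                   "yourtown"):
        print(sample, "->", parse_address(sample))
```

Because each step only looks at what the previous step left behind, each regex stays small enough to check on its own.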
This should separate into 3 parts. How do you distinguish the address from the street?
Here is the breakdown for your examples:
If I understand correctly, this one separates the address/street as well:
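The patterns from this answer are not preserved here, so the following is not the answerer's regex. It is only a sketch, in Python, of how a single pattern with optional named groups might separate lot, street number, street and town. The group names mirror the question's variables, and treating $address as the street number is an assumption.

```python
import re

# One pattern with optional pieces. Everything before the comma is optional,
# so a bare town name still matches.
ADDRESS_RE = re.compile(
    r"""
    ^\s*
    (?:                               # optional part before the comma
        (?:(?P<lot>Lot\s+\d+)\s+)?    # optional lot, e.g. Lot 4
        (?:(?P<address>\d+)\s+)?      # optional street number (assumption)
        (?P<street>[^,]+?)            # street name
        \s*,\s*
    )?
    (?P<town>[^,]+?)                  # town: the text after the comma, or everything
    \s*$
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_with_one_regex(line):
    """Return a dict of the named groups, or None if the line does not match."""
    match = ADDRESS_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for sample in ("Lot 4 another road, thattown", "this ave, this town", "yourtown"):
        print(sample, "->", parse_with_one_regex(sample))
```

Groups that do not take part in the match come back as None, which maps naturally onto leaving the corresponding variable unset.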
I can't match the last one, but for the first three you can use something like this:
This is the testing regex:
You can use this in RegexBuddy to test: link
Geo::StreetAddress::US is fine for simple addresses, but it can lose context on harder examples. It will parse street names up until it finds a suburb. So with "46 7th St. Johns Park", 'St.' is consumed too soon, the street type gets incorrectly assigned to 'Park', and the state of 'CA' becomes the suburb.
I have developed a Perl module that can identify many of these more difficult patterns: https://metacpan.org/release/Lingua-EN-AddressParse . It recognizes idioms such as "The Parade" and nth Street, sub-property addresses such as "46 Broad Street #12", and many more.