Remove street number from street address
Using Ruby (newb) and regex, I'm trying to parse the street number out of a street address. I'm not having trouble with the easy ones, but I need some help with:
'6223 1/2 S FIGUEROA ST' ==> 'S FIGUEROA ST'
Thanks for the help!!
UPDATE(s):
'6223 1/2 2ND ST' ==> '2ND ST'
and from @pesto
'221B Baker Street' ==> 'Baker Street'
This will strip anything at the front of the string until it hits a letter:
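As a rough illustration, this idea can be written in Ruby with String#sub and an anchored character class. The method name strip_until_letter is made up for this example. Running it on the updated examples also shows where it falls short:

```ruby
# Hypothetical helper: remove every leading character that is not an
# ASCII letter, i.e. stop at the first A-Z or a-z.
def strip_until_letter(address)
  address.sub(/\A[^a-zA-Z]*/, '')
end

p strip_until_letter('6223 1/2 S FIGUEROA ST')  # prints "S FIGUEROA ST"
p strip_until_letter('6223 1/2 2ND ST')         # prints "ND ST" (the "2" of "2ND" is lost)
p strip_until_letter('221B Baker Street')       # prints "B Baker Street"
```

The last two results show why the updated examples call for a more careful pattern.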
If it's possible to have something like "221B Baker Street", then you have to use something more complex. This should work:
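Here is one possible pattern for the 221B case. It is a sketch, not necessarily the regex this answer had in mind. Instead of stripping characters, it strips whole leading tokens: a run of digits with at most one trailing letter (221B) or a fraction (1/2), each followed by whitespace. The constant and method names are illustrative.

```ruby
# Illustrative pattern: one or more leading "house number" tokens, each being
# digits + optional single letter + optional "/digits", followed by whitespace.
HOUSE_NUMBER_PREFIX = %r{\A(?:\d+[A-Za-z]?(?:/\d+)?\s+)+}

def strip_house_number(address)
  address.sub(HOUSE_NUMBER_PREFIX, '')
end

p strip_house_number('6223 1/2 S FIGUEROA ST')  # prints "S FIGUEROA ST"
p strip_house_number('6223 1/2 2ND ST')         # prints "2ND ST"
p strip_house_number('221B Baker Street')       # prints "Baker Street"
```

A token only counts as part of the house number when whitespace follows its optional single letter, so 2ND survives. An all-digit street name (for example, a hypothetical '100 5 AVE') would still be stripped too far.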
Group matching:
If you also need to take apartment numbers into account:
This would take care of addresses like 123A Street Name.
That should strip the numbers at the front (and the space) as long as there are no other numbers in the string. Just capture the first group, (.*).
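Here is a sketch of the capture-group approach in Ruby. It assumes an address starts with a house number, optionally followed by a single letter (123A) and a fraction (1/2). The pattern and the method name street_part are illustrative and not taken from the answer. Regexp#match returns a MatchData object (or nil), and index 1 holds the first group, (.*).

```ruby
# Illustrative pattern: house number, optional letter suffix, optional
# fraction, whitespace, then everything else captured by group 1.
ADDRESS_PATTERN = %r{\A\s*\d+[A-Za-z]?(?:\s+\d+/\d+)?\s+(.*)\z}

def street_part(address)
  match = ADDRESS_PATTERN.match(address)
  # Fall back to the original string when it does not start with a number.
  match ? match[1] : address
end

p street_part('123A Street Name')        # prints "Street Name"
p street_part('6223 1/2 S FIGUEROA ST')  # prints "S FIGUEROA ST"
p street_part('6223 1/2 2ND ST')         # prints "2ND ST"
p street_part('Main Street')             # prints "Main Street"
```

address[ADDRESS_PATTERN, 1] is a shorter way to get the same capture. It returns nil instead of falling back when there is no match.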
There's another set of Stack Overflow answers:
Parse usable Street Address, City, State, Zip from a string
I think the Google/Yahoo geocoder approach is best, but it depends on how many addresses you're dealing with and how often. Otherwise, the selected answer would probably be the best.
Can street names be numbers as well? E.g.
or even
You could deal with the first case above, but the second is difficult.
I would split the address on spaces, skip any leading components that do not contain a letter, and then join the remainder. I do not know Ruby, but here is a Perl example that also highlights the problem with my approach:
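In Ruby, the split, skip, and join idea might look like the following. This is an illustrative sketch, not a translation of the Perl example, and the method name is made up. Array#drop_while discards tokens until it reaches the first one that contains a letter.

```ruby
# Hypothetical helper: split on whitespace, drop leading tokens with no
# letter in them, and rejoin the rest with single spaces.
def drop_leading_letterless_tokens(address)
  address.split(' ')
         .drop_while { |token| token !~ /[A-Za-z]/ }
         .join(' ')
end

p drop_leading_letterless_tokens('6223 1/2 S FIGUEROA ST')  # prints "S FIGUEROA ST"
p drop_leading_letterless_tokens('6223 1/2 2ND ST')         # prints "2ND ST"
p drop_leading_letterless_tokens('221B Baker Street')       # prints "221B Baker Street"
p drop_leading_letterless_tokens('100 5 AVE')               # prints "AVE"
```

The last two lines (the final input is hypothetical) show the weaknesses. A house number with a letter suffix is kept, and an all-digit street name is dropped along with the house number.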
Ouch! Parsing an address by itself can be extremely nasty unless you're working with standardized addresses. The reason is that the "primary number", often called the house number, can appear at various locations within the string, for example:
It's not a trivial undertaking. Depending on the needs of your application, your best bet for getting accurate information is to use an address verification web service. A handful of providers offer this capability.
In the interest of full disclosure, I'm the founder of SmartyStreets. We have an address verification web service API that will validate and standardize your address to make sure it's real and allow you to get the primary/house number portion. You're more than welcome to contact me personally with questions.
/[^\d]+$/ will also match the same thing, except without using a capture group.
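Here is a quick check in Ruby of that capture-free pattern. The method name is illustrative. String#[] with a regex returns the matched text or nil, and strip removes the space left in front of the street name. The extra inputs show that any digit or leading letter suffix in the address cuts or pads the result.

```ruby
# Illustrative helper: keep the trailing run of non-digit characters.
# Returns nil when the address ends in a digit.
def trailing_non_digits(address)
  address[/[^\d]+$/]&.strip
end

p trailing_non_digits('6223 1/2 S FIGUEROA ST')  # prints "S FIGUEROA ST"
p trailing_non_digits('6223 1/2 2ND ST')         # prints "ND ST"
p trailing_non_digits('123A Street Name')        # prints "A Street Name"
```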
For future reference, a great tool for testing regular expressions is http://www.rubular.com/