Determine whether a string is a street address, suite number, shopping center, or something else
I'm using JavaScript to parse through some data and have run into a bit of a pickle.
I have a field that is 1-3 lines of data.
Usually it is only one line, representing a street address:
1234 Hollywood St.
But sometimes it is something like this:
Beverly Hills Shopping Center 1234 Hollywood St.
Other times it is this:
1234 Hollywood St Ste 12
And other times it's something like this:
1234 Hollywood St 2nd Floor (between Hollywood St and Tom Cruise Ave)
I'd really like to know which line is the street address. Currently, I'm trying to identify which line is "Address line 2" (the suite number, floor number, etc.). I don't really need address line 2, but by process of elimination it helps me find the street address.
Is there a nice tool available, like a regex function or something that will tell me if a string is likely a street address?
Or is there another way that I could be handling this?
Thanks!
Edit:
This algorithm does not need to be 100% accurate. I'm preparing the address to be sent to the Google Maps API for verification. I could try each line of the address to see which one is valid, but this would increase the number of calls to Google and carry a small but nonzero chance of a false positive.
I'd like to scrub the data a little before verifying it through Google, to reduce errors and the need for extra calls.
Comments (3)
As stated in another answer, this is a job for an address verification service. Please note that the Google Maps API is not an address verification service; it is best described as a very capable address approximation service (there's a notable difference).
Address verification implies that an address is real at the present time, meaning that it corresponds to an actual location. It often implies that an address is deliverable (depending on the business need).
I'm a software developer at SmartyStreets, an address verification company. We provide a batch processing tool that I think is a good fit for your use case. Since our system accepts up to two input lines for the street address, I suggest generating a few permutations for each address that has more than two street address lines. It is also very fast (1 million addresses are processed in less than an hour) and doesn't require any interaction from us because it's an online service.
The other bit of good news is that you may not even need to send the addresses to the Google Maps API, because they will already be Delivery-Point Validated. But that will depend on your exact needs.
Update: SmartyStreets now provides international address verification.
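The permutation suggestion in this answer could be sketched roughly as follows. This is only an illustration: `twoLineCandidates` is a made-up helper name, and how each candidate pair is submitted to a verification service is not shown, since that depends on the service's actual API.

```javascript
// Hypothetical helper: given the 1-3 lines of the raw field, build every
// ordered pair of distinct lines so each pair can be tried as
// (street line, secondary line) against a two-line verification input.
function twoLineCandidates(lines) {
  // Drop surrounding whitespace and empty lines first.
  const cleaned = lines.map((l) => l.trim()).filter(Boolean);

  // One or two lines already fit a two-line input, so send them as-is.
  if (cleaned.length <= 2) return [cleaned];

  const candidates = [];
  for (let i = 0; i < cleaned.length; i++) {
    for (let j = 0; j < cleaned.length; j++) {
      if (i !== j) candidates.push([cleaned[i], cleaned[j]]);
    }
  }
  return candidates;
}

const pairs = twoLineCandidates([
  'Beverly Hills Shopping Center',
  '1234 Hollywood St',
  'Ste 12',
]);
console.log(pairs.length); // 6 ordered pairs for a three-line field
```

Three lines produce six candidates, so filtering out obviously wrong pairs before submitting them would keep the number of lookups down.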
There are web services available that you can pass an address to, and they will return a well-formed JSON/XML object of the parsed address. Perhaps something like that will help you? As some of the comments state, you won't be able to do this simply with JavaScript.
Here is one service I have personally looked into using. You'll need to get familiar with its APIs:
https://webgis.usc.edu/Services/AddressNormalization/WebService/DeterministicNormalizationWebService.aspx
First of all, have a look at the following official USPS abbreviations:
Street Suffix Abbreviations
Secondary Unit Designators
Then you will have an idea of what to expect as input, but you also have to take into account all possible unofficial variations, punctuation, etc. There is a lot to do.
In general, a street address line should start with a number followed by a space (which separates it from "2nd Floor" etc.), then one or more words, and finally a street suffix abbreviation. For the city, state, ZIP tuple you again have to handle both full state names and their abbreviations (including short variations like N York, N.York, or N. York) and remember the ZIP5 and ZIP5+4 cases.
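As a rough illustration of the heuristic above, here is a minimal JavaScript sketch. The suffix and unit lists are short, hand-picked samples rather than the full USPS tables, the names `classifyLine` and `findStreetLine` are invented for this example, and it assumes the lines of the field are separated by newlines. Addresses with no suffix (for example "1234 Broadway") will be missed, which should be acceptable for a pre-filter that does not need to be 100% accurate.

```javascript
// Sample of USPS street suffixes (assumption: extend from the official list).
const STREET_SUFFIXES = [
  'st', 'street', 'ave', 'avenue', 'blvd', 'boulevard', 'rd', 'road',
  'dr', 'drive', 'ln', 'lane', 'ct', 'court', 'pl', 'place',
  'way', 'hwy', 'highway', 'pkwy', 'parkway',
];

// Number, whitespace, one or more words, then a street suffix
// (optionally followed by a period). Anything after the suffix is allowed.
const STREET_LINE = new RegExp(
  `^\\d+[a-z]?\\s+(?:[\\w.'-]+\\s+)+?(?:${STREET_SUFFIXES.join('|')})\\b\\.?`,
  'i'
);

// Secondary unit lines: "Ste 12", "Apt #4", "# 12", "2nd Floor", "Floor 2".
const UNIT_LINE = new RegExp(
  '^(?:' +
    '(?:ste|suite|apt|apartment|unit|bldg|building|rm|room|fl|floor)\\b\\.?\\s*#?\\s*\\w+' +
    '|#\\s*\\w+' +
    '|\\d+(?:st|nd|rd|th)\\s+(?:fl|floor)\\b' +
  ')',
  'i'
);

// Label a single line as 'unit', 'street', or 'other'.
function classifyLine(line) {
  const text = line.trim();
  if (UNIT_LINE.test(text)) return 'unit';
  if (STREET_LINE.test(text)) return 'street';
  return 'other';
}

// Return the first line of a multi-line field that looks like a street
// address, or null if none does.
function findStreetLine(field) {
  const lines = field.split(/\r?\n/).map((l) => l.trim()).filter(Boolean);
  return lines.find((l) => classifyLine(l) === 'street') || null;
}

console.log(findStreetLine('Beverly Hills Shopping Center\n1234 Hollywood St.'));
```

Checking for a unit line first keeps "2nd Floor (between Hollywood St and Tom Cruise Ave)" from being mistaken for a street address just because it mentions a street name.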