Storing street addresses and preventing duplicates
I have a database that I am accessing through Django & Python. We want to store buildings based on their addresses (not names, since some buildings simply don't have names).
We need to prevent users from entering duplicate entries for the same building into our database. This is made difficult by the different ways people can type an address (e.g., "1000 Main Street" vs. "1000 Main St.").
In what way can we reliably prevent duplicates? I am using a MySQL database.
Thanks
Replies (2)
If you're working only with the U.S., you can use the USPS Address Standardization web service to resolve duplicates:
http://www.usps.com/webtools/address.htm
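To show what standardization buys you for duplicate detection, here is a minimal local sketch in Python. It does not call the USPS service. It assumes the input holds only the street line (house number, street name, suffix), and SUFFIXES is a hypothetical, deliberately tiny table modeled on USPS-style abbreviations. Real standardization handles far more (directionals, unit numbers, typos).

```python
import re

# Hypothetical, deliberately tiny table of street-suffix spellings mapped to
# USPS-style abbreviations. A real standardizer covers hundreds of variants.
SUFFIXES = {
    "STREET": "ST", "STR": "ST",
    "AVENUE": "AVE", "AV": "AVE",
    "ROAD": "RD",
    "BOULEVARD": "BLVD",
    "DRIVE": "DR",
}


def normalize_address(raw: str) -> str:
    """Return a crude canonical form of a street line such as '1000 Main St.'.

    Assumes the input holds only house number, street name and suffix.
    """
    # Uppercase and turn punctuation ('.', ',', '#', ...) into spaces.
    text = re.sub(r"[^\w\s]", " ", raw.upper())
    # split() with no argument also collapses repeated whitespace.
    tokens = text.split()
    # Only the last token is treated as the suffix, so "Avenue Road"
    # becomes "AVENUE RD" instead of rewriting the street name itself.
    if tokens:
        tokens[-1] = SUFFIXES.get(tokens[-1], tokens[-1])
    return " ".join(tokens)
```

Both spellings from the question, "1000 Main Street" and "1000 Main St.", map to "1000 MAIN ST". Storing that value in its own column with a UNIQUE index in MySQL (for example via unique=True on the Django model field) would let the database reject the second entry. Because the table is this crude, treat it as a first line of defense rather than a reliable de-duplicator.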
Address de-duplication is a complicated task. While the USPS web service is all right, it's seriously lacking some important features. Plus, performing batch de-duplication through a regular web service, one request at a time, is quite inefficient.
And, it appears the USPS has updated their site, so the link Dan posted, while useful, is now broken.
As an updated answer, I'd like to point out that I work for SmartyStreets and we remove duplicates from address lists. You could, for example, upload your list to CASS-Certified Scrubbing and the addresses will be standardized and flagged for duplicates. It's really easy this way. If you need point-of-entry validation, take a look at LiveAddress, which provides more important information than the USPS service alone does.
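The standardize-then-flag batch flow described above can be sketched locally as a single pass that groups rows by a normalized key, instead of issuing a web-service request per address. This is not SmartyStreets' API or CASS processing: address_key() and its three-entry suffix table are hypothetical stand-ins for a real standardizer.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical suffix table, for illustration only.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}


def address_key(address: str) -> str:
    """Crude normalized key: uppercase, punctuation stripped, suffix abbreviated."""
    tokens = re.sub(r"[^\w\s]", " ", address.upper()).split()
    if tokens:
        tokens[-1] = SUFFIXES.get(tokens[-1], tokens[-1])
    return " ".join(tokens)


def flag_duplicates(addresses: List[str]) -> Dict[str, List[int]]:
    """Group row indices by normalized key; keep only groups with 2+ rows."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, address in enumerate(addresses):
        groups[address_key(address)].append(index)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}
```

For example, flag_duplicates(["1000 Main Street", "1000 Main St.", "12 Oak Ave", "12 Oak Avenue", "5 Elm Rd"]) returns {'1000 MAIN ST': [0, 1], '12 OAK AVE': [2, 3]}, i.e. the row indices of each suspected duplicate group, leaving the final decision to a person or a proper standardization service.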