Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 10 years ago.
I'd recommend Google Refine, an open source (New BSD license) tool for parsing and fixing crufty data. It also supports clustering and reconciling duplicate data, and it has data-mining features.
I've used it successfully to import and fix a lot of data in various formats (.csv, .tsv, .xls, .xml, .json, .rdf, etc.). It can be used in-house without sending any data externally, which seemed to be a concern in the question "tools for matching name/address data".
NB. Google Refine was previously called Freebase Gridworks.
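To give a rough idea of what clustering duplicate name data means in practice, here is a minimal Python sketch of a key-collision approach: each value is reduced to a normalized "fingerprint", and values that share a fingerprint are grouped as likely duplicates. This only illustrates the general technique and is not Google Refine's actual code. The function names `fingerprint` and `cluster` and the sample records are made up for the example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a normalized key (hypothetical helper for this sketch)."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Trim, lowercase, and remove punctuation.
    text = text.strip().lower()
    text = re.sub(r"[^\w\s]", "", text)
    # Sort and de-duplicate tokens so word order does not matter.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values whose fingerprints collide; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


# Made-up sample data, for illustration only.
records = [
    "Smith, John",
    "John Smith",
    "john  smith.",
    "Jane Doe",
    "José García",
    "Garcia, Jose",
]

for group in cluster(records):
    print(group)
```

Everything runs locally, so no data leaves the machine. A key like this only catches differences in case, punctuation, accents, and word order; misspellings such as "Jon Smith" would need a fuzzier comparison (edit distance, phonetic keys), which a dedicated tool handles better than a few lines of script.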
I stumbled upon the following article: "Merge/Purge and Duplicate Detection".
Looking at http://www.semaphorecorp.com, I found some extremely low prices.
This is not what I'm looking for, but it is at least a bit of help and a step in the right direction.
Try the OSDQ (Open Source Data Quality and Profiling) project on SourceForge.