What software can be used for data quality checks?
I'm looking to identify some possible software options that will allow custom rules for manipulating bulk data files (.csv). For example: proper capitalization (allowing state abbreviations to stay in capitals and unique surnames to keep their own spelling), counting how often specific words appear in a field, and some other custom rules. Any guidance would be appreciated.
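For the two rules named above, a minimal Python sketch might look like the following. It is not tied to any of the tools suggested in the answers, and the exception lists (`STATE_ABBREVIATIONS`, `SURNAME_EXCEPTIONS`) and function names are hypothetical placeholders that you would replace with your own reference data.

```python
import re

# Hypothetical exception lists; replace them with your own reference data.
# Avoid two-letter codes that are also common words (e.g. "IN", "OR", "ME"),
# or this simple word-by-word approach will upper-case them everywhere.
STATE_ABBREVIATIONS = {"NY", "CA", "TX", "WA"}
SURNAME_EXCEPTIONS = {"mcdonald": "McDonald", "macleod": "MacLeod", "o'neil": "O'Neil"}


def proper_case(text: str) -> str:
    """Capitalize each word, keeping state codes in capitals and
    using the stored spelling for known surnames."""
    result = []
    for word in text.split():  # note: split() also collapses runs of whitespace
        if word.upper() in STATE_ABBREVIATIONS:
            result.append(word.upper())
        elif word.lower() in SURNAME_EXCEPTIONS:
            result.append(SURNAME_EXCEPTIONS[word.lower()])
        else:
            result.append(word.capitalize())
    return " ".join(result)


def count_words(text: str, targets: set[str]) -> dict[str, int]:
    """Count case-insensitive occurrences of each target word in a field."""
    counts = {t.lower(): 0 for t in targets}
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in counts:
            counts[token] += 1
    return counts
```

For example, `proper_case("JOHN MCDONALD lives in albany ny")` gives `"John McDonald Lives In Albany NY"`, and `count_words("Paid in full; paid late", {"paid", "late"})` reports 2 for "paid" and 1 for "late". To process a whole file you would call these per field while looping over `csv.DictReader` rows.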
Comments (2)
You could use Talend Open Studio for this task. It is an open-source ETL tool for data manipulation and integration. For example, you can go ImportCSV >> DATABASE >> perform transformations >> ExportCSV, and the transformation step can hold whatever custom rules you need.
You can find it here: http://www.talend.com/products-data-integration/talend-open-studio.php
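To make the ImportCSV >> DATABASE >> transformations >> ExportCSV flow concrete without Talend, here is a rough equivalent using only Python's standard library (`csv` plus an in-memory `sqlite3` database). This is a swapped-in illustration, not how Talend works internally; the `name`/`state` columns, the `staging` table, and the transformation query are all hypothetical.

```python
import csv
import io
import sqlite3
from typing import TextIO


def csv_via_database(src: TextIO, dst: TextIO) -> None:
    """Import CSV -> SQLite staging table -> SQL transformation -> export CSV."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: adjust the columns to match your file's header.
        conn.execute("CREATE TABLE staging (name TEXT, state TEXT)")
        # DictReader yields one dict per row; sqlite3 fills the :name/:state
        # placeholders from each dict by key.
        conn.executemany(
            "INSERT INTO staging (name, state) VALUES (:name, :state)",
            csv.DictReader(src),
        )
        # Example transformation: trim names, upper-case state codes, sort by name.
        cursor = conn.execute(
            "SELECT TRIM(name) AS name, UPPER(TRIM(state)) AS state "
            "FROM staging ORDER BY TRIM(name)"
        )
        writer = csv.writer(dst, lineterminator="\n")
        writer.writerow([col[0] for col in cursor.description])  # header row
        writer.writerows(cursor)
    finally:
        conn.close()


if __name__ == "__main__":
    src = io.StringIO("name,state\n  Jane Doe ,ny\nBob,ca \n")
    dst = io.StringIO()
    csv_via_database(src, dst)
    print(dst.getvalue())
```

With real files, open both the input and output with `newline=""`, as the `csv` module documentation recommends. Pushing the data through SQL is mainly useful when the rules are easier to express as queries (joins against lookup tables, grouping, deduplication).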
It also sounds like you might be looking to create a profile of the data. For this you can use Talend Open Profiler; they recently added support for flat files such as your .csv. It is simple to use, and you should be up and running in about 30 minutes.
You can find the download here: http://www.talend.com/products-data-quality/talend-open-profiler.php
You can find some tutorials here: http://www.talendforge.org/tutorials/menu.php
On the tutorials page, choose the Data Quality tab and scroll down to 'Talend Open Profiler'.
Profiling is my first step in assessing data quality on a new dataset.
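To get a feel for what a basic profile contains before installing anything, the sketch below computes a few per-column statistics (row count, empty values, distinct values, value lengths, most common value) with the Python standard library. It is an assumption about what a useful minimal profile looks like, not a reproduction of Talend Open Profiler's reports, and `profile_csv` is a hypothetical name.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def profile_csv(src: TextIO) -> dict[str, dict[str, object]]:
    """Compute simple per-column statistics for a CSV file with a header row."""
    reader = csv.DictReader(src)
    columns: dict[str, list[str]] = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name, values in columns.items():
            # Short rows leave missing fields as None; treat them as empty.
            values.append((row.get(name) or "").strip())
    profile = {}
    for name, values in columns.items():
        filled = [v for v in values if v]
        lengths = [len(v) for v in filled]
        profile[name] = {
            "rows": len(values),
            "empty": len(values) - len(filled),
            "distinct": len(set(filled)),
            "min_len": min(lengths, default=0),
            "max_len": max(lengths, default=0),
            "most_common": Counter(filled).most_common(1),
        }
    return profile


if __name__ == "__main__":
    data = io.StringIO("name,state\nJane,NY\nBob,\nJane,ny\n")
    for column, stats in profile_csv(data).items():
        print(column, stats)
```

On the sample data the `state` column shows one empty value and two distinct values, because "NY" and "ny" are counted separately. Inconsistencies like that are exactly what a profile is meant to surface before you write the cleanup rules.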
A quick Google search for "data scrubbing utilities" turned up this:
http://data-scrubbing.qarchive.org/
They look very close to what you're looking for.
It'll really depend on how complex the rules get, though. If they get much more complex than simple transformations, you'd probably be better off just coding something up (or having it coded).
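As a rough idea of what "coding something up" could look like, here is a small Python sketch of a rule table: each column maps to a list of functions applied in order, and every row of the CSV is passed through them. The column names, the rules, and the `apply_rules` helper are hypothetical; more involved rules (such as capitalization with exceptions) would slot in as additional functions.

```python
import csv
import io
import re
from typing import Callable, TextIO

Rule = Callable[[str], str]


def collapse_spaces(value: str) -> str:
    """Trim the value and squeeze internal runs of whitespace to one space."""
    return re.sub(r"\s+", " ", value).strip()


# Hypothetical rule table: column name -> rules applied left to right.
RULES: dict[str, list[Rule]] = {
    "name": [collapse_spaces, str.title],
    "state": [collapse_spaces, str.upper],
}


def apply_rules(src: TextIO, dst: TextIO, rules: dict[str, list[Rule]]) -> None:
    """Read CSV rows from src, apply the per-column rules, write them to dst.
    Columns without rules are copied through unchanged."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column, column_rules in rules.items():
            if row.get(column) is not None:
                for rule in column_rules:
                    row[column] = rule(row[column])
        writer.writerow(row)


if __name__ == "__main__":
    sample = io.StringIO("name,state,notes\n  jane   doe ,ny,keep\n")
    out = io.StringIO()
    apply_rules(sample, out, RULES)
    print(out.getvalue())
```

The sample row comes out as `Jane Doe,NY,keep`. Keeping the rules in a table like this makes it easy to add, reorder, or test rules one at a time, which is roughly the point where a custom script starts to beat a general-purpose tool.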