How do you implement data quality and validation rules in a data warehouse?
I'm developing a data warehouse as part of my company's enterprise application suite. I've been learning a lot about DW concepts, but the rules engine seems difficult, and I can't find much information about the various ways to implement one. The rules are meant to validate data quality and to alert when certain business metrics are reached ($xx.xx in sales for the month, for example).
Our app needs to be customizable for each client, so I would like to make the rules generic. What are some ways to implement a rules engine?
- Ready-made tools? (I will be redistributing, so this usually doesn't work well)
- Frameworks/APIs
- Design patterns for creating our own
- Other ideas
Thanks.
Comments (1)
It may help to look at this as a few separate systems working together, as opposed to one "big engine" being responsible for everything.
When it comes to "business metrics", look at KPIs (key performance indicators). Analytic engines (MS-SSAS, Pentaho-Mondrian, etc.) allow for simple definition and presentation (dashboards) of KPIs. Even if you develop your own, they can still give you an idea of the concept.
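If you do roll your own, one way to keep KPI alerts generic per client is to store each rule as data (metric, comparison, threshold) and evaluate it against whatever metrics the warehouse produces. The following is only a minimal sketch of that idea; `KpiRule`, `triggered`, the metric names, and the thresholds are all made up for illustration and do not come from any of the products mentioned.

```python
from dataclasses import dataclass
import operator

# Comparison operators a rule may name; keeping them in a lookup table
# means rules can be stored as plain data (e.g. in a per-client config table).
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}

@dataclass(frozen=True)
class KpiRule:
    name: str         # hypothetical human-readable label
    metric: str       # key into the metrics dict
    op: str           # one of the keys in OPS
    threshold: float

def triggered(rules, metrics):
    """Return the names of rules whose condition holds for the given metrics."""
    return [
        r.name
        for r in rules
        if r.metric in metrics and OPS[r.op](metrics[r.metric], r.threshold)
    ]

# Illustrative rules and values only.
rules = [
    KpiRule("monthly sales target reached", "monthly_sales", ">=", 100_000.0),
    KpiRule("return rate too high", "return_rate", ">", 0.05),
]
metrics = {"monthly_sales": 120_500.0, "return_rate": 0.02}

alerts = triggered(rules, metrics)
print(alerts)
```

Because the rules are plain records, each client can get a different rule set without any code changes.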
Data quality is mostly the responsibility of the "operational systems", since that's where data is collected. If garbage reaches the DW, it's too late. Use data profiling tools to get an idea of what the source data looks like; data quality has to be enforced at the source.
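To give a rough feel for what profiling means here, the sketch below counts rows, missing values, and distinct values per column for a handful of source rows. It is a toy stand-in for a real profiling tool; the `profile` function and the sample columns are assumptions made up for this example.

```python
from collections import defaultdict

def profile(rows):
    """Summarise each column: row count, missing values, distinct values."""
    stats = defaultdict(lambda: {"rows": 0, "missing": 0, "values": set()})
    for row in rows:
        for col, value in row.items():
            s = stats[col]
            s["rows"] += 1
            if value is None or value == "":
                s["missing"] += 1
            else:
                s["values"].add(value)
    return {
        col: {"rows": s["rows"], "missing": s["missing"], "distinct": len(s["values"])}
        for col, s in stats.items()
    }

# Illustrative source rows: a repeated id, a blank country, inconsistent casing.
source_rows = [
    {"customer_id": 1, "country": "US"},
    {"customer_id": 2, "country": ""},
    {"customer_id": 2, "country": "us"},
]

report = profile(source_rows)
for column, summary in report.items():
    print(column, summary)
```

Even a summary this crude surfaces the kind of problems (blank values, duplicate keys, inconsistent codes) that are better fixed in the operational system than patched in the DW.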
During the DW loading process, you can use a step-by-step ECCD (Extract, Clean, Conform, Deliver) approach to implement certain "data standards". ETL tools (MS-SSIS, Pentaho-Kettle, Oracle Data Integrator, etc.) are designed to help with this.
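As a hand-rolled illustration of the Clean and Conform steps (not how any of the ETL tools above work internally), validation rules can be expressed as named predicates, with failing rows routed to a reject list instead of being loaded. All names here (`not_null`, `in_range`, `RULES`, the sample fields) are hypothetical.

```python
def not_null(field):
    """Rule factory: the field must be present and non-empty."""
    return lambda row: row.get(field) not in (None, "")

def in_range(field, low, high):
    """Rule factory: the field must be numeric and within [low, high]."""
    return lambda row: isinstance(row.get(field), (int, float)) and low <= row[field] <= high

# Named rules; in a per-client setup this mapping could be built from config.
RULES = {
    "customer_id present": not_null("customer_id"),
    "amount within range": in_range("amount", 0, 1_000_000),
}

def clean(rows, rules):
    """Clean step: split rows into accepted rows and (row, failed rule names) rejects."""
    accepted, rejected = [], []
    for row in rows:
        failures = [name for name, check in rules.items() if not check(row)]
        if failures:
            rejected.append((row, failures))
        else:
            accepted.append(row)
    return accepted, rejected

def conform(row):
    """Conform step: normalise country codes to one standard form."""
    return {**row, "country": row.get("country", "").strip().upper()}

# Stand-in for the Extract step.
extracted = [
    {"customer_id": 1, "amount": 250.0, "country": " us "},
    {"customer_id": None, "amount": 99.0, "country": "DE"},
    {"customer_id": 3, "amount": -5, "country": "fr"},
]

accepted, rejected = clean(extracted, RULES)
delivered = [conform(r) for r in accepted]  # what would be handed to the Deliver step
print(delivered)
print([failures for _, failures in rejected])
```

Keeping the rejects together with the names of the rules they failed gives you something concrete to report back to the owners of the source system.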
Regarding rule engines, look at InRule, ILOG, FICO, Corticon, JBoss Drools, etc. These are "independent systems" and can be used to enforce business rules when orchestrated with operational systems. Enforcing business rules usually leads to increased data quality. You can download Drools and start tinkering fairly quickly; other vendors allow some free downloads too.