Are Pentaho ETL and the Pentaho data analyzer a good choice?
I was looking for an ETL tool and found a lot on Google about Pentaho Kettle.
I also need a data analyzer that runs on a star schema, so that business users can play around and generate any kind of report or matrix. Again, Pentaho Analyzer looks good.
The other part of the application will be developed in Java, and the application should be database agnostic.
Is Pentaho good enough, or are there other tools I should check?
Comments (4)
Pentaho seems to be pretty solid, offering the whole suite of BI tools, with improved integration reportedly on the way. But the chances are that companies wanting to go the open source route for their BI solution are also the most likely to end up using open source database technology, and in that sense "database agnostic" can easily be a double-edged sword. For instance, you can develop a cube in Microsoft's Analysis Services in the comfortable knowledge that whatever MDX/XMLA your cube sends to the database will be interpreted consistently, holding very little in the way of nasty surprises.
Compare that to the Pentaho stack, which will typically end up interacting with PostgreSQL or MySQL. I can't vouch for how PostgreSQL performs in the OLAP realm, but I do know from experience that MySQL, for all its undoubted strengths, has "issues" with the types of SQL that typically crop up all over the place in an OLAP solution (you can't get far in a cube without using GROUP BY or COUNT DISTINCT). So part of what you save in licence costs will almost certainly be used to solve issues arising from the fact that Pentaho doesn't always know which database it is talking to: robbing Peter to (at least partially) pay Paul, so to speak.
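To make the GROUP BY and COUNT DISTINCT point concrete, here is a minimal sketch of the kind of rollup query a cube ends up issuing against a star schema. It is an illustration only: the table and column names (`fact_sales`, `dim_product`, and so on) are hypothetical, SQLite is used purely because it needs no server, and this is not the SQL that Pentaho itself generates. Running the same statement against each candidate database is a cheap way to check how it copes before committing to a stack.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema: one fact table, one dimension."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT
        );
        CREATE TABLE fact_sales (
            sale_id     INTEGER PRIMARY KEY,
            product_id  INTEGER REFERENCES dim_product(product_id),
            customer_id INTEGER,
            amount      REAL
        );
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'books'), (3, 'games');
        INSERT INTO fact_sales (product_id, customer_id, amount) VALUES
            (1, 100, 10.0),
            (2, 100, 15.0),
            (2, 101, 15.0),
            (3, 100, 40.0);
    """)


# Typical OLAP-style rollup: join fact to dimension, then aggregate per
# dimension member using both SUM and COUNT(DISTINCT ...).
ROLLUP_SQL = """
    SELECT p.category,
           SUM(f.amount)                 AS revenue,
           COUNT(DISTINCT f.customer_id) AS customers
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
"""


def rollup_by_category(conn):
    """Return (category, revenue, distinct customers) rows."""
    return conn.execute(ROLLUP_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    for row in rollup_by_category(conn):
        print(row)
```

On real data volumes, it is the query plan and timing of statements like `ROLLUP_SQL` that differ between databases, so that is what is worth measuring.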
Unfortunately, more info is needed. For example:
You can spend a lot of time deploying and learning an ETL tool - only to discover that it really doesn't meet your needs very well. You're best off taking a couple of hours to figure that out first.
I've used Talend before with some success. You create your translation by chaining operations together in a graphical designer. There were definitely some WTF's and it was difficult to deal with multi-line records, but it worked well otherwise.
Talend also generates Java and you can access the ETL processes remotely. The tool is also free, although they provide enterprise training and support.
There are lots of choices. Look at BIRT, Talend and Pentaho, if you want free tools. If you want much more robustness, look at Tableau and BIRT Analytics.