Are Pentaho ETL and Data Analyzer good choices?

Posted on 2024-08-12 06:18:32

I was looking for an ETL tool, and a Google search turned up a lot about Pentaho Kettle.

I also need a data analyzer that runs on a star schema, so that business users can play around and generate any kind of report or matrix. Again, Pentaho Analyzer looks good.

The other parts of the application will be developed in Java, and the application should be database agnostic.

Is Pentaho good enough, or are there other tools I should check?

Comments (4)

泅人 2024-08-19 06:18:32

Pentaho seems to be pretty solid, offering the whole suite of BI tools, with improved integration reportedly on the way. But the chances are that companies wanting to go the open source route for their BI solution will also end up using open source database technology, and in that sense "database agnostic" can easily be a double-edged sword. For instance, you can develop a cube in Microsoft's Analysis Services in the comfortable knowledge that whatever MDX/XMLA your cube sends to the database will be interpreted consistently, holding very little in the way of nasty surprises.

Compare that to the Pentaho stack, which will typically end up interacting with PostgreSQL or MySQL. I can't vouch for how PostgreSQL performs in the OLAP realm, but I do know from experience that MySQL - for all its undoubted strengths - has "issues" with the types of SQL that typically crop up all over the place in an OLAP solution (you can't get far in a cube without using GROUP BY or COUNT DISTINCT). So part of what you save in licence costs will almost certainly be spent solving issues arising from the fact that Pentaho doesn't always know which database it is talking to - robbing Peter to (at least partially) pay Paul, so to speak.
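To make the GROUP BY / COUNT DISTINCT point concrete, here is a minimal sketch of the kind of aggregate query an OLAP layer tends to issue against a star schema. The table and column names (`fact_sales`, `dim_product`, and so on) are hypothetical, and SQLite is used only so the example runs on its own. It does not reproduce any MySQL-specific behaviour and is not the SQL Pentaho actually generates.

```python
import sqlite3

# Hypothetical star schema: one fact table joined to one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        product_id  INTEGER REFERENCES dim_product(product_id),
        customer_id INTEGER,
        amount      REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'books'), (3, 'games');
    INSERT INTO fact_sales VALUES
        (1, 100, 10.0), (1, 101, 15.0), (2, 100, 5.0), (3, 102, 40.0);
""")


def sales_by_category(connection):
    """Aggregate the fact table per category, counting distinct customers."""
    query = """
        SELECT d.category,
               SUM(f.amount)                 AS total_amount,
               COUNT(DISTINCT f.customer_id) AS distinct_customers
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_id = f.product_id
        GROUP BY d.category
        ORDER BY d.category
    """
    return connection.execute(query).fetchall()


rows = sales_by_category(conn)
for row in rows:
    print(row)
```

Running the same query shape against each candidate database, with realistic data volumes, is one way to find out early whether a given backend copes with it.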

在你怀里撒娇 2024-08-19 06:18:32

Unfortunately, more info is needed. For example:

  • will you need to exchange data with well-known apps (Oracle Financials, Remedy, etc.)? If so, you can save a ton of time & money with an ETL solution that already has support for that interface built in.
  • what database products (and versions) and file types do you need to talk to?
  • do you need to support querying of web services?
  • do you need near real-time trickling of data?
  • do you need rule-level auditing & counts that account for every single row?
  • do you need delta processing?
  • what kinds of machines do you need this to run on? Linux? Windows? Mainframe?
  • what kind of version control, testing and build processes will this tool have to comply with?
  • what kind of performance & scalability do you need?
  • do you mind if the database ends up driving the transformations?
  • do you need this to run in userspace?
  • do you need to run parts of it on various networks disconnected from the rest? (not uncommon for extract processes)
  • how many interfaces, and of what complexity, do you need to support?

You can spend a lot of time deploying and learning an ETL tool, only to discover that it doesn't meet your needs very well. You're best off taking a couple of hours to figure that out first.
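Two of the questions above, delta processing and row-level counts, can be illustrated with a small sketch. It assumes each source row carries a `modified_at` value that can serve as a watermark. The names `extract_delta` and `DeltaResult` are hypothetical and are not part of Kettle, Talend, or any other tool mentioned here.

```python
from dataclasses import dataclass


@dataclass
class DeltaResult:
    rows: list           # rows changed since the previous run
    new_watermark: int   # highest modified_at seen, to store for the next run
    read_count: int      # every row read from the source
    skipped_count: int   # rows read but not loaded because they were unchanged


def extract_delta(source_rows, last_watermark):
    """Return only rows modified after last_watermark, plus audit counts."""
    changed = [r for r in source_rows if r["modified_at"] > last_watermark]
    new_watermark = max(
        (r["modified_at"] for r in changed), default=last_watermark
    )
    return DeltaResult(
        rows=changed,
        new_watermark=new_watermark,
        read_count=len(source_rows),
        skipped_count=len(source_rows) - len(changed),
    )


source = [
    {"id": 1, "modified_at": 10},
    {"id": 2, "modified_at": 20},
    {"id": 3, "modified_at": 30},
]

first = extract_delta(source, last_watermark=15)
second = extract_delta(source, last_watermark=first.new_watermark)

# Audit check: every row read is accounted for as either loaded or skipped.
assert first.read_count == len(first.rows) + first.skipped_count
```

If a candidate tool cannot express something like this watermark and the matching counts without heavy custom code, that is worth knowing before committing to it.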

浅浅 2024-08-19 06:18:32

I've used Talend before with some success. You create your translation by chaining operations together in a graphical designer. There were definitely some WTFs, and it was difficult to deal with multi-line records, but otherwise it worked well.

Talend also generates Java, and you can access the ETL processes remotely. The tool is free, although they sell enterprise training and support.

真心难拥有 2024-08-19 06:18:32

There are lots of choices. If you want free tools, look at BIRT, Talend and Pentaho. If you want much more robustness, look at Tableau and BIRT Analytics.
