Pentaho vs. Microsoft BI Stack
My company is heavily invested in the MS BI Stack (SQL Server Reporting Services, Analysis Services, and Integration Services), but I want to have a look at what the seemingly most talked-about open-source alternative, Pentaho, is like.
I've installed a version, and I got it up and running quite painlessly. So that's good. But I don't really have the time to start using it for actual work to get a thorough understanding of the package.
Have any of you got any insights into what are the pros and cons of Pentaho vs MS BI, or any links to such comparisons?
Much appreciated!
Comments (9)
I reviewed multiple BI stacks while on a path to get off of Business Objects. A lot of my comments are a matter of preference. Both tool sets are excellent. Some of it is like how I prefer chocolate fudge brownie ice cream over plain chocolate.
Pentaho has some really smart guys working with them, but Microsoft has been on a well-funded and well-planned path. Keep in mind MS is still the underdog in the database market. Oracle is king here. To be competitive, MS has been giving away a lot of goodies when you buy the database and has been forced to reinvent its platform a couple of times. I know this is not about the database, but the DB battle has caused MS to give away a lot in order to add value to its stack.
1.) Platform
SQL Server doesn't run on Unix or Linux, so it is automatically excluded from that market. Windows is about the same price as some versions of Unix now. Windows is pretty cheap and runs fairly well now. It gives me about as much trouble as Linux.
2.) OLAP
Analysis Services was reinvented in 2005 (current is 2008) over the 2000 version. It is an order of magnitude more powerful than 2000. Pentaho's OLAP engine (Mondrian) is not as fast once you get big. It also has fewer features. It is pretty good, but there is less in the way of tools. Both support Excel as the front end, which is essential. The MS version is more robust.
3.) ETL
MS - DTS has been replaced with SSIS. Again, an order of magnitude increase in speed, power, and ability. It controls any and all data movement or program control. If it can't do it, you can write a script in PowerShell. On par with Informatica in the 2008 release.
Pentaho - Much better than it used to be. Not as fast as I would like, but I can do just about everything I want to do.
4.) Dashboard
Pentaho has improved this. It is sort of uncomfortable and unfriendly to develop in, but there is really no true equivalent from MS.
5.) Reports
MS reporting is really powerful but not all that hard to use. I like it now but hated it at first, until I got to know it a little better. I had been using Crystal Reports, and the MS report builder is much more powerful. It is easy to do hard things in MS, but a little harder to do easy things.
Pentaho is a little clumsy. I didn't like it at all, but you might. I found it to be overly complex. I wish it were more like either the Crystal report builder or the MS report builder, but it is Jasper-like. I find it hard. That may be a preference.
6.) Ad hoc
MS - This was the real winner for me. I tested it with my users and they instantly fell in love with the MS user report builder. What made the difference was that it was not just easy to use, but also productive.
Pentaho - Good but pretty old school. It uses the more typical wizard-based model and has powerful tools, but I hate it. It is an excellent tool for what it is, but we have moved on from this style and no one wants to go back. Same problem I had with LogiXML. The interface worked well for what it was, but it is not really much of a change from what we used 12 years ago.
http://wiki.pentaho.com/display/PRESALESPORTAL/Methods+of+Interactive+Reporting
There are some experienced people out there who can make Pentaho run really well; I just found the MS suite to be more productive.
Warning: there are numerous sites out there listing the many deficiencies, bugs, and annoyances of SSIS. Not sure why SSIS came out on top in that post, but before you bet your project on it, look at what people have to say in the blogosphere. In my experience it's about 20:1 ranting about how horrible SSIS is to work with. I concur, and I am currently looking for any alternative.
Great information here! I have not tried Pentaho but am planning on checking it out. I am a seasoned MS BI consultant, using it since 1998. SSIS is very fast and very powerful, but the criticisms are spot on. I found the following issues with SSIS:
(1) It is hard to debug. You get cryptic errors that may not give you any hint about what and where the problem really is.
(2) Per a prior comment, it is the shittiest development environment ever! I have no clue what they are thinking.
(a) Create a table with 100 or more columns and put a merge join on it. Now go back in and try to make an update to the merge join (like pulling a new column through). After you click OK on the merge join to save your change, it can take several minutes, even on the fastest machine. I have a huge dataflow with lots of wide records and many merge joins. Adding one column to the dataflow takes more than half a day. I update a merge join and then have to go do something else and check back 5-10 minutes later to see if it is done. Microsoft's response to this is to break up your package into multiple packages and place the data in a table or binary file between them. Well, if you are going to disk between all the steps, you may as well do the whole thing in SQL! One of the main purposes of an ETL tool is to do all this stuff in memory and avoid disk I/O.
(b) The designer outright crashes sometimes, losing all your work since the last save. (I do Ctrl-S in my sleep now because of this.)
(c) I had to figure out a hack and generate SSIS package XML in Excel for wide records. I have a healthcare client where 600+ column records are commonplace. If you try to define a file format with 600 columns in SSIS, you have to type every single column in one at a time! Even MS Access allows you to cut and paste a layout from a spreadsheet into a file layout, but not SSIS. So I had to generate the XML from the layout and paste the XML code into the right place in the package. Ugly way to do it, but it saved entire days of work and lots of errors.
(d) Similar to (c), if you need to trim all your columns and you have say 600+ of them, guess what? In the derived column component, you have to type trim(column1) 600+ times! I now do all simple transforms like this in the SQL query to get the data, since that can easily be generated from an Excel sheet.
(e) There are many quirky things, components that turn invisible, sometimes you open the package and all the components are completely re-arranged incoherently.
(f) The FTP feature, possibly one of the most common things you need in ETL, is weak and only supports plain vanilla FTP, which nobody uses. Everyone these days uses SFTP, FTPS, HTTPS, etc. So almost every implementation requires a 3rd-party command-line file transfer app that the package has to call.
(g) Trying to CYA, similar to the ridiculous security in Windows Vista, Microsoft has made it exceedingly difficult to actually promote an SSIS package from one environment to another. It defaults to this stupid thing of "encrypting sensitive information with user key" security, which means it must run under the same account in the environment you are moving it to as in the environment you developed it in, something that is rarely the case. There are better ways to configure it, but it always tries to revert to this completely useless security protection.
(h) Lastly, most of these problems are now in their 3rd version, clearly indicating Microsoft has no plan to fix them.
(i) Debugging is not nearly as easy as other languages.
SSIS still has a great many benefits, but not without some serious pain.
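For point (d), the generated query might look something like the sketch below. It is only an illustration under a few assumptions: the table name `staging_claims`, the column names, and both helper functions are made up, and `LTRIM(RTRIM(...))` is used because that is the usual way to trim a string in T-SQL on SQL Server 2008-era servers.

```python
def quote_ident(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_trim_select(table, columns):
    """Return a SELECT statement that trims every listed column.

    Hypothetical helper: it only builds the query text, it does not run it.
    """
    select_list = ",\n".join(
        f"    LTRIM(RTRIM({quote_ident(col)})) AS {quote_ident(col)}"
        for col in columns
    )
    return f"SELECT\n{select_list}\nFROM {quote_ident(table)};"


# Made-up column names standing in for a 600+ column layout.
columns = [f"column{i}" for i in range(1, 601)]
query = build_trim_select("staging_claims", columns)
print(query[:200])  # show the start of the generated statement
```

The column list could just as well be read from the same spreadsheet that holds the file layout, so the 600 expressions never have to be typed by hand.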
I started using MS Reporting Services many years ago and just love it. I've not tried Pentaho's reporting solution, so I can't comment on it. Nor have I tried either Analysis Services or Pentaho's alternative.
Recently I needed an ETL solution, and being familiar with MSSQL and MSRS, it seemed obvious that I would review and probably choose MS Integration Services. But for me, MSIS was awful, mostly because it was not intuitive. After spending a couple of days trying to learn the tool, I decided to look for an alternative and came across Pentaho Data Integration, formerly known as Kettle. I had it up and running within minutes and immediately created my first transformation. It just works.
Admittedly my needs are fairly simple but performance has been great and the community seems very helpful.
I have used SSIS and Pentaho Kettle, and I would highly recommend using Pentaho Kettle for your ETL tool instead of SSIS.
My reasons:
-The flow of SSIS is task to task. Kettle makes you think about rows of data flowing through the system. Kettle's approach seems much more intuitive to me.
-SSIS is poorly documented. This happens. But there seems to be a lot of nook-and-cranny clicking and setting of variables. Very complex. Pentaho has a community forum which is quite helpful.
-I trust Pentaho to integrate with multiple types of databases, including SQL Server. You can also use JDBC which is nice. Also, I've used it to go between SQL Server and Oracle on one side and Vertica on the other. It has a bulk loader available for it on Vertica. That's quite nice.
-Relatively speaking, I have found it very, very hard to get an SSIS package to run on a server. It just wasn't worth my time.
-I found it quite easy for Pentaho to mail a warning or error message to a person or list of people.
-Pentaho allows tasks to be done in JavaScript for things that need some logic. Simple and easily done with a language most of us have come across.
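To make the first point a bit more concrete, here is a rough sketch of what "rows of data flowing through the system" means. This is not Kettle's API. It is plain Python generators, and the step names and sample records are invented purely for illustration. Each step receives rows one at a time and passes them on to the next step.

```python
def read_rows(records):
    """Source step: emit one row (a dict) at a time."""
    for record in records:
        yield dict(record)


def trim_strings(rows):
    """Transform step: strip whitespace from every string field."""
    for row in rows:
        yield {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def filter_rows(rows, predicate):
    """Filter step: pass along only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


# Invented sample data standing in for a real source table.
source = [
    {"name": " Ann ", "amount": 10},
    {"name": "Bob  ", "amount": 0},
]

# Steps are chained; no step waits for the previous one to finish the whole set.
pipeline = filter_rows(trim_strings(read_rows(source)), lambda r: r["amount"] > 0)
result = list(pipeline)
print(result)
```

A task-to-task design would instead run each step to completion over the whole data set before starting the next one.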
I can't offer any input on the MS BI Stack, but at the most recent Barcamp Orlando the folks from Pentaho were there and spoke about their products, and it was an extremely impressive demo.
The fact that it's an open-source project that you can extend yourself, as well as a paid package with really good service, leaves you with a lot of options. They demonstrated some paid work they did for a client, and they definitely wowed the crowd.
I also had a chance to chat a little bit with a developer working on the data warehousing side of things for Pentaho and he was extremely sharp and was very open to suggestions and had no problems answering any questions.
So as far as a company goes, Pentaho really impressed me with both their work and how friendly and approachable all of their developers were.
A couple of points to add:
Tool questions need to be addressed in terms of larger cultural questions: what kind of shops use open-source tools? In my experience, I've found that although Microsoft shops seem more rigid, when you have trouble with a connection string in a Microsoft shop you can get help. In Pentaho and Linux shops it's more DIY.
BTW, watch out for Pentaho sales guys doing demos - all the things they show are a lot harder to get working than it seems! :)
If you are looking for a robust, low-cost alternative to the big boys, LogiXML has dashboarding and ad hoc reporting on a .NET platform. We've been using them since late 2006, when Pentaho was just starting, but I haven't looked at Pentaho in a while.
I recently tried Pentaho open-source BI. I found it to be extremely clumsy. It was not very intuitive, and development took much longer.
It is quite different from either the Oracle or MS BI solutions. Maybe the enterprise edition is better.