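Is it worth buying Mahout in Action to get up to speed with Mahout, or are there better sources?
I'm currently a casual user of Apache Mahout, and I'm considering buying the book Mahout in Action. Unfortunately, I'm having a hard time judging its value: it's a Manning Early Access Program book (so it's currently available only as a beta e-book), which means I can't look through it in a bookstore myself.
Can anyone recommend this as a good (or not-so-good) guide for getting up to speed with Mahout, and/or suggest other resources that complement the Mahout website?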
Answers (6)
Speaking as a Mahout committer and co-author of the book, I think it is worth it. ;-)
But seriously, what are you working on? Maybe we can point you to some resources.
Some aspects of Mahout are just plain hard to figure out on your own. We work hard at answering questions on the mailing list, but it can really help to have sample code and a roadmap. Without some of that, it is hard to even ask a good question.
Also a co-author here. Being "from the horse's mouth," it's probably by far the most complete write-up out there on Mahout itself. There are some good blog posts, and certainly plenty of good books on machine learning more generally (I like Collective Intelligence in Action as a broad, light introduction). A few people on the Mahout user mailing list have said they like the book, FWIW, as have people on the book forums (http://www.manning-sandbox.com/forum.jspa?forumID=623). I think you can return the e-book if it's not quite what you wanted. It definitely has 6 chapters on clustering.
There are many parts of the book that are out of date, a version or two behind the current release. In addition, there are several mistakes in the text, particularly in the examples, which can make it tricky to replicate the results being discussed.
Additionally, you should be aware that the most mature part of Mahout, the recommender system (Taste), isn't distributed. I'm not really sure why it is packaged with the rest of Mahout. This is more a complaint about the software package than about the book itself.
Currently the best out there, and probably about as mature as the product. Some aspects are better than others: the insight into the underlying implementation is good, but practical guidance for beginners on getting up and running on Linux, Mac OS X, etc. is lacking. Its strategy for keeping a recommender up to date is iffy, and the production examples are pretty thin. It's good as a starting point, but you will need a lot more. The authors make their best attempt to help, but it is a pretty new product. All in all, yes, buy it.
I got the book a few weeks ago. Highly recommended. The authors are very active on the mailing list, too, and there is a lot of cool energy in this project.
You might also consider reading Paco Nathan's Enterprise Data Workflows with Cascading. With Cascading, you can export PMML models from R or SAS and run them on your cluster. That is not to say anything bad about Mahout in Action; the authors did a great job and clearly put a lot of time and effort into making it instructive and interesting. This is more a suggestion to look beyond Mahout, which isn't currently getting the kind of traction it would if it were more user-friendly.
As it stands, the Mahout user experience is kind of choppy, and it doesn't really give you a clear idea of how to develop and update intelligent systems over their life cycle, IMO. Mahout isn't really popular with academics either; they are more likely to use MATLAB or R. As for the Mahout docs, the random forest implementation barely works and the docs contain erroneous examples, etc. That's frustrating, and the degree of parallelism and scalability in Mahout's routines varies by algorithm. As it stands, I don't currently see Mahout going anywhere solid, again IMO. I hope I'm wrong!
http://shop.oreilly.com/product/0636920028536.do