Apache Mahout, to use or not to use
I am implementing a simple recommendation system for a collection of user-created components.
I was planning on doing this with JPA and a few dedicated EJBs. My entities would have a couple of extra lists containing the most up-to-date recommendations, and an EJB would crawl the data set and update these lists periodically. The model is based on the relationships between components and does not depend on past user behavior. I expect that the data set will remain relatively small, probably no more than half a million items.
I have a pretty good idea of how to do this with JPA and EJB, and I think for my particular use case, this would be very effective.
Should I spend the time to learn and implement Mahout? I do have a bit of experience with Hadoop, but I don't think my data set will be nearly large enough to justify bringing in the elephant.
Also, can anyone point me to a good primer on implementing recommendation systems with Mahout?
Thanks a lot.
Comments (1)
If you are implementing a recommender engine, be aware that this piece of Mahout has two quite separate implementations: one based on Hadoop, and one not based on Hadoop. That's good, because Hadoop is not the sort of thing that would be hooked up directly to anything EJB-based. And you don't have huge scale problems. So, you don't need to worry about Hadoop.
You want to look at the stuff in org.apache.mahout.cf.taste.impl, apart from the .hadoop package; it's all just pure Java, so you could embed it in an EJB. I think you want to look at the Recommender API and then just wrap that in your session bean and expose it however you like. (Do you really want to use EJBs these days? Separate question...)
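To make the "wrap it and expose it" idea concrete, here is a rough sketch of the shape such a wrapper might take. Everything in it is an assumption for illustration: ComponentRecommender, InMemoryComponentRecommender and RecommendationService are hypothetical names, not Mahout classes, and the in-memory scores stand in for whatever a real recommender would compute. It uses only the JDK so that it runs without Mahout or an EJB container; in a real setup the adapter would delegate to a Mahout Recommender instead, and the service would sit behind a stateless session bean.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the small slice of a recommender that the bean needs.
// This is NOT Mahout's interface; a real adapter would delegate to a Mahout Recommender.
interface ComponentRecommender {
    List<Long> relatedTo(long componentId, int howMany);
}

// Toy implementation backed by precomputed relationship scores held in memory.
final class InMemoryComponentRecommender implements ComponentRecommender {
    // componentId -> (related componentId -> relationship score)
    private final Map<Long, Map<Long, Double>> scores;

    InMemoryComponentRecommender(Map<Long, Map<Long, Double>> scores) {
        this.scores = scores;
    }

    @Override
    public List<Long> relatedTo(long componentId, int howMany) {
        Map<Long, Double> row = scores.getOrDefault(componentId, Collections.emptyMap());
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(row.entrySet());
        // Highest score first.
        entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Double> entry : entries) {
            if (result.size() >= howMany) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }
}

// Plain class that a stateless session bean could delegate to.
// The @Stateless annotation is left out so the sketch compiles without a container.
final class RecommendationService {
    private final ComponentRecommender recommender;

    RecommendationService(ComponentRecommender recommender) {
        this.recommender = recommender;
    }

    List<Long> topRelated(long componentId, int howMany) {
        if (howMany <= 0) {
            return Collections.emptyList();
        }
        return recommender.relatedTo(componentId, howMany);
    }
}
```

Keeping the bean-facing class dependent only on a narrow interface means the recommender behind it can be swapped (a hand-rolled JPA-based job today, a Mahout-backed adapter later) without touching the callers.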
In fact, the previous release, 0.4, still had an EJB binding example as a stateless session bean. You could dig out and reuse that wrapper.
The best web resource for this part of the code is:
https://cwiki.apache.org/MAHOUT/recommender-documentation.html
Our book, Mahout in Action, is obviously not free, but it is certainly the best and only reference for the project. I wrote the code in question here, and the part of the book covering it, so it comes pretty directly from the source.