Mahout rescorer implementation
I'd like to weight all of my PearsonItemSimilarity values between two items by the number of co-ratings they share, divided by 50.
In other words, I want to update the generic Pearson similarity between two items (items a and b, for instance) as follows:
similarity_new_ab = similarity_ab * numCoRatings_ab / 50
How does one get the number of co-ratings between two games using the existing Mahout framework?
Can someone please link me to (or illustrate) an example implementation of a rescorer?
My reasoning for doing this is as follows:
I postulate that most of the Pearson similarities calculated are based on a small number of co-ratings (1 or 2 in most cases). This would lead to the games having a Pearson correlation of 1 with each other, which would probably not be the case if more co-ratings existed.
To account for this, I'd like to change these "naive" Pearson similarities to a similarity that is also based on the number of co-ratings.
I thought this is what the rescorer was built for, but I guess I was wrong.
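To make the concern concrete, here is a minimal plain-Java sketch (it does not use Mahout) that computes a Pearson correlation over just two co-ratings and then applies the weighting formula from the question. The class and method names (CoRatingPearsonDemo, pearson, weightByCoRatings) and the sample ratings are made up for illustration. Note that with a single co-rating the correlation is undefined, so this sketch returns NaN in that case.

```java
// Illustrative only: not Mahout code. All names here are hypothetical.
class CoRatingPearsonDemo {

    /**
     * Pearson correlation over paired ratings (x[i] and y[i] come from the same user).
     * Returns NaN when the correlation is undefined: fewer than two pairs,
     * or no variance in one of the two rating lists.
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("rating arrays must have the same length");
        }
        int n = x.length;
        if (n < 2) {
            return Double.NaN;
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0.0 || syy == 0.0) {
            return Double.NaN;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** The weighting from the question: similarity * numCoRatings / scale. */
    static double weightByCoRatings(double similarity, int numCoRatings, int scale) {
        return similarity * numCoRatings / scale;
    }

    public static void main(String[] args) {
        // Two users rated both games: user 1 gave (3, 1), user 2 gave (5, 2).
        double[] gameA = {3.0, 5.0};
        double[] gameB = {1.0, 2.0};

        double raw = pearson(gameA, gameB);
        double weighted = weightByCoRatings(raw, gameA.length, 50);

        // prints: raw=1.0 weighted=0.04
        System.out.println("raw=" + raw + " weighted=" + weighted);
    }
}
```

Any two distinct points lie on a straight line, so two co-ratings always give a correlation of exactly 1 or -1 whenever it is defined. The weighting pulls that 1.0 down to 0.04. As written, the formula also produces values above 1 once a pair has more than 50 co-ratings.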
Comments (1)
You want the method getNumUsersWithPreferenceFor() on DataModel; pass it the two item IDs. I don't think this is the best thing to do for this similarity metric. If you are using co-occurrence, look at LogLikelihoodSimilarity instead. This has nothing to do with Rescorer, though; what is your question there?
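One way to combine the two pieces is a wrapper that asks an existing similarity for its raw value and scales it by the co-rating count. The sketch below is only an illustration and compiles without Mahout: PairSimilarity and CoRatingCounter are hypothetical stand-ins, the first for the item-to-item method of Mahout's ItemSimilarity, the second for the two-item getNumUsersWithPreferenceFor() call on DataModel mentioned above. In a real project the wrapper would presumably implement Mahout's own interface and delegate to PearsonCorrelationSimilarity. Capping the weight at 1 is my own assumption, not part of the question; it keeps the result within the -1 to 1 range that similarity values are normally expected to stay in.

```java
// Illustrative only: the two interfaces are local stand-ins, not Mahout types.

/** Hypothetical stand-in for an item-to-item similarity such as Pearson. */
interface PairSimilarity {
    double itemSimilarity(long itemA, long itemB);
}

/** Hypothetical stand-in for a data model lookup of how many users rated both items. */
interface CoRatingCounter {
    int numUsersWithPreferenceFor(long itemA, long itemB);
}

/** Scales a delegate similarity by min(1, coRatings / scale). */
final class CoRatingWeightedSimilarity implements PairSimilarity {
    private final PairSimilarity delegate;
    private final CoRatingCounter counter;
    private final int scale;

    CoRatingWeightedSimilarity(PairSimilarity delegate, CoRatingCounter counter, int scale) {
        if (scale <= 0) {
            throw new IllegalArgumentException("scale must be positive");
        }
        this.delegate = delegate;
        this.counter = counter;
        this.scale = scale;
    }

    @Override
    public double itemSimilarity(long itemA, long itemB) {
        double raw = delegate.itemSimilarity(itemA, itemB);
        if (Double.isNaN(raw)) {
            return raw; // undefined similarity stays undefined
        }
        int coRatings = counter.numUsersWithPreferenceFor(itemA, itemB);
        // Assumption: cap the weight at 1 so the result never exceeds the raw value.
        double weight = Math.min(1.0, (double) coRatings / scale);
        return raw * weight;
    }
}

class CoRatingWeightedSimilarityDemo {
    public static void main(String[] args) {
        // Fake inputs: every pair looks perfectly correlated,
        // but items 1 and 2 share only 2 co-ratings while other pairs share 80.
        PairSimilarity alwaysOne = (a, b) -> 1.0;
        CoRatingCounter counts = (a, b) -> (a == 1L && b == 2L) ? 2 : 80;

        PairSimilarity weighted = new CoRatingWeightedSimilarity(alwaysOne, counts, 50);

        System.out.println(weighted.itemSimilarity(1L, 2L)); // prints 0.04
        System.out.println(weighted.itemSimilarity(1L, 3L)); // prints 1.0
    }
}
```

Doing the adjustment inside the similarity, rather than in a rescorer, means the weighted value is what gets used everywhere the similarity is consulted. Whether this linear weighting is a good metric is a separate question, as the comment above points out.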