Apache Mahout + Pearson correlation ignores users who give every item the same preference
I'm using Mahout with the Pearson correlation algorithm to find similar users based on their preferences for several items. The problem I'm running into is that Mahout and/or Pearson ignores users who select the same preference value for every item. Does anyone know whether Mahout can be configured to NOT ignore those users?
It is not a question of configuration. The Pearson correlation is undefined in this case, so there can be no similarity computed between them using this metric.
Essentially, Pearson is the ratio of the two preference series' covariance to the product of their standard deviations. But when one or both series are constant (the user gave every item the same value), that series' standard deviation is 0, and so is the covariance, so the correlation is 0/0.
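To make the 0/0 case concrete, here is a minimal plain-Java sketch of the formula described above. It is not Mahout's implementation; the class name PearsonSketch and the method pearson are hypothetical names chosen for illustration, and the sample preference values are made up.

```java
public class PearsonSketch {

    // Pearson correlation of two equal-length preference series.
    // Uses centered sums; the 1/n factors of covariance and the
    // standard deviations cancel in the ratio, so they are omitted.
    public static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("series must be non-empty and of equal length");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sumXY = 0.0; // proportional to the covariance
        double sumXX = 0.0; // proportional to the variance of x
        double sumYY = 0.0; // proportional to the variance of y
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        double denominator = Math.sqrt(sumXX) * Math.sqrt(sumYY);
        // If either series is constant, numerator and denominator are both 0.0,
        // and 0.0 / 0.0 evaluates to NaN for Java doubles.
        return sumXY / denominator;
    }

    public static void main(String[] args) {
        double[] varied = {1, 3, 5, 2};   // hypothetical user with mixed ratings
        double[] constant = {3, 3, 3, 3}; // hypothetical user who rated every item 3
        System.out.println(pearson(varied, constant)); // prints NaN
    }
}
```

The NaN result is the arithmetic counterpart of "undefined": there is no meaningful value to return, which is why no similarity can be reported for such a user under this metric.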
(This and a few other Pearson gotchas are covered in Chapter 4 of Mahout in Action; I'm the author of that part of the book and of the code.)