Is KNN worthwhile if most ratings are 5? / Suggestions for passive filtering

Posted 2024-08-10 16:10:03

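I've been thinking about building a "people who liked x also liked y" type of recommendation system and was considering Vogoo, but after looking through its code, much of it appears to be nearest-neighbor logic based on ratings.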

Over the past few weeks I've seen several articles pointing out that most people either don't rate at all or rate 5: http://youtube-global.blogspot.com/2009/09/five-stars-dominate-ratings.html

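I don't currently have a rating system implemented, and if the ratings it produced would show no variation, I don't really see the point of implementing one.

Does this mean KNN isn't really worthwhile?

Does anyone have suggestions for building a system that produces similar recommendations from past viewing history (passive filtering)?

The data I'm working with is event-based. So if you've viewed men's doubles tennis, Blue Jays baseball, college women's basketball, and so on, I'd like to recommend other events currently happening in your area that were also viewed by other people across the whole system who viewed similar events.

I mainly use PHP, but I've started learning Python (and could probably learn Java if that would help).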


风居住的梦幻卍 2024-08-17 16:10:03

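Well, the short answer to your first question is no: if your data has no variation (as with YouTube star ratings), it is hard to make recommendations from it.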

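I would suggest trying to broaden the data you have. In the YouTube example, don't look only at star ratings; also consider the percentage of the video that was watched. Lots of pausing, seeking, and rewinding may mean the user likes the video and wants to rewatch certain parts, so that should count as a boost.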

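In the music world, at least, the standard approach to recommendation is to come up with a distance metric that gives the distance between any two pieces of music. Then, once you know what kind of music a user likes, you can pick music that matches their taste by choosing songs that are "close" according to that metric. The same information is often expressed as a similarity matrix, where two items with a large distance have low similarity.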
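As a rough illustration of the distance-metric idea, here is a minimal Python sketch. It assumes Jaccard distance over the sets of users who consumed each item; that metric, the function names, and the sample data are illustrative choices, not something the answer prescribes.

```python
def jaccard_distance(a, b):
    """Distance in [0, 1] between two sets of user ids (0 = identical audiences)."""
    union = a | b
    if not union:
        return 0.0  # two empty audiences are treated as identical
    return 1.0 - len(a & b) / len(union)


def nearest_items(audiences, liked, k=2):
    """Return the k items whose audience is closest to that of `liked`.

    `audiences` maps an item name to the set of users who consumed it
    (a hypothetical input shape chosen for this sketch).
    """
    dists = {
        item: jaccard_distance(audiences[liked], users)
        for item, users in audiences.items()
        if item != liked
    }
    # Sort by distance, breaking ties by name so the result is deterministic.
    return sorted(dists, key=lambda item: (dists[item], item))[:k]


audiences = {
    "song_a": {"u1", "u2", "u3"},
    "song_b": {"u1", "u2"},
    "song_c": {"u3", "u4"},
    "song_d": {"u5"},
}
print(nearest_items(audiences, "song_a"))
```

Similarity is simply `1 - distance` here, so a similarity matrix and a distance matrix carry the same information.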

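So the question comes down to how to generate these similarities. One way is to count how many people who watched event A also watched event B. If you do this for every pair of events, you can make recommendations from the corpus you have analyzed. Unfortunately, this doesn't extend well to recommending events for which you don't yet know the viewership (live events as opposed to recorded ones).

But it's a start, at least.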
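The pairwise counting described above might look something like the following Python sketch. The input shape (a dict from user to the events they viewed), the function names, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def co_view_counts(history):
    """Count, for every pair of events, how many users viewed both."""
    counts = defaultdict(int)
    for events in history.values():
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(set(events)), 2):
            counts[(a, b)] += 1
    return counts


def recommend(history, user, top_n=3):
    """Rank events the user has not seen by co-views with events they have seen."""
    counts = co_view_counts(history)
    seen = set(history[user])
    scores = defaultdict(int)
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
        elif b in seen and a not in seen:
            scores[a] += n
    # Highest score first; ties broken by event name.
    return sorted(scores, key=lambda e: (-scores[e], e))[:top_n]


history = {
    "ann": ["tennis", "baseball"],
    "bob": ["tennis", "baseball", "basketball"],
    "cat": ["tennis", "basketball"],
    "dan": ["tennis"],
}
print(recommend(history, "ann"))
```

In practice the pair counts would be computed once and stored rather than rebuilt on every call. As the answer notes, an event nobody has viewed yet never appears in the counts, so it cannot be recommended this way.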


风吹雨成花 2024-08-17 16:10:03

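After Andrew's great response, I decided to explain what I did, in the hope that it helps others (though it may be specific to my implementation).

Keep in mind that I have data on lots of events and where those events take place.

The script I used to build the recommendations is this one:
http://www.codediesel.com/php/item-based-collaborative-filtering-php/

However, with no ratings in the system yet, and given the "questionable" value of user-supplied ratings, I created ratings from the similarities already present in the data set.

I basically structured it like this: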

1) User one goes to men's tennis matches.
2) Get all other users who go to men's tennis matches.
3) For each of those users, find which other sports they go to.
4) For each of the other sports, count how many of those users attended it. I used that count as the first user's score for that sport.
5) Then, for each user who went to tennis, build a "similarity to the first user" based on how many other sports they went to and the first user's scores for those sports.
6) This gives a distance score for each user, which I applied as the score on each of the sports that secondary user went to.
7) Put all of this into an array and pass it to the recommendation script linked above.
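One possible reading of steps 1 to 6, sketched in Python rather than the PHP actually used. The post does not give the formula for step 5, so summing the first user's sport scores is an assumption made here, as are the function name, the input shape, and the sample data.

```python
from collections import Counter


def derive_ratings(attendance, first_user, seed_sport):
    """Build a user -> {sport: score} table from attendance alone.

    `attendance` maps each user to the set of sports they attend
    (a hypothetical input shape for this sketch).
    """
    # Step 2: everyone else who attends the seed sport.
    peers = [
        user for user, sports in attendance.items()
        if user != first_user and seed_sport in sports
    ]

    # Steps 3-4: count how many peers attend each other sport;
    # the count becomes the first user's score for that sport.
    sport_counts = Counter()
    for user in peers:
        sport_counts.update(attendance[user] - {seed_sport})
    ratings = {first_user: dict(sport_counts)}

    # Steps 5-6 (assumed formula): a peer's score is the sum of the first
    # user's scores over the peer's other sports, applied to each of them.
    for user in peers:
        others = attendance[user] - {seed_sport}
        score = sum(sport_counts[sport] for sport in others)
        ratings[user] = {sport: score for sport in others}
    return ratings


attendance = {
    "u1": {"tennis"},
    "u2": {"tennis", "baseball"},
    "u3": {"tennis", "baseball", "basketball"},
    "u4": {"golf"},
}
print(derive_ratings(attendance, "u1", "tennis"))
```

The resulting table has the same user-to-item-to-score shape that a ratings-based collaborative filter expects, which corresponds to step 7. Each user's attendance is visited only a couple of times per seed sport, so the work grows with the number of attendance records instead of with repeated per-user lookups.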

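This actually worked much better than I had expected given the sample size I was working with.

However, it is painfully slow to run.
I'm not sure how I'll progress from here.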

