How to create recommendations with an incremental SVD recommender system
I am testing a recommendation system built according to Simon Funk's algorithm
(written by Timely Dev: http://www.timelydevelopment.com/demos/NetflixPrize.aspx).
The problem is that all incremental SVD algorithms try to predict the rating for a given user_id and movie_id. A real system, however, should produce a list of new items for the active user.
I see that some people use kNN after incremental SVD, but unless I'm missing something, running kNN after building the model with incremental SVD would lose all of the performance gain.
Does anyone have experience with the incremental SVD/Simon Funk method, and can you tell me how to produce a list of new recommended items?
Comments (4)
The way to produce recommended movies:
For the theory: pretend there are only two dimensions (comedy and drama). If I love comedies but hate dramas, my feature vector is [1.0, 0.0]. If you compare me against the following movies:
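A minimal sketch of that comparison, assuming the predicted affinity is the dot product of the user and movie feature vectors, as in Funk-style factorization. The three movie names and their vectors are hypothetical, made up for illustration only:

```python
import numpy as np

# User from the example above: loves comedy, hates drama.
user = np.array([1.0, 0.0])

# Hypothetical movie feature vectors in (comedy, drama) order.
movies = {
    "pure_comedy": np.array([1.0, 0.0]),
    "dramedy": np.array([0.5, 0.5]),
    "pure_drama": np.array([0.0, 1.0]),
}

# Score each movie by the dot product of user and movie features.
scores = {title: float(user @ vec) for title, vec in movies.items()}

# Rank titles from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The movie whose features line up best with the user's comes first, and the one that is all drama comes last.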
Here is some simple Python code based on the Yelp Netflix code. If you install Numba, it will run at C speeds.
data_loader.py
funk.py
I think this is a big question, as there are many recommender approaches that could be called "incremental SVD". To answer your specific question: kNN is run on the projected item space, not the original space, so it should be quite fast.
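As a rough sketch of what kNN on the projected space could look like: each item is represented by its k learned features, so comparing two items costs O(k) instead of a pass over every user's ratings. The function name `nearest_items`, the choice of cosine similarity, and the toy factor values are assumptions for illustration, not part of any particular library:

```python
import numpy as np

def nearest_items(item_factors, item_index, n_neighbors=5):
    """Return indices of the items most similar to item_index.

    item_factors has shape (m, k): one row of k learned features per item.
    Similarity is cosine similarity in that k-dimensional space.
    """
    # Normalise each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(item_factors, axis=1, keepdims=True)
    unit = item_factors / np.maximum(norms, 1e-12)
    sims = unit @ unit[item_index]
    sims[item_index] = -np.inf  # exclude the query item itself
    order = np.argsort(-sims)   # most similar first
    return order[:n_neighbors]

# Toy data: 4 items, k = 2 features (values invented for illustration).
item_factors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])
neighbors = nearest_items(item_factors, item_index=0, n_neighbors=2)
print(neighbors)
```

With small k, a brute-force scan like this is often fast enough; an approximate nearest-neighbor index would be a separate design choice for very large catalogs.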
Assume you have n users and m items. After incremental SVD you have k trained features. To get new items for a given user, multiply the 1×k user feature vector by the k×m item feature matrix. You end up with m predicted ratings for that user, one per item. Then just sort them, remove the items the user has already seen, and show some number of new ones.
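The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `recommend`, its parameters, and the toy numbers are hypothetical, and a real Funk-style model may also add baseline or bias terms to each prediction:

```python
import numpy as np

def recommend(user_vector, item_factors, seen, n=10):
    """Return indices of the n highest-scoring items the user has not seen.

    user_vector: shape (k,), the user's trained features.
    item_factors: shape (k, m), one column of features per item.
    seen: iterable of item indices the user has already rated.
    """
    # 1 x k times k x m gives m predicted ratings, one per item.
    scores = (user_vector @ item_factors).astype(float)
    seen_idx = np.fromiter(set(seen), dtype=np.intp)
    scores[seen_idx] = -np.inf          # push already-seen items to the end
    order = np.argsort(-scores)         # highest predicted rating first
    n_unseen = scores.size - seen_idx.size
    return order[:min(n, n_unseen)]

# Toy example with k = 2 features and m = 5 items (values invented).
user_vector = np.array([1.0, 0.0])
item_factors = np.array([
    [0.9, 0.2, 0.7, 0.1, 0.5],
    [0.1, 0.8, 0.3, 0.9, 0.5],
])
top = recommend(user_vector, item_factors, seen={0}, n=2)
print(top)
```

Scoring all m items costs O(k·m) per user, which is why no separate kNN step is needed just to produce a top-N list.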