Basic pseudocode for using SVD with MovieLens/Netflix-type data sets

Posted on 2024-10-22 04:36:33

I'm struggling to figure out how to begin using SVD with a MovieLens/Netflix-type data set for rating predictions. I'd very much appreciate any simple samples in Python/Java, or basic pseudocode of the process involved. There are a number of papers/posts that summarise the overall concept, but I'm not sure how to begin implementing it, even with the suggested libraries.

As far as I understand, I need to convert my initial data set as follows:

Initial data set:

    user    movie   rating
    1       43      3
    1       57      2
    2       219     4

Need to pivot to be (one row per movie, one column per user):

    movie   user 1  user 2
    43      3       0
    57      2       0
    219     0       4
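
For illustration, the pivot described above could be sketched in plain Python as follows. The names `ratings` and `pivot` are hypothetical, and missing ratings are filled with 0 only because the table above does so; this is a sketch under those assumptions, not a recommended representation.

```python
# Illustrative (user, movie, rating) triples taken from the example data set.
ratings = [(1, 43, 3), (1, 57, 2), (2, 219, 4)]


def pivot(triples, fill=0):
    """Return (movies, users, matrix): one row per movie, one column per user.

    Cells with no rating are set to `fill` (an assumption mirroring the table).
    """
    users = sorted({u for u, _, _ in triples})
    movies = sorted({m for _, m, _ in triples})
    row = {m: i for i, m in enumerate(movies)}   # movie id -> row index
    col = {u: j for j, u in enumerate(users)}    # user id -> column index
    matrix = [[fill] * len(users) for _ in movies]
    for u, m, r in triples:
        matrix[row[m]][col[u]] = r
    return movies, users, matrix


movies, users, matrix = pivot(ratings)
```

On a data set the size of MovieLens or Netflix, a dense matrix like this would consist almost entirely of fill values, so a sparse structure is probably needed in practice.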

At this point, do I simply need to feed this matrix into an SVD algorithm provided by one of the available libraries and then (somehow) extract the results, or is there more work required on my part?
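
To make the "feed in, then extract" step concrete, here is a minimal sketch using NumPy's `numpy.linalg.svd` (NumPy is not one of the libraries listed in this post; it is used here only as an assumption for illustration). It decomposes the small pivoted matrix and rebuilds a low-rank approximation whose entries can be read as predicted scores. The helper `reconstruct` is hypothetical.

```python
import numpy as np

# Pivoted matrix from the example: rows are movies 43, 57, 219; columns are users 1, 2.
# Zeros stand in for missing ratings, so the SVD treats them as actual ratings of 0.
R = np.array([[3.0, 0.0],
              [2.0, 0.0],
              [0.0, 4.0]])

# Thin SVD: R = U @ diag(s) @ Vt, with singular values in s sorted in descending order.
U, s, Vt = np.linalg.svd(R, full_matrices=False)


def reconstruct(k):
    """Rank-k approximation of R built from the top k singular values."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


full = reconstruct(2)    # keeps every singular value, so it reproduces R up to rounding
approx = reconstruct(1)  # rank-1 approximation; its entries act as predicted scores
```

Because the zeros are treated as real ratings, a plain SVD like this pulls predictions for unrated movies toward 0. As I understand it, that is why approaches such as the one in the sifter.org post fit the factors to the observed ratings only.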

Some information I've read:

http://www.netflixprize.com/community/viewtopic.php?id=1043
http://sifter.org/~simon/journal/20061211.html
http://www.slideshare.net/NYCPredictiveAnalytics/building-a-recommendation-engine-an-example-of-a-product-recommendation-engine
http://www.slideshare.net/bmabey/svd-and-the-netflix-dataset-presentation
.. and a number of other papers

Some libraries:

LingPipe (Java)
Jama (Java)
Pyrsvd (Python)

Any tips at all would be appreciated, especially on a basic data set.
Thanks very much,
Oli
