Basic pseudocode for using SVD with MovieLens/Netflix type data sets
I'm struggling to figure out how exactly to begin using SVD with a MovieLens/Netflix type data set for rating predictions. I'd very much appreciate any simple samples in Python/Java, or basic pseudocode of the process involved. There are a number of papers/posts that summarise the overall concept, but I'm not sure how to begin implementing it, even using some of the suggested libraries.
As far as I understand, I need to convert my initial data set as follows:
Initial data set:
user movie rating
1 43 3
1 57 2
2 219 4
Need to pivot to be:
          user
movie     1    2
43        3    0
57        2    0
219       0    4
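The pivot described above can be sketched in plain Python as follows. This is only an illustration using the three sample rows; the `pivot` helper and variable names are made up for the example, and it fills missing ratings with 0 exactly as in the table above.

```python
# Sample (user, movie, rating) triples copied from the data set above.
ratings = [(1, 43, 3), (1, 57, 2), (2, 219, 4)]


def pivot(triples):
    """Build a dense movie-by-user matrix, using 0 where no rating exists."""
    users = sorted({user for user, _, _ in triples})
    movies = sorted({movie for _, movie, _ in triples})
    col = {user: j for j, user in enumerate(users)}    # user id -> column index
    row = {movie: i for i, movie in enumerate(movies)}  # movie id -> row index
    matrix = [[0] * len(users) for _ in movies]
    for user, movie, rating in triples:
        matrix[row[movie]][col[user]] = rating
    return users, movies, matrix


users, movies, matrix = pivot(ratings)
print("users:", users)
for movie, values in zip(movies, matrix):
    print(movie, values)
```

Note that a 0 here cannot be told apart from a real rating, so anything that consumes this matrix will treat "not rated" as "rated 0".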
At this point, do I simply need to feed this matrix into an SVD algorithm as provided by the available libraries, and then (somehow) extract the results, or is there more work required on my part?
Some information I've read:
http://www.netflixprize.com/community/viewtopic.php?id=1043
http://sifter.org/~simon/journal/20061211.html
http://www.slideshare.net/NYCPredictiveAnalytics/building-a-recommendation-engine-an-example-of-a-product-recommendation-engine
http://www.slideshare.net/bmabey/svd-and-the-netflix-dataset-presentation
.. and a number of other papers
Some libraries:
LingPipe (Java)
Jama (Java)
Pyrsvd (Python)
Any tips at all would be appreciated, especially on a basic data set.
Thanks very much,
Oli
Comments (2)
Data set: http://www.grouplens.org/node/73
SVD: why not just do it in SAGE if you don't understand how to do SVD? Wolfram Alpha or http://www.bluebit.gr/matrix-calculator/ will decompose the matrix for you, or it's on Wikipedia.
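One caveat for rating prediction: a textbook SVD of the pivoted matrix treats every 0 as a real rating. The sifter.org post linked in the question, as I read it, instead fits user and movie factor vectors only to the ratings that actually exist, using gradient descent. Below is a minimal sketch of that idea, not a library SVD. The function names, the hyperparameters (`n_factors`, `n_epochs`, `lr`, `reg`) and the choice to update all factors at once are assumptions made for this example.

```python
import random

# Observed (user, movie, rating) triples from the question.
ratings = [(1, 43, 3), (1, 57, 2), (2, 219, 4)]


def train(triples, n_factors=2, n_epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn one factor vector per user and per movie from observed ratings only."""
    rng = random.Random(seed)
    user_vecs, movie_vecs = {}, {}
    for user, movie, _ in triples:
        if user not in user_vecs:
            user_vecs[user] = [rng.uniform(0.1, 0.5) for _ in range(n_factors)]
        if movie not in movie_vecs:
            movie_vecs[movie] = [rng.uniform(0.1, 0.5) for _ in range(n_factors)]

    for _ in range(n_epochs):
        for user, movie, rating in triples:
            p, q = user_vecs[user], movie_vecs[movie]
            # Prediction error for this single observed rating.
            err = rating - sum(a * b for a, b in zip(p, q))
            for k in range(n_factors):
                p_k, q_k = p[k], q[k]  # keep old values so both updates use them
                p[k] += lr * (err * q_k - reg * p_k)
                q[k] += lr * (err * p_k - reg * q_k)
    return user_vecs, movie_vecs


def predict(user_vecs, movie_vecs, user, movie):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(a * b for a, b in zip(user_vecs[user], movie_vecs[movie]))


user_vecs, movie_vecs = train(ratings)
for user, movie, rating in ratings:
    print(user, movie, rating, round(predict(user_vecs, movie_vecs, user, movie), 2))
# A pair that was never rated: user 2, movie 43.
print("unseen:", predict(user_vecs, movie_vecs, 2, 43))
```

With only three ratings the prediction for the unseen pair is not meaningful; the point is just the shape of the loop. A real run on MovieLens would also want a held-out set, and probably a global mean or per-user and per-movie offsets, none of which are shown here.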
See SVDRecommender in Apache Mahout. Your question about input format depends entirely on what library or code you're using. There's not one standard. At some level, yes, the code will construct some kind of matrix internally. For Mahout, the input for all recommenders, when supplied as a file, is a CSV file with rows like
userID,itemID,rating
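For illustration, rows in that userID,itemID,rating layout can be read back into triples with nothing but the Python standard library. This sketch does not call Mahout at all; `read_ratings` and the in-memory sample text are made up for the example, and the sample values are the three ratings from the question.

```python
import csv
import io

# Hypothetical file contents in the userID,itemID,rating layout.
sample = "1,43,3\n1,57,2\n2,219,4\n"


def read_ratings(handle):
    """Parse userID,itemID,rating rows into (int, int, float) triples."""
    triples = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        user_id, item_id, rating = (field.strip() for field in row[:3])
        triples.append((int(user_id), int(item_id), float(rating)))
    return triples


triples = read_ratings(io.StringIO(sample))
print(triples)
```

To read a real file, pass an open file object instead of the `io.StringIO` wrapper, for example `open(path, newline="")` where `path` is wherever your ratings file lives.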