Recommendation system for a book store application
Hey, I'm trying to learn some of the recommendation algorithms used on websites like Amazon.com. I have a simple Java (Spring, Hibernate, Postgres) book store application in which a Book has the attributes title, category, tags and author. For simplicity, the books have no content; a book is identified only by its title, category, author and tags. For each user logging into the application, I should be able to recommend some books. Each user can view a book, add it to the cart and buy it at any time. So in the database I'm storing how many times each user has looked at a book, the books in their cart and the books the user has bought. At the moment there's no rating option, but that can be added too.
So can someone tell me which algorithms I could use to demonstrate book recommendations for each user? I want to keep it really simple. It's not a project to sell, only a way to expand my knowledge of recommendation algorithms. So assume there are only about 30 books in total (5 categories with 6 books in each). It would be really helpful if someone could also tell me which attributes I should use to calculate the similarity between two users, and how to go about it with the recommended algorithms.
Thanks in advance.
SerotoninChase.
You can find all the information, and a library implementing common algorithms (the Taste framework), here.
Collective Intelligence in Action is another book I can suggest, in addition to what the other posters have suggested.
As a particular concrete example, one option is a "nearest K neighbours" algorithm.

To simplify things, imagine you only had ten books, and you were only tracking how many times each user viewed each book. Then, for each user, you might have an array `int timesViewed[10]`, where the value of `timesViewed[i]` is the number of times the user has viewed book number `i`.

You can then compare the user to all of the other users using a correlation function, such as the Pearson correlation. Computing the correlation between the current user `c` and another user `o` gives a value between -1.0 and 1.0, where -1.0 means "this user `c` is the complete opposite of the other user `o`", and 1.0 means "this user `c` has the same viewing pattern as the other user `o`".

If you compute the correlation between `c` and every other user, you get a list of results showing how similar the user's viewing pattern is to that of each other user. You then pick the `K` (e.g. 5, 10, 20) most similar results (hence the name of the algorithm), that is, the `K` users with the correlation scores closest to 1.0.

Now, you can do a weighted average of each of those users' `timesViewed` arrays. For example, we'll say `averageTimesViewed[0]` is the average of the `timesViewed[0]` for each of those `K` users, weighted by their correlation score. Then do the same for each other `averageTimesViewed[i]`.

Now you have an array `averageTimesViewed` which contains, roughly speaking, the average number of times the `K` users with the most similar viewing patterns to `c` have viewed each book. Recommend the book which has the highest `averageTimesViewed` score, since this is the book the other users have shown most interest in.

It's usually worth also excluding books the user has already viewed from being recommended, but it is still important to keep those accounted for when computing similarity/correlation.
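
The steps above could be sketched in Java roughly as follows. This is only a minimal illustration under some assumptions of mine: the class and method names are made up, the view counts live in a plain `int[][]` indexed by user and book, neighbours with a non-positive correlation are ignored (so the weighted average stays well defined), and a user whose counts are all equal gets a correlation of 0.0 because Pearson is undefined there.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, not part of any library.
final class KnnRecommender {

    /** Pearson correlation of two equal-length count vectors; 0.0 if either has no variance. */
    static double pearson(int[] a, int[] b) {
        int n = a.length;
        double meanA = 0, meanB = 0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA, db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0 || varB == 0) {
            return 0.0; // correlation is undefined here; treated as "no information" (assumption)
        }
        return cov / Math.sqrt(varA * varB);
    }

    /**
     * Returns the index of the not-yet-viewed book with the highest
     * correlation-weighted average view count among the k most similar
     * users, or -1 if there is nothing to recommend.
     */
    static int recommend(int[][] timesViewed, int c, int k) {
        int books = timesViewed[c].length;

        // Each entry is {userIndex, correlation with user c}.
        List<double[]> neighbours = new ArrayList<>();
        for (int o = 0; o < timesViewed.length; o++) {
            if (o == c) continue;
            double r = pearson(timesViewed[c], timesViewed[o]);
            if (r > 0) neighbours.add(new double[] {o, r}); // assumption: drop non-positive scores
        }
        // Highest correlation first, then keep at most k neighbours.
        neighbours.sort(Comparator.comparingDouble((double[] p) -> -p[1]));
        if (neighbours.size() > k) neighbours = neighbours.subList(0, k);

        double[] averageTimesViewed = new double[books];
        double weightSum = 0;
        for (double[] p : neighbours) {
            weightSum += p[1];
            int[] views = timesViewed[(int) p[0]];
            for (int i = 0; i < books; i++) averageTimesViewed[i] += p[1] * views[i];
        }

        int best = -1;
        for (int i = 0; i < books; i++) {
            if (weightSum > 0) averageTimesViewed[i] /= weightSum;
            if (timesViewed[c][i] > 0) continue; // skip books c has already viewed
            if (averageTimesViewed[i] > 0
                    && (best == -1 || averageTimesViewed[i] > averageTimesViewed[best])) {
                best = i;
            }
        }
        return best;
    }
}
```

Calling `KnnRecommender.recommend(timesViewed, c, 5)` would then give the index of one book to suggest to user `c`; returning the top few indices instead of just one is a small change to the last loop.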
Also note that this can be trivially extended to take other data into account (such as cart lists etc). Also, you can select all users if you want (i.e. `K` = number of users), but that doesn't always produce meaningful results, and usually picking a reasonably small `K` is sufficient for good results, and is quicker to compute.
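
One possible way to fold in the cart and purchase data is to combine the three signals into a single per-book score and use that array wherever `timesViewed` is used. The sketch below is only an illustration: the class name and the weights 1, 3 and 5 are arbitrary assumptions of mine, not tuned or recommended values.

```java
// Hypothetical helper class, not part of any library.
final class InteractionScores {

    // Assumed weights for illustration only: a purchase counts more than
    // a cart entry, which counts more than a single view.
    static final int VIEW_WEIGHT = 1;
    static final int CART_WEIGHT = 3;
    static final int PURCHASE_WEIGHT = 5;

    /** Combines one user's views, cart contents and purchases into one score per book. */
    static int[] combine(int[] timesViewed, boolean[] inCart, boolean[] purchased) {
        int[] score = new int[timesViewed.length];
        for (int i = 0; i < score.length; i++) {
            score[i] = VIEW_WEIGHT * timesViewed[i]
                    + (inCart[i] ? CART_WEIGHT : 0)
                    + (purchased[i] ? PURCHASE_WEIGHT : 0);
        }
        return score;
    }
}
```

With these assumed weights, a book viewed twice scores 2, a book only sitting in the cart scores 3, and a book viewed once and bought scores 6.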
Read "Programming Collective Intelligence". It'll give you a taste of it and lots more.
You have a gigantic amount of freedom here. Make up a measure of similarity between two users and then make a monotonic function that takes similar users' ratings of books as input and returns scores for each book. The standard solution is to use matrix multiplication.
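
To make the matrix multiplication idea concrete, here is a minimal Java sketch under assumptions of my own: the similarity measure is cosine similarity between user rows (any other measure would fit the description equally well), the "ratings" matrix `r` has one row per user and one column per book (it could hold view counts if there are no ratings yet), and the class and method names are hypothetical. Each user's book scores are the similarity-weighted sum of the other users' rows, which is the product of the similarity matrix and the ratings matrix.

```java
// Hypothetical helper class, not part of any library.
final class MatrixRecommender {

    /**
     * Cosine similarity between every pair of user rows. The diagonal is
     * left at 0 so that a user's own ratings do not count towards their scores.
     */
    static double[][] userSimilarity(double[][] r) {
        int users = r.length;
        double[] norm = new double[users];
        for (int u = 0; u < users; u++) {
            double sum = 0;
            for (double v : r[u]) sum += v * v;
            norm[u] = Math.sqrt(sum);
        }
        double[][] s = new double[users][users];
        for (int u = 0; u < users; u++) {
            for (int v = 0; v < users; v++) {
                if (u == v || norm[u] == 0 || norm[v] == 0) continue; // stays 0
                double dot = 0;
                for (int i = 0; i < r[u].length; i++) dot += r[u][i] * r[v][i];
                s[u][v] = dot / (norm[u] * norm[v]);
            }
        }
        return s;
    }

    /** Plain matrix product a x b. */
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] out = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < b.length; k++) {
                for (int j = 0; j < b[0].length; j++) {
                    out[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return out;
    }

    /** scores[u][i] = sum over other users v of similarity(u, v) * r[v][i]. */
    static double[][] scores(double[][] r) {
        return multiply(userSimilarity(r), r);
    }
}
```

For user `u`, the books with the highest values in `scores(r)[u]` that the user has not interacted with yet would be the candidates to recommend. Because the cosine similarity of non-negative rows is never negative, each score only grows when a similar user's rating grows, which matches the monotonic function described above.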