How do I write a recommendation system?

Posted 2024-12-17 14:55:45

Possible duplicate:
Where can I learn about recommendation systems?

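I've always been curious about how to build a recommendation system, where a website suggests articles and users to me based on what I "like", what I follow, and what I vote up or down.

It could also recommend items while I'm viewing a particular item: "related articles", "people who liked this article also liked..."

I need some articles and illustrations that teach me how to implement such a system. Thanks a lot.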

Update:

I came across the keyword "Slope One".

柠檬 2024-12-24 14:55:45

The Wikipedia article, Recommender system, is a good place to start. Also, this Recommendation engine blog post has some good information and illustrations.

The simplest method is the "people who like this article also like..." approach. If you keep track of each user's article ratings, and also keep track of who likes which articles, then you have the basis for a recommendation system.

For example, say that you're viewing Article A. The system can look up in its index every user who liked Article A. From that list, it can then collect all the other articles liked by those users. In all likelihood, there will be significant overlap (that is, some articles were liked by multiple people). Your algorithm keeps track of how many likes each article got, and then shows the top N that got the most votes.
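Here's a rough sketch of that counting step in Python. The data layout (a list of `(user, article)` pairs) and the names `build_indexes` and `also_liked` are my own assumptions for illustration; a real site would most likely read these from a database instead.

```python
from collections import Counter, defaultdict


def build_indexes(likes):
    """Build both lookup tables from an iterable of (user, article) pairs."""
    users_by_article = defaultdict(set)
    articles_by_user = defaultdict(set)
    for user, article in likes:
        users_by_article[article].add(user)
        articles_by_user[user].add(article)
    return users_by_article, articles_by_user


def also_liked(article, users_by_article, articles_by_user, top_n=5):
    """Count one vote per co-like and return the top_n (article, votes) pairs."""
    votes = Counter()
    for user in users_by_article.get(article, ()):
        for other in articles_by_user[user]:
            if other != article:  # don't recommend the article being viewed
                votes[other] += 1
    return votes.most_common(top_n)


# Hypothetical sample data: each pair means "this user liked this article".
likes = [
    ("ann", "A"), ("ann", "B"), ("ann", "C"),
    ("bob", "A"), ("bob", "B"),
    ("cat", "A"), ("cat", "D"),
    ("dan", "B"), ("dan", "D"),
]
users_by_article, articles_by_user = build_indexes(likes)
print(also_liked("A", users_by_article, articles_by_user, top_n=1))  # [('B', 2)]
```

Keeping two indexes (article to users, user to articles) makes both lookups cheap. Note that dan's likes are ignored for Article A, because dan never liked A.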

That simple system is surprisingly effective in many cases, but not perfect. You'll find that exceptionally popular articles dominate, even if they're not related to the article you're viewing. There are ways to prevent the hugely popular articles from dominating. One way is to use a floating point number for an article's score. Rather than adding 1 to the score for each "like", you add 1 / sqrt(users_number_of_likes), where users_number_of_likes is the total number of articles that user has liked. So a user who likes, say, 100 articles would give only 1/10 point to any individual article, but a user who likes only four articles would give 1/2 a point to each. Although this doesn't sound "fair," it does tend to attenuate the effect of hugely popular, but unrelated, items.
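A minimal sketch of that weighting, again in Python. The function name `also_liked_weighted` and the sample data are made up for illustration; the only part taken from the description above is the 1 / sqrt(number of likes) weight per user.

```python
import math
from collections import defaultdict


def also_liked_weighted(article, users_by_article, articles_by_user, top_n=5):
    """Score co-liked articles, adding 1/sqrt(n) per like from a user with n likes."""
    scores = defaultdict(float)
    for user in users_by_article.get(article, ()):
        liked = articles_by_user[user]
        weight = 1.0 / math.sqrt(len(liked))
        for other in liked:
            if other != article:
                scores[other] += weight
    # Highest score first; ties are broken by article name to keep output stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical data: two users who like 100 articles each, one who likes four.
articles_by_user = {
    "heavy1": {"A", "hit"} | {f"filler{i}" for i in range(98)},
    "heavy2": {"A", "hit"} | {f"other{i}" for i in range(98)},
    "fan": {"A", "niche", "x", "y"},
}
users_by_article = defaultdict(set)
for user, liked in articles_by_user.items():
    for liked_article in liked:
        users_by_article[liked_article].add(user)

top = also_liked_weighted("A", users_by_article, articles_by_user, top_n=1)
print(top)  # [('niche', 0.5)]
```

With plain counting, "hit" would win with 2 votes against 1 for "niche". With the weighting, the two heavy users contribute only 0.1 each, so "hit" scores about 0.2 and "niche" comes out on top at 0.5.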

As I said, that's the simplest approach. If you're looking for "related" articles that aren't based on user input, then you either need keywords assigned to each article, or you need some way to examine an article and extract relevant keywords.
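For the keyword route, one possible sketch is to compare keyword sets directly. This assumes keywords have already been assigned to each article; ranking by Jaccard overlap (shared keywords divided by total distinct keywords) is my own choice here, not something prescribed above, and the names and data are hypothetical.

```python
def jaccard(a, b):
    """Shared keywords divided by total distinct keywords; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_articles(article, keywords_by_article, top_n=5):
    """Rank other articles by keyword overlap with the given article."""
    target = keywords_by_article[article]
    scored = [
        (other, jaccard(target, keywords))
        for other, keywords in keywords_by_article.items()
        if other != article
    ]
    # Drop articles with no keyword in common, then sort best-first.
    scored = [(other, score) for other, score in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Hypothetical keyword assignments.
keywords_by_article = {
    "A": {"python", "recommender", "collaborative-filtering"},
    "B": {"python", "recommender", "slope-one"},
    "C": {"python", "web"},
    "D": {"gardening"},
}
print(related_articles("A", keywords_by_article))  # [('B', 0.5), ('C', 0.25)]
```

This needs no user data at all, so it also works for brand-new articles that nobody has liked yet.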

There are many ways to do what you're looking to do. Which one you choose depends on the nature of your data, whether you're doing collaborative filtering, how much time you want to spend developing it, and how good you want the results to be.
