Algorithm to find what a user likes based on what other users like

Posted on 2024-08-08 16:43:25, 799 words, 2 views, 0 comments

I'm thinking of writing an app to classify movies in an HTPC based on what the family members like.

I don't know statistics or AI, but the stuff here looks very juicy. I wouldn't know where to start.

Here's what I want to accomplish:

  1. Compose a set of samples from each user's likes, rating each sample attribute separately. For example, maybe a user likes western movies a lot, so the western genre would carry a bit more weight for that user (and so on for other attributes, like actors, director, etc.).

  2. A user can get suggestions based on the likes of the other users. For example, if both user A and user B like Spielberg (a connection between the users), and user B loves Batman Begins but user A loathes Katie Holmes, weigh the movie for user A accordingly (again, each attribute separately: maybe user A doesn't like action movies so much, so bring the rating down a bit, and since Katie Holmes isn't the main star, don't take her into account as much as the other attributes).
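The per-attribute weighting in points 1 and 2 can be expressed as a weighted sum over a movie's attributes. The sketch below is only one way to model it; the weights, the `importance` factor that discounts supporting actors, and all names are hypothetical illustrations, not taken from any library.

```python
from typing import Dict, List, Tuple

# Hypothetical preference weights for one user: (attribute type, value) -> weight.
# Positive means "likes", negative means "dislikes", missing means "no opinion".
Preferences = Dict[Tuple[str, str], float]
Movie = Dict[str, List[str]]  # attribute type -> values


def score_movie(prefs: Preferences, movie: Movie,
                importance: Dict[str, float]) -> float:
    """Weighted sum of the user's opinion of each attribute value the movie has.

    `importance` scales whole attribute types, e.g. so a supporting actor
    counts less than the director; types not listed default to 1.0.
    """
    total = 0.0
    for attr_type, values in movie.items():
        factor = importance.get(attr_type, 1.0)
        for value in values:
            total += factor * prefs.get((attr_type, value), 0.0)
    return total


user_a: Preferences = {
    ("director", "Spielberg"): 1.0,
    ("genre", "action"): -0.5,
    ("supporting_actor", "Katie Holmes"): -1.0,
}
batman_begins: Movie = {
    "director": ["Christopher Nolan"],
    "genre": ["action"],
    "supporting_actor": ["Katie Holmes"],
}
importance = {"supporting_actor": 0.25}
print(score_movie(user_a, batman_begins, importance))  # -0.75
```

The dislike of action counts fully (-0.5), while the dislike of a supporting actor is scaled down to a quarter (-0.25).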

Basically, compare user A's sets with similar sets from user B and come up with a rating for user A.
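Comparing user A with similar users and deriving a rating from theirs is what is usually called user-based collaborative filtering. A minimal sketch, assuming each user is summarized by a hypothetical profile of attribute weights and that similarity is measured with cosine similarity (one common choice among several):

```python
import math
from typing import Dict, Optional

# Hypothetical profile: attribute value -> how much the user likes it.
Profile = Dict[str, float]


def cosine_similarity(a: Profile, b: Profile) -> float:
    """Cosine of the angle between two sparse preference vectors (0 if either is empty)."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(target: str, movie: str,
                   profiles: Dict[str, Profile],
                   ratings: Dict[str, Dict[str, float]]) -> Optional[float]:
    """Similarity-weighted average of the other users' ratings for `movie`.

    Only users with positive similarity contribute; returns None if nobody does.
    """
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == target or movie not in other_ratings:
            continue
        sim = cosine_similarity(profiles[target], profiles[other])
        if sim <= 0:
            continue
        numerator += sim * other_ratings[movie]
        denominator += sim
    return numerator / denominator if denominator else None


profiles = {
    "A": {"Spielberg": 1.0, "western": 1.0},
    "B": {"Spielberg": 1.0, "action": 1.0},
    "C": {"Spielberg": 1.0, "drama": 1.0},
    "D": {"romance": 1.0},
}
ratings = {
    "B": {"Batman Begins": 5.0},
    "C": {"Batman Begins": 3.0},
    "D": {"Batman Begins": 1.0},
}
print(predict_rating("A", "Batman Begins", profiles, ratings))
```

Here B and C are equally similar to A and D shares nothing with A, so the prediction is the plain average of B's and C's ratings, 4.0 up to floating-point rounding.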

I have a crude idea about how to implement this, but I'm certain some bright minds have already thought of a far better solution, so... any suggestions?

Actually, after some quick research, it seems a Bayesian filter would work. If so, would this be the better approach? Would it be as simple as "normalizing" the movie data, training a classifier for each user, and then classifying each movie?
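One possible reading of "normalizing" the movie data is turning each movie into a fixed-length vector of binary features, so every per-user classifier sees the same inputs. A sketch under that assumption (the `type=value` feature naming is made up for illustration; numeric attributes such as year would need separate handling):

```python
from typing import Dict, List, Set

Movie = Dict[str, List[str]]  # attribute type -> values, e.g. {"genre": ["western"]}


def features_of(movie: Movie) -> Set[str]:
    """Flatten a movie into 'type=value' feature names."""
    return {f"{attr}={value}" for attr, values in movie.items() for value in values}


def build_vocabulary(movies: List[Movie]) -> List[str]:
    """Every feature seen anywhere in the catalogue, in a fixed (sorted) order."""
    vocab: Set[str] = set()
    for movie in movies:
        vocab |= features_of(movie)
    return sorted(vocab)


def to_vector(movie: Movie, vocabulary: List[str]) -> List[int]:
    """Multi-hot encoding: 1 if the movie has the feature, 0 otherwise."""
    present = features_of(movie)
    return [1 if feature in present else 0 for feature in vocabulary]


catalogue = [
    {"genre": ["western"], "director": ["Sergio Leone"]},
    {"genre": ["action"], "director": ["Christopher Nolan"], "actor": ["Katie Holmes"]},
]
vocabulary = build_vocabulary(catalogue)
for movie in catalogue:
    print(to_vector(movie, vocabulary))
# [0, 0, 1, 0, 1]
# [1, 1, 0, 1, 0]
```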

If your suggestion includes some brain-melting concepts (I'm not experienced in these subjects, especially in AI), I'd appreciate it if you also included a list of some basics for me to research before diving into the meaty stuff.

Thanks!

Comments (5)

独守阴晴ぅ圆缺 2024-08-15 16:43:25

This is similar to this question, where the OP wanted to build a recommendation system. In a nutshell, we are given a set of training data consisting of users' ratings of movies (a 1-5 star rating, for example) and a set of attributes for each movie (year, genre, actors, ...). We want to build a recommender that outputs a likely rating for unseen movies. So the input data looks like:

user movie   year   genre   ...    | rating
---------------------------------------------
  1    1     2006   action         |    5
  3    2     2008   drama          |    3.5
  ...

and for an unrated movie X:

 10   20     2009   drama          |    ?

we want to predict a rating. Doing this for all unseen movies then sorting by predicted movie rating and outputting the top 10 gives you a recommendation system.
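The ranking step can be sketched independently of how ratings are predicted. Here `predict` is any callable returning a predicted rating; the fixed table used below is a hypothetical stand-in for a real model:

```python
import heapq
from typing import Callable, Iterable, List, Set, Tuple


def recommend(candidates: Iterable[str], seen: Set[str],
              predict: Callable[[str], float], n: int = 10) -> List[Tuple[str, float]]:
    """Predict a rating for every unseen movie and return the n highest, best first."""
    scored = ((movie, predict(movie)) for movie in candidates if movie not in seen)
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Stand-in predictor: a fixed table of hypothetical predicted ratings.
predicted = {"Movie A": 4.5, "Movie B": 2.0, "Movie C": 3.7, "Movie D": 5.0}
print(recommend(predicted, seen={"Movie D"}, predict=lambda m: predicted[m], n=2))
# [('Movie A', 4.5), ('Movie C', 3.7)]
```

`heapq.nlargest` avoids sorting the whole catalogue when only the top few are needed.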

The simplest approach is to use a k-nearest neighbor algorithm. Among the rated movies, search for the "closest" ones to movie X, and combine their ratings to produce a prediction.
This approach has the advantage of being very simple to implement from scratch.
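A minimal k-nearest-neighbor sketch for a single user's ratings. The distance function (genre mismatch plus the year gap in decades) is an arbitrary assumption for illustration; choosing and weighting attributes in the distance is where most of the real design work lies. The sample data loosely follows the table above, with two extra made-up rows.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Movie:
    year: int
    genre: str


def distance(a: Movie, b: Movie) -> float:
    """Hypothetical distance: 1 for a genre mismatch plus the year gap in decades."""
    genre_gap = 0.0 if a.genre == b.genre else 1.0
    return genre_gap + abs(a.year - b.year) / 10.0


def knn_predict(target: Movie, rated: List[Tuple[Movie, float]], k: int = 3) -> float:
    """Predict a rating as the mean rating of the k rated movies closest to `target`."""
    if not rated:
        raise ValueError("need at least one rated movie")
    nearest = sorted(rated, key=lambda pair: distance(target, pair[0]))[:k]
    return sum(rating for _, rating in nearest) / len(nearest)


rated = [
    (Movie(2006, "action"), 5.0),
    (Movie(2008, "drama"), 3.5),
    (Movie(2007, "drama"), 4.0),
    (Movie(1995, "comedy"), 1.0),
]
print(knn_predict(Movie(2009, "drama"), rated, k=2))  # 3.75
```

With k=2 the two dramas from 2007 and 2008 are nearest, so the prediction is their mean rating. A common refinement is to weight each neighbor's rating by the inverse of its distance.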

Other, more sophisticated approaches exist. For example, you can build a decision tree, which fits a set of rules to the training data. You can also use Bayesian networks, artificial neural networks, or support vector machines, among many others. Going through each of these won't be easy for someone without the proper background.
Still, I expect you would be using an external tool or library. Since you seem to be familiar with Bayesian networks, a simple naive Bayes net could in fact be very powerful. One advantage is that it allows for prediction with missing data.
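To make the naive Bayes idea concrete, here is a small from-scratch categorical naive Bayes classifier, assuming one model per family member trained on hypothetical "like"/"dislike" labels. It illustrates the missing-data advantage: an unknown attribute simply contributes no factor. A tool like Weka would do the same job without writing code.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional

Example = Dict[str, Optional[str]]  # attribute -> value, None when unknown


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        # counts[label][attribute][value] -> number of training examples
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # attribute -> values seen in training

    def fit(self, examples: List[Example], labels: List[str]) -> "NaiveBayes":
        self.class_counts.update(labels)
        for example, label in zip(examples, labels):
            for attr, value in example.items():
                if value is not None:  # missing values add no counts
                    self.counts[label][attr][value] += 1
                    self.values[attr].add(value)
        return self

    def predict(self, example: Example) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior P(label)
            for attr, value in example.items():
                if value is None:
                    continue  # missing at prediction time: skip this factor
                observed = self.counts[label][attr]
                k = len(self.values[attr]) + 1  # known values plus one "unseen" slot
                score += math.log((observed[value] + 1) / (sum(observed.values()) + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [
    {"genre": "western", "director": "Sergio Leone"},
    {"genre": "western", "director": None},
    {"genre": "action", "director": "Christopher Nolan"},
    {"genre": "action", "director": None},
]
labels = ["like", "like", "dislike", "dislike"]
model = NaiveBayes().fit(train, labels)
print(model.predict({"genre": "western", "director": None}))  # like
print(model.predict({"genre": None, "director": "Christopher Nolan"}))  # dislike
```

Working in log space avoids underflow when many small probabilities are multiplied.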

The main idea would be somewhat the same; take the input data you have, train a model, then use it to predict the class of new instances.

If you want to play around with different algorithms in a simple, intuitive package that requires no programming, I suggest you take a look at Weka (my first choice), Orange, or RapidMiner. The most difficult part would be preparing the dataset in the required format. The rest is as easy as choosing an algorithm and applying it (all in a few clicks!)

I guess for someone not looking to go into too much detail, I would recommend going with the nearest neighbor method, as it is intuitive and easy to implement. Still, the option of using Weka (or one of the other tools) is worth looking into.

风吹雪碎 2024-08-15 16:43:25

There are a few algorithms that are good for this:

ARTMAP: groups via probability against each other (this isn't fast, but it's the best thing for your problem IMO)

ARTMAP holds a group of common attributes and determines the likelihood of similarity via percentages.
ARTMAP

KMeans: This separates out the vectors by the distance between them
KMeans: Wikipedia
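A minimal sketch of k-means (Lloyd's algorithm) in plain Python, for example to group family members whose preference vectors are close. The two-dimensional vectors are hypothetical, and seeding with the first k points is a simplification:

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int,
           max_iterations: int = 100) -> Tuple[List[Vector], List[int]]:
    """Lloyd's algorithm for k-means; returns (centroids, cluster index per point).

    Simplification: the first k points seed the centroids. Real implementations
    usually seed randomly or with k-means++, since a bad seed can give poor clusters.
    """
    centroids: List[Vector] = [tuple(p) for p in points[:k]]
    assignment: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignment


points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8)]
centroids, assignment = kmeans(points, k=2)
print(assignment)  # [0, 1, 0, 1]
```

Points 0 and 2 end up in one cluster and points 1 and 3 in the other, with centroids near (1.1, 0.9) and (5.1, 4.9).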

PCA: will separate the average of all the values from the varying bits. This is what you would use for face detection and background subtraction in computer vision.
PCA
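A small illustration of the PCA idea described above: subtract the mean, then find the direction along which the data varies most (the first principal component). This plain-Python sketch uses power iteration on the covariance matrix, which is one of several ways to compute it; libraries normally use an eigendecomposition or SVD instead.

```python
import math
from typing import List, Sequence, Tuple


def first_principal_component(data: Sequence[Sequence[float]],
                              iterations: int = 200) -> Tuple[List[float], List[float]]:
    """Return (mean, unit direction of greatest variance) using power iteration.

    Needs at least two rows. Power iteration on the covariance matrix converges to
    its dominant eigenvector, provided the starting vector is not orthogonal to it
    (assumed here).
    """
    n, dims = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(dims)]
    centered = [[row[j] - mean[j] for j in range(dims)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    direction = [1.0] * dims
    for _ in range(iterations):
        product = [sum(cov[i][j] * direction[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0:
            break  # no variance at all
        direction = [x / norm for x in product]
    return mean, direction


data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
mean, direction = first_principal_component(data)
# Each row is approximately mean + score * direction; the scores are the "varying bits".
scores = [sum((x - m) * d for x, m, d in zip(row, mean, direction)) for row in data]
print(mean)  # [2.5, 5.0]
```

Because these sample points lie exactly on a line, the direction comes out as (1, 2) normalized, and one score per row is enough to reconstruct the data.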

星星的軌跡 2024-08-15 16:43:25

The K-nearest neighbor algorithm may be right up your alley.

薔薇婲 2024-08-15 16:43:25

Check out some of the work of the top teams for the Netflix Prize.
