Determining the popularity of a video from ratings and view counts
I am about to embark on a new project - a video website. Users will be able to register, and vote on videos by clicking "like" or "dislike", or something to that effect. In any event, it will be a 2-option voting system, not a 5-star system.
Every X number of days, I will be generating a "chart" of the most popular videos. So my question is: how should I determine the popularity of a given video?
If I went the route of tallying up the videos with the most views, exceptionally bad videos could make it to the top of the charts (just because they're so bad).
If I go the route of a scoring system based on the number of "like" and "dislike" votes (e.g. 100 like votes and 50 dislike votes equals a score of 2), videos with few views could appear at the top of the charts.
So, what I need to do is a combination of the two. Barring, of course, spammy views and votes.
What are your thoughts on the subject?
Edit: the following tags were removed: [mysql] [postgresql], to make room for other, more representative tags; the SQL technology used in the intended implementation does not seem to bear much on the considerations regarding the rating model per se.
Comments (2)
You seem to be missing the point that likes and dislikes in movies are anything but objective, even within the context of a relatively homogeneous group of "voters". Think how the term "Chix Flix" or the success story called "NetFlix" illustrates this subjectivity...
Yet, if you persist in implementing the model you suggest, there are several hidden variables and system dynamics that need to be acknowledged and possibly taken into account in the rating formula.
One such variable is the non-vote, i.e. when someone views the movie page and yet doesn't vote, either way.
The problem with this extra value is its ambiguity: do people not vote because they didn't see the movie, or because they neither truly liked nor disliked it? Very likely a bit of both. Therefore we can/should use the count of "page views without vote" in the formula, to boost (somewhat) the rating of movies that do not generate a strong (positive or negative) sentiment, lest the "polarizing" movies appear more notorious or popular.
There is also a bandwagon effect: past a certain threshold, and particularly if the rating and/or vote counts are visible before the page view, the rating and vote counts can influence the way people decide to vote (either way) or even decide to abstain from voting. The implication is that the total vote and/or view counts do not relate linearly to the effective rating.
Vote ratios in general (e.g. "likes" / "total" or "likes" / "dislikes" etc.) are indicative of the "quality" of a movie (note the quotes around quality...), whereas the number of votes (and of views) is indicative of the notoriety ("name recognition" etc.) of a movie.
Very small vote and/or view counts are to be handled carefully because they introduce much volatility in the rating. Phrased otherwise, small samples make for ratings that are not statistically representative.
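One common way to keep small samples from dominating a chart is to rank by a lower confidence bound on the like ratio rather than by the ratio itself. The sketch below uses the Wilson score interval, which is a technique I am swapping in for illustration and not something the question or this answer prescribes; the function name and the 95% confidence level (z = 1.96) are assumptions.

```python
import math


def wilson_lower_bound(likes: int, dislikes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the share of likes.

    z = 1.96 corresponds to roughly 95% confidence (an assumed default).
    Returns 0.0 when there are no votes at all.
    """
    n = likes + dislikes
    if n == 0:
        return 0.0
    p = likes / n  # observed share of likes
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / denom


# A video with 2 likes and no dislikes has a perfect ratio, but the
# bound ranks it below a video with 100 likes and 50 dislikes.
few_votes = wilson_lower_bound(2, 0)
many_votes = wilson_lower_bound(100, 50)
print(few_votes < many_votes)
```

The bound grows toward the raw like share as votes accumulate, so well-sampled videos are barely penalized.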
At the risk of complicating the model, consider keeping [some] record of when votes/views happened, to allow identifying "hot" (and "cooling") movies in the collection. This info may inform the rating logic, but may also be used to direct users towards currently hot items. BTW, this feeds the bandwagon effect mentioned :-( but it also increases the voting sample size :-).
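If vote timestamps are kept, one possible way to spot "hot" and "cooling" movies is to let each vote lose weight as it ages. The following is only a sketch under my own assumptions: the exponential decay, the 7-day half-life, and the `hot_score` name are illustrative choices, not part of the model described above.

```python
import math
from datetime import datetime, timedelta


def hot_score(votes, now, half_life_days=7.0):
    """Sum of votes weighted by recency.

    votes: iterable of (timestamp, value) with value +1 (like) or -1 (dislike).
    A vote loses half its weight every half_life_days (assumed default: 7).
    """
    decay = math.log(2) / half_life_days
    score = 0.0
    for when, value in votes:
        # Clamp so a timestamp slightly in the future is not over-weighted.
        age_days = max((now - when).total_seconds() / 86400.0, 0.0)
        score += value * math.exp(-decay * age_days)
    return score


now = datetime(2024, 1, 15)
recent = [(now - timedelta(days=1), +1)] * 10   # 10 likes yesterday
stale = [(now - timedelta(days=60), +1)] * 10   # 10 likes two months ago
print(hot_score(recent, now) > hot_score(stale, now))
```

A shorter half-life makes the chart react faster but also makes it easier for a brief burst of votes to push a movie up.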
All these considerations suggest caution in implementing this rating system. They also hint at the likely need to include statistics about the complete set of movies in the rating formula for an individual movie. In other words, do not rate a given movie solely on the basis of its own vote/view counts, but also on, say, the average vote count a movie receives, the maximum number of views a movie page gets, etc. In fact, an iterative process, whereby movies are [roughly] ranked at first and the ranking is then recalculated using the statistics of groups of similarly rated movies, may provide a better system (provided the formulas are "fair" and somehow converge).
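As one illustration of folding collection-wide statistics into an individual rating, the sketch below pulls each movie's like share toward the share observed across the whole collection, with the pull weighted by the average number of votes per movie. This is a "Bayesian average" style adjustment that I am assuming for the example; the function name and the choice of weight are hypothetical, and it does not implement the iterative re-ranking mentioned above.

```python
def collection_adjusted_ratings(videos):
    """Shrink each video's like share toward the collection-wide share.

    videos: dict mapping a video id to a (likes, dislikes) tuple.
    The prior weight is the average vote count per video (an assumption).
    """
    total_likes = sum(likes for likes, _ in videos.values())
    total_votes = sum(likes + dislikes for likes, dislikes in videos.values())
    if not videos or total_votes == 0:
        # No information at all: fall back to a neutral 0.5 for every video.
        return {vid: 0.5 for vid in videos}
    global_share = total_likes / total_votes
    prior_weight = total_votes / len(videos)
    return {
        vid: (likes + prior_weight * global_share) / (likes + dislikes + prior_weight)
        for vid, (likes, dislikes) in videos.items()
    }


ratings = collection_adjusted_ratings({
    "well_liked": (100, 50),
    "barely_seen": (2, 0),
    "disliked": (10, 40),
})
# The barely seen video no longer outranks the well-sampled, well-liked one.
print(sorted(ratings, key=ratings.get, reverse=True))
```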
A standard trick is to start with a neutral baseline: say 10 likes and 10 dislikes, which gives a score of 1. The first few votes don't change the ratio too much, but as votes accumulate, the baseline is overwhelmed. The exact choice of the baseline values will influence the rating of a new movie (the two values don't have to be equal), and how many votes are needed to change the rating substantially.
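A minimal sketch of the baseline trick, using the likes/dislikes ratio from the question and the 10/10 baseline mentioned above; the function name is my own, and the baseline values are parameters you would tune.

```python
def baseline_score(likes, dislikes, base_likes=10, base_dislikes=10):
    """Likes/dislikes ratio after adding a fixed baseline to both counts.

    base_dislikes must be positive so the ratio is always defined.
    """
    return (likes + base_likes) / (dislikes + base_dislikes)


print(baseline_score(0, 0))        # new video: (0 + 10) / (0 + 10) = 1.0
print(baseline_score(2, 0))        # two likes barely move it: 12 / 10 = 1.2
print(baseline_score(1000, 500))   # many votes: close to the raw ratio of 2
```

Without the baseline, the video with 2 likes and 0 dislikes would have an undefined (or infinite) ratio and could top the chart; with it, the score stays near neutral until enough votes come in.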