洪水贝叶斯评级创建的值超出范围
我正在尝试应用 贝叶斯评分公式,但如果我评分为 1 5,000 或数百,最终评分大于 5。
例如,给定项目没有投票,在投票 170,000 次并获得 1 星后,其最终评分为 5.23。如果我给100分,它就是一个正常值。
这是我在 PHP 中的内容。
<?php
// these values came from DB
$total_votes = 2936; // total of votes for all items
$total_rating = 582.955; // sum of all ratings
$total_items = 202;
// now the specific item, it has no votes yet
$this_num_votes = 0;
$this_score = 0;
$this_rating = 0;
// simulating a lot of votes with 1 star
for ($i=0; $i < 170000; $i++) {
$rating_sent = 1; // the new rating, always 1
$total_votes++; // adding 1 to total
$total_rating = $total_rating+$rating_sent; // adding 1 to total
$avg_num_votes = ($total_votes/$total_items); // Average number of votes in all items
$avg_rating = ($total_rating/$total_items); // Average rating for all items
$this_num_votes = $this_num_votes+1; // Number of votes for this item
$this_score = $this_score+$rating_sent; // Sum of all votes for this item
$this_rating = $this_score/$this_num_votes; // Rating for this item
$bayesian_rating = ( ($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating) ) / ($avg_num_votes + $this_num_votes);
}
echo $bayesian_rating;
?>
即使我用 1 或 2 淹没:
$rating_sent = rand(1,2)
100,000 票后的最终评级超过 5。
我刚刚做了一个新的测试
$rating_sent = rand(1,5)
,在 100,000 票后我得到了一个完全超出范围的值 (10.53)。我知道在正常情况下,没有一个项目会获得 170,000 票,而所有其他项目都不会获得投票。但我想知道我的代码是否有问题,或者考虑到大量选票,这是否是贝叶斯公式的预期行为。
编辑
为了清楚起见,这里对某些变量有更好的解释。
$avg_num_votes // SUM(votes given to all items)/COUNT(all items)
$avg_rating // SUM(rating of all items)/COUNT(all items)
$this_num_votes // COUNT(votes given for this item)
$this_score // SUM(rating for this item)
$bayesian_rating // is the formula itself
公式为:( (avg_num_votes * avg_ rating) + (this_num_votes * this_ rating) ) / (avg_num_votes + this_num_votes)
。摘自此处
I'm trying to apply the Bayesian rating formula, but if I rate 1 out of 5 thousand of hundreds, the final rating is greater than 5.
For example, a given item has no votes and after voting 170,000 times with 1 star, its final rating is 5.23. If I rate 100, it has a normal value.
Here is what I have in PHP.
<?php
// these values came from DB
$total_votes = 2936; // total of votes for all items
$total_rating = 582.955; // sum of all ratings
$total_items = 202;
// now the specific item, it has no votes yet
$this_num_votes = 0;
$this_score = 0;
$this_rating = 0;
// simulating a lot of votes with 1 star
for ($i=0; $i < 170000; $i++) {
$rating_sent = 1; // the new rating, always 1
$total_votes++; // adding 1 to total
$total_rating = $total_rating+$rating_sent; // adding 1 to total
$avg_num_votes = ($total_votes/$total_items); // Average number of votes in all items
$avg_rating = ($total_rating/$total_items); // Average rating for all items
$this_num_votes = $this_num_votes+1; // Number of votes for this item
$this_score = $this_score+$rating_sent; // Sum of all votes for this item
$this_rating = $this_score/$this_num_votes; // Rating for this item
$bayesian_rating = ( ($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating) ) / ($avg_num_votes + $this_num_votes);
}
echo $bayesian_rating;
?>
Even if I flood with 1 or 2:
$rating_sent = rand(1,2)
The final rating after 100,000 votes is over 5.
I just did a new test using
$rating_sent = rand(1,5)
And after 100,000 I got a value completely out of range range (10.53). I know that in a normal situation no item will get 170,000 votes while all the other items get no vote. But I wonder if there is something wrong with my code or if this is an expected behavior of Bayesian formula considering the massive votes.
Edit
Just to make it clear, here is a better explanation for some variables.
$avg_num_votes // SUM(votes given to all items)/COUNT(all items)
$avg_rating // SUM(rating of all items)/COUNT(all items)
$this_num_votes // COUNT(votes given for this item)
$this_score // SUM(rating for this item)
$bayesian_rating // is the formula itself
The formula is: ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)
. Taken from here
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
计算 avg_ rating 时,您需要除以total_votes 而不是total_items。
我做了一些改变,并得到了一些表现更好的东西。
http://codepad.org/gSdrUhZ2
You need to divide by total_votes rather than total_items when calculating avg_rating.
I made the changes and got something that behaves much better here.
http://codepad.org/gSdrUhZ2