What is a better way to sort by a 5-star rating?

Posted 2024-08-04 22:46:55 · 221 words · 5 views · 0 comments

I'm trying to sort a bunch of products by customer rating using a 5-star system. The site I'm setting this up for does not have a lot of ratings and continues to add new products, so there will usually be a few products with a low number of ratings.

I tried using the average star rating, but that approach fails when there is a small number of ratings.

For example, a product that has 3x 5-star ratings would show up better than a product that has 100x 5-star ratings and 2x 2-star ratings.

Shouldn't the second product show up higher, since its larger number of ratings makes it statistically more trustworthy?

Comments (11)

错々过的事 2024-08-11 22:46:55

Prior to 2015, the Internet Movie Database (IMDb) publicly listed the formula used to rank their Top 250 movies list. To quote:

The formula for calculating the Top Rated 250 Titles gives a true Bayesian estimate:

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

where:

  • R = average for the movie (mean)
  • v = number of votes for the movie
  • m = minimum votes required to be listed in the Top 250 (currently 25000)
  • C = the mean vote across the whole report (currently 7.0)

For the Top 250, only votes from regular voters are considered.

It's not so hard to understand. The formula is:

rating = (v / (v + m)) * R +
         (m / (v + m)) * C;

Which can be mathematically simplified to:

rating = (R * v + C * m) / (v + m);

The variables are:

  • R – The item's own rating. R is the average of the item's votes. (For example, if an item has no votes, its R is 0. If someone gives it 5 stars, R becomes 5. If someone else gives it 1 star, R becomes 3, the average of [1, 5]. And so on.)
  • C – The average item's rating. Find the R of every single item in the database, including the current one, and take the average of them; that is C. (Suppose there are 4 items in the database, and their ratings are [2, 3, 5, 5]. C is 3.75, the average of those numbers.)
  • v – The number of votes for an item. (To give another example, if 5 people have cast votes on an item, v is 5.)
  • m – The tuneable parameter. The amount of "smoothing" applied to the rating is based on the number of votes (v) in relation to m. Adjust m until the results satisfy you. And don't misinterpret IMDb's description of m as "minimum votes required to be listed" – this system is perfectly capable of ranking items with fewer votes than m.

All the formula does is add m imaginary votes, each with a value of C, before calculating the average. In the beginning, when there isn't enough data (i.e. the number of votes is dramatically less than m), this causes the blanks to be filled in with average data. However, as votes accumulate, the imaginary votes will eventually be drowned out by real ones.

In this system, votes don't cause the rating to fluctuate wildly. Instead, they merely perturb it a bit in some direction.

When there are zero votes, only imaginary votes exist, and all of them are C. Thus, each item begins with a rating of C.
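
To make the "imaginary votes" reading concrete, here is a minimal Python sketch applied to the two products from the question. The values C = 3.5 and m = 10 are assumptions chosen purely for illustration, and the function name is hypothetical; on a real site C would be computed from the data and m tuned by hand.

```python
def bayesian_average(votes, C, m):
    """Average of the real votes plus m imaginary votes worth C each.

    Equivalent to (R * v + C * m) / (v + m), since R * v is the sum of votes.
    """
    v = len(votes)
    return (sum(votes) + C * m) / (v + m)


# The two products from the question.
few_votes = [5] * 3                # 3x 5-star
many_votes = [5] * 100 + [2] * 2   # 100x 5-star and 2x 2-star

C = 3.5  # assumed site-wide mean rating (illustrative only)
m = 10   # assumed smoothing strength (illustrative only)

score_few = bayesian_average(few_votes, C, m)
score_many = bayesian_average(many_votes, C, m)
print(f"{score_few:.2f} {score_many:.2f}")  # 3.85 4.81
```

With these assumed values, the product with 102 ratings ranks above the one with only three perfect ratings, which is the ordering the question asks for.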

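See also:

  • A demo. Click "Solve".
  • Another explanation of IMDb's system
  • An explanation of a similar Bayesian star-rating system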

秉烛思 2024-08-11 22:46:55

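Evan Miller shows a Bayesian approach to ranking 5-star ratings:

$$S(n_1, \dots, n_K) = \sum_{k=1}^{K} s_k \frac{n_k + 1}{N + K} - z_{\alpha/2} \sqrt{\frac{1}{N + K + 1} \left( \sum_{k=1}^{K} s_k^2 \frac{n_k + 1}{N + K} - \left( \sum_{k=1}^{K} s_k \frac{n_k + 1}{N + K} \right)^2 \right)}$$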

where

  • n_k is the number of k-star ratings,
  • s_k is the "worth" (in points) of k stars,
  • N is the total number of votes,
  • K is the maximum number of stars (e.g. K=5 in a 5-star rating system),
  • z_alpha/2 is the 1 - alpha/2 quantile of a standard normal distribution. If you want 95% confidence (based on the Bayesian posterior distribution) that the actual sort criterion is at least as big as the computed sort criterion, choose z_alpha/2 = 1.65 (that is, alpha = 0.10, since the bound is one-sided).

In Python, the sorting criterion can be calculated with

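import math

def starsort(ns):
    """
    http://www.evanmiller.org/ranking-items-with-star-ratings.html
    """
    N = sum(ns)  # total number of votes
    K = len(ns)  # number of star levels
    s = list(range(K, 0, -1))  # worth of each level; ns runs from K stars down to 1 star
    s2 = [sk**2 for sk in s]
    z = 1.65

    def f(s, ns):
        # Weighted mean of s, counting one extra pseudo-vote per star level
        return sum(sk*(nk+1) for sk, nk in zip(s, ns)) / (N+K)

    fsns = f(s, ns)
    return fsns - z*math.sqrt((f(s2, ns) - fsns**2)/(N+K+1))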

For example, if an item has 60 five-stars, 80 four-stars, 75 three-stars, 20 two-stars and 25 one-stars, then its overall star rating would be about 3.4:

x = (60, 80, 75, 20, 25)
starsort(x)
# 3.3686975120774694

and you can sort a list of 5-star ratings with

sorted([(60, 80, 75, 20, 25), (10,0,0,0,0), (5,0,0,0,0)], key=starsort, reverse=True)
# [(10, 0, 0, 0, 0), (60, 80, 75, 20, 25), (5, 0, 0, 0, 0)]

This shows the effect that more ratings can have upon the overall star value.


You'll find that this formula tends to give an overall rating which is a bit
lower than the overall rating reported by sites such as Amazon, eBay or Walmart,
particularly when there are few votes (say, fewer than 300). This reflects the
higher uncertainty that comes with fewer votes. As the number of votes increases
(into the thousands), all of these overall rating formulas should tend to the
(weighted) average rating.


Since the formula only depends on the frequency distribution of 5-star ratings
for the item itself, it is easy to combine reviews from multiple sources (or,
update the overall rating in light of new votes) by simply adding the frequency
distributions together.


Unlike the IMDb formula, this formula does not depend on the average score
across all items, nor an artificial minimum number of votes cutoff value.

Moreover, this formula makes use of the full frequency distribution -- not just
the average number of stars and the number of votes. And it makes sense that it
should since an item with ten 5-stars and ten 1-stars should be treated as
having more uncertainty than (and therefore not rated as highly as) an item with
twenty 3-star ratings:

In [78]: starsort((10,0,0,0,10))
Out[78]: 2.386028063783418

In [79]: starsort((0,0,20,0,0))
Out[79]: 2.795342687927806

The IMDb formula does not take this into account.

无声静候 2024-08-11 22:46:55

See this page for a good analysis of star-based rating systems, and this one for a good analysis of upvote/downvote-based systems.

For up and down voting you want to estimate the probability that, given the ratings you have, the "real" score (if you had infinite ratings) is greater than some quantity (like, say, the corresponding number for some other item you're sorting against).

See the second article for the answer, but the conclusion is that you want to sort by the lower bound of the Wilson score confidence interval. The article gives the equation and sample Ruby code (easily translated to another language).
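
For readers who don't use Ruby, here is a minimal Python sketch of the Wilson score lower bound. The function name and the default z = 1.96 (a 95% two-sided interval) are assumptions for illustration, not taken from the article. Counting a 5-star rating as an upvote and a 2-star rating as a downvote is also only an assumption, made here to reuse the question's example.

```python
import math


def wilson_lower_bound(upvotes, total, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction."""
    if total == 0:
        return 0.0
    p = upvotes / total  # observed fraction of positive votes
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


# Question's example, under the assumed mapping 5 stars = up, 2 stars = down.
few = wilson_lower_bound(3, 3)         # 3 positive out of 3
many = wilson_lower_bound(100, 102)    # 100 positive out of 102
print(f"{few:.2f} {many:.2f}")  # 0.44 0.93
```

The item with three perfect votes gets a much lower bound than the item with 100 positive votes out of 102, so it sorts below it.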

oО清风挽发oО 2024-08-11 22:46:55

Well, depending on how complex you want to make it, you could have ratings additionally be weighted based on how many ratings the person has made, and what those ratings are. If the person has only made one rating, it could be a shill rating, and might count for less. Or if the person has rated many things in category a, but few in category b, and has an average rating of 1.3 out of 5 stars, it sounds like category a may be artificially weighed down by the low average score of this user, and should be adjusted.

But enough of making it complex. Let’s make it simple.

Assuming we’re working with just two values, ReviewCount and AverageRating, for a particular item, it would make sense to me to look at ReviewCount as essentially being the “reliability” value. But we don’t just want to bring scores down for low ReviewCount items: a single one-star rating is probably as unreliable as a single 5-star rating. So what we want to do is probably average towards the middle: 3.

So, basically, I’m thinking of an equation something like X * AverageRating + Y * 3 = the-rating-we-want. In order to make this value come out right we need X+Y to equal 1. Also we need X to increase in value as ReviewCount increases: with a review count of 0, X should be 0 (giving us a rating of 3), and with an infinite review count X should be 1 (which makes the rating equal AverageRating).

So what are the X and Y equations? For the X equation we want the dependent variable to asymptotically approach 1 as the independent variable approaches infinity. A good set of equations is something like:
Y = 1/(factor^RatingCount)
and (utilizing the fact that X must be equal to 1-Y)
X = 1 – (1/(factor^RatingCount))

Then we can adjust "factor" to fit the range that we're looking for.
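
The same weighting can also be sketched as a small Python function. The names `adjusted_rating` and `midpoint` are hypothetical. Computing Y through `math.exp` is a choice made here so that very large counts quietly underflow to 0.0 rather than overflow.

```python
import math


def adjusted_rating(average, count, factor=1.5, midpoint=3.0):
    """Blend the average towards the midpoint: X * average + Y * midpoint."""
    # Y = 1 / factor**count, written with exp so huge counts underflow to 0.0.
    y = math.exp(-count * math.log(factor))
    x = 1.0 - y  # X + Y must equal 1
    return x * average + y * midpoint


print(f"{adjusted_rating(5, 1):.2f}")        # 3.67
print(f"{adjusted_rating(4.5, 5):.2f}")      # 4.30
print(f"{adjusted_rating(3.5, 50000):.2f}")  # 3.50
```

With a count of 0 the function returns the midpoint, and as the count grows the result approaches the plain average.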

I used this simple C# program to try a few factors:

using System;

class RatingDemo
{
    // We can adjust this factor to adjust our curve.
    const double Factor = 1.5;

    // Blend the average towards the midpoint 3. The weight on the midpoint,
    // 1 / Factor^count, shrinks as the number of ratings grows.
    static double Adjust(double ratingAverage, double ratingCount)
    {
        double modfactor = Math.Pow(Factor, ratingCount);
        return (3 / modfactor) + (ratingAverage * (1 - 1 / modfactor));
    }

    static void Main()
    {
        // Here's some sample data: { RatingAverage, RatingCount }.
        // 50000 is not infinite, but it's probably plenty to closely simulate it.
        double[,] samples = { { 5, 1 }, { 4.5, 5 }, { 3.5, 50000 } };

        for (int i = 0; i < samples.GetLength(0); i++)
        {
            Console.WriteLine(String.Format(
                "RatingAverage: {0}, RatingCount: {1}, Adjusted Rating: {2:0.00}",
                samples[i, 0], samples[i, 1], Adjust(samples[i, 0], samples[i, 1])));
        }

        // Hold up for the user to read the data.
        Console.ReadLine();
    }
}

So you don’t have to bother copying it in, it gives this output:

RatingAverage: 5, RatingCount: 1, Adjusted Rating: 3.67
RatingAverage: 4.5, RatingCount: 5, Adjusted Rating: 4.30
RatingAverage: 3.5, RatingCount: 50000, Adjusted Rating: 3.50

Something like that? You could obviously adjust the "factor" value as needed to get the kind of weighting you want.

秉烛思 2024-08-11 22:46:55

You could sort by median instead of arithmetic mean. In this case both examples have a median of 5, so both would have the same weight in a sorting algorithm.

You could use a mode to the same effect, but median is probably a better idea.

If you want to assign additional weight to the product with 100 5-star ratings, you'll probably want to go with some kind of weighted mode, assigning more weight to ratings with the same median, but with more overall votes.
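
A quick Python check of the question's two products shows the tie. Breaking the tie by the number of votes is only one possible reading of the weighting idea, shown here as an assumption rather than as the method intended above.

```python
from statistics import median, mode

product_a = [5] * 3                # 3x 5-star
product_b = [5] * 100 + [2] * 2    # 100x 5-star and 2x 2-star

# Both products have the same median and the same mode.
print(median(product_a) == median(product_b))  # True
print(mode(product_a) == mode(product_b))      # True

# Assumed tie-break: equal medians are ordered by how many votes back them.
ranked = sorted([product_a, product_b],
                key=lambda votes: (median(votes), len(votes)),
                reverse=True)
print([len(votes) for votes in ranked])  # [102, 3]
```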

难忘№最初的完美 2024-08-11 22:46:55

If you just need a fast and cheap solution that will mostly work without using a lot of computation, here's one option (assuming a 1-5 rating scale):

SELECT Products.id, Products.title, AVG(Ratings.score) -- etc.
FROM
Products INNER JOIN Ratings ON Products.id=Ratings.product_id
GROUP BY
Products.id, Products.title
ORDER BY (SUM(Ratings.score)+25.0)/(COUNT(Ratings.id)+20.0) DESC, COUNT(Ratings.id) DESC

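By adding 25 to the sum of the scores and 20 to the number of ratings, you're effectively giving every product 20 extra ratings that average 1.25 stars (25 / 20) and then sorting on the resulting average. Products with few ratings are pulled strongly towards that low value, while products with many ratings are barely affected.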

This does have known issues. For example, it unfairly rewards low-scoring products with few ratings (as this graph demonstrates, products with an average score of 1 and just one rating score about 1.2, while products with an average score of 1 and 1k+ ratings score closer to 1.005). You could also argue it unfairly punishes high-quality products with few ratings.
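
The effect of the two constants is easy to check outside the database. This short Python sketch mirrors the ORDER BY expression; the function name is hypothetical.

```python
def smoothed_score(total_stars, n_ratings, added_stars=25.0, added_count=20.0):
    """Mirror of (SUM(score) + 25.0) / (COUNT(id) + 20.0) from the query."""
    return (total_stars + added_stars) / (n_ratings + added_count)


# Products whose real ratings are all 1 star.
print(round(smoothed_score(1, 1), 2))         # 1.24
print(round(smoothed_score(1000, 1000), 3))   # 1.005

# Products whose real ratings are all 5 stars.
print(round(smoothed_score(5, 1), 2))         # 1.43
print(round(smoothed_score(5000, 1000), 2))   # 4.93
```

A single 5-star rating only lifts a product to about 1.43, which is the "punishes high-quality products with few ratings" effect described above.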

This chart shows what happens for each of the 5 rating values as the number of ratings goes from 1 to 1000:
http://www.wolframalpha.com/input/?i=Plot3D%5B%2825%2Bxy%29/%2820%2Bx%29%2C%7Bx%2C1%2C1000%7D%2C%7By%2C0%2C6%7D%5D

You can see the upturn at the very lowest ratings, but overall it's a fair ranking, I think. You can also look at it this way:

http://www.wolframalpha.com/input/?i=Plot3D%5B6-%28%2825%2Bxy%29/%2820%2Bx%29%29%2C%7Bx%2C1%2C1000%7D%2C%7By%2C0%2C6%7D%5D

If you drop a marble on most places in this graph, it will automatically roll towards products with both higher scores and higher ratings.

紫罗兰の梦幻 2024-08-11 22:46:55

Obviously, the low number of ratings puts this problem at a statistical handicap. Nevertheless...

A key element in improving the quality of an aggregate rating is to "rate the rater", i.e. to keep tabs on the ratings each particular "rater" has supplied (relative to others). This allows their votes to be weighted during the aggregation process.

Another solution, more of a cop-out, is to supply the end-users with a count (or a range indication thereof) of votes for the underlying item.

静若繁花 2024-08-11 22:46:55

One option is something like Microsoft's TrueSkill system, where the score is given by mean - 3*stddev, where the constants can be tweaked.

蓝眼泪 2024-08-11 22:46:55

After looking for a while, I chose the Bayesian system.
If anyone is using Ruby, here is a gem for it:

https://github.com/wbotelhos/rating

柠檬色的秋千 2024-08-11 22:46:55

Here's a Go version, for anyone searching.

// starsort.go

package main

import(
    "fmt"
    "math"
)

//  http://www.evanmiller.org/ranking-items-with-star-ratings.html

func sum(ns []int) int {
    var total int
    for _,n := range ns {
        total += n
    }
    return total
}

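// f returns the weighted mean of the values in s, counting one extra
// pseudo-vote for each star level.
func f(s []int, ns []int) float64 {
    N := sum(ns)
    K := len(ns)
    ks := make([]int, K)
    for i := 0; i < K; i++ {
        ks[i] = s[i] * (ns[i] + 1)
    }
    return float64(sum(ks)) / float64(N+K)
}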

func starSort(ns []int) float64 {
    N := sum(ns)
    K := len(ns)
    s := []int{5, 4, 3, 2, 1}
    s2 := []int{25, 16, 9, 4, 1}
    z := float64(1.65)
    fsns := f(s, ns)
    return fsns - z * math.Sqrt(((f(s2, ns) - (fsns*fsns)) / float64(N+K+1)))
}

func main(){
    fmt.Println(starSort([]int{60, 80, 75, 20, 25}))
}

禾厶谷欠 2024-08-11 22:46:55

I'd highly recommend the book Programming Collective Intelligence by Toby Segaran (O'Reilly), ISBN 978-0-596-52932-1, which discusses how to extract meaningful data from crowd behaviour. The examples are in Python, but it's easy enough to convert them.
