Cake comparison algorithm

Posted on 2024-07-30 01:49:04

This is literally about comparing cakes. My friend is having a cupcake party with the goal of determining the best cupcakery in Manhattan. Actually, it's much more ambitious than that. Read on.

There are 27 bakeries, and 19 people attending (with maybe one or two no-shows). There will be 4 cupcakes from each bakery, if possible including the staples -- vanilla, chocolate, and red velvet -- and rounding out the 4 with wildcard flavors. There are 4 attributes on which to rate the cupcakes: flavor, moistness, presentation (prettiness), and general goodness. People will provide ratings on a 5-point scale for each attribute for each cupcake they sample. Finally, each cupcake can be cut into 4 or 5 pieces.

The question is: what is a procedure for coming up with a statistically meaningful ranking of the bakeries for each attribute, and for each flavor (treating "wildcard" as a flavor)? Specifically, we want to rank the bakeries 8 times: for each of the 4 flavors we want to rank the bakeries by goodness (goodness being one of the attributes), and for each of the 4 attributes we want to rank the bakeries across all flavors (i.e., independent of flavor, aggregating over all flavors). The grand prize goes to the top-ranked bakery for the goodness attribute.

Bonus points for generalizing this, of course.

This is happening in about 12 hours so I'll post as an answer what we ended up doing if no one answers in the meantime.

PS: Here's the post-party blog post about it: http://gracenotesnyc.com/2009/08/05/gracenotes-nycs-cupcake-cagematch-the-sweetest-battle-ever/

Comments (5)

陌若浮生 2024-08-06 01:49:04

Here's what we ended up doing. I made a huge table to collect everyone's ratings at http://etherpad.com/sugarorgy (see revision 25, in case the pad gets vandalized now that I've added this public link) and then used the following Perl script to parse the data into a CSV file:

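#!/usr/bin/env perl
# Grabs the cupcake data from etherpad and parses it into a CSV file.

use strict;
use warnings;
use LWP::Simple qw(get);

my $content = get("http://etherpad.com/ep/pad/export/sugarorgy/latest?format=txt");
die "Could not fetch the pad\n" unless defined $content;
$content =~ s/^.*BEGIN_MAGIC\s*//s;  # drop everything before the data table
$content =~ s/END_MAGIC.*$//s;       # drop everything after it
my $bakery = "none";
for my $line (split(/\n/, $content)) {
  next if $line =~ /sar kri and deb/;                 # skip lines containing this marker text
  if ($line =~ s/bakery\s+(\w+)//) { $bakery = $1; }  # remember the current bakery
  $line =~ s/\([^\)]*\)//g;        # strip out stuff in parens.
  $line =~ s/^\s+(\w)(\w)/$1 $2/;  # split the two-letter type/attribute code into two fields
  $line =~ s/\-/\-1/g;             # a dash (no rating) becomes -1
  $line =~ s/^\s+//;
  $line =~ s/\s+$//;
  $line =~ s/\s+/\,/g;             # whitespace-separated fields become CSV
  print "$bakery,$line\n";
}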

Then I did the averaging and whatnot in Mathematica:

data = Import["!~/svn/sugar.pl", "CSV"];

(* return a bakery's list of ratings for the given type of cupcake *)
tratings[bak_, t_] := Select[Drop[First@Select[data, 
                        #[[1]]==bak && #[[2]]==t && #[[3]]=="g" &], 3], #!=-1&]

(* return a bakery's list of ratings for the given cupcake attribute *)
aratings[bak_, a_] := Select[Flatten[Drop[#,3]& /@ 
                        Select[data, #[[1]]==bak && #[[3]]==a&]], #!=-1&]

(* overall rating for a bakery *)
oratings[bak_] := Join @@ (tratings[bak, #] & /@ {"V", "C", "R", "W"})

bakeries = Union@data[[All, 1]]

SortBy[{#, oratings@#, Round[Mean@oratings[#], .01]}& /@ bakeries, -#[[3]]&]

The results are at the bottom of http://etherpad.com/sugarorgy.
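
For readers without Mathematica, the aggregation above might look roughly like the following in Python. This is only a sketch, not the script that was actually run. It assumes the CSV layout that the Mathematica code implies (bakery, cupcake type, attribute code, then one rating per taster, with -1 for a missing rating). The sample rows, the bakery names "alpha" and "beta", and the attribute code "f" are made up for illustration; only "g" (goodness) and the types V, C, R, W come from the code above.

```python
import csv
import io
from statistics import mean

# Made-up sample in the assumed layout: bakery, type, attribute, ratings...
# (-1 marks a missing rating, as produced by the Perl script).
SAMPLE_CSV = """\
alpha,V,g,4,5,-1,3
alpha,V,f,4,4,-1,3
alpha,C,g,5,4,4,-1
beta,V,g,3,3,2,-1
beta,C,g,-1,2,4,3
"""


def load_rows(text):
    """Parse CSV text into (bakery, type, attribute, [ratings]) tuples."""
    rows = []
    for rec in csv.reader(io.StringIO(text)):
        if len(rec) < 4:
            continue  # skip blank or malformed lines
        rows.append((rec[0], rec[1], rec[2], [int(x) for x in rec[3:]]))
    return rows


def tratings(rows, bakery, ctype):
    """Goodness ratings for one bakery and cupcake type, missing values dropped."""
    return [r for b, t, a, rs in rows
            if b == bakery and t == ctype and a == "g"
            for r in rs if r != -1]


def aratings(rows, bakery, attr):
    """Ratings for one bakery and attribute across all types, missing values dropped."""
    return [r for b, _, a, rs in rows
            if b == bakery and a == attr
            for r in rs if r != -1]


def oratings(rows, bakery):
    """All goodness ratings for a bakery, pooled over the four cupcake types."""
    return [r for t in ("V", "C", "R", "W") for r in tratings(rows, bakery, t)]


def rank_bakeries(rows):
    """Bakeries sorted by mean pooled goodness rating, best first."""
    bakeries = sorted({b for b, _, _, _ in rows})
    scored = [(b, round(mean(oratings(rows, b)), 2))
              for b in bakeries if oratings(rows, b)]
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    for name, score in rank_bakeries(load_rows(SAMPLE_CSV)):
        print(name, score)
```

One difference from the Mathematica version: tratings here pools every matching row, whereas First@Select takes only the first one. The two agree as long as each bakery has a single row per type and attribute.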

我不吻晚风 2024-08-06 01:49:04

Perhaps reading about voting systems will be helpful. PS: don't take everything written on Wikipedia at face value. I have found factual errors in advanced topics there.

傲鸠 2024-08-06 01:49:04

Break the problem up into sub-problems.

What's the value of a cupcake? A basic approach is "the average of the scores." A slightly more robust approach may be "the weighted average of the scores." But there may be complications beyond that... a cupcake with 3 goodness and 3 flavor may be 'better' than one with 5 flavor and 1 goodness, even if flavor and goodness have equal weight (IOW, a low score may have a disproportionate effect).
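
To make the "disproportionate effect" point concrete, here is a small sketch comparing a plain average with one aggregate that punishes low scores. The harmonic mean is just one possible choice, picked here for illustration; it is an assumption, not something prescribed above, and it only works for strictly positive scores such as a 1 to 5 scale.

```python
from statistics import harmonic_mean, mean


def plain_score(scores):
    """Arithmetic mean: every point counts the same wherever it falls."""
    return mean(scores)


def low_penalizing_score(scores):
    """Harmonic mean: a single low score drags the total down sharply."""
    return harmonic_mean(scores)


balanced = [3, 3]  # goodness 3, flavor 3
lopsided = [5, 1]  # flavor 5, goodness 1

if __name__ == "__main__":
    for name, scores in (("balanced", balanced), ("lopsided", lopsided)):
        print(name, plain_score(scores), round(low_penalizing_score(scores), 2))
```

Both cupcakes have the same arithmetic mean, but the harmonic mean ranks the balanced one clearly higher, which matches the intuition described above.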

Make up some sample cupcake scores (specifics! Cover the normal scenarios and a couple weird ones), and estimate what you think a reasonable "overall" score would be if you had an ideal algorithm. Then, use that data to reverse engineer the algorithm.

For example, a cupcake with goodness 4, flavor 3, presentation 1 and moistness 4 might deserve a 4 overall, while one with goodness 4, flavor 2, presentation 5, and moistness 4 might only rate a 3.

Next, do the same thing for the bakery. Given a set of cupcakes with a range of scores, what would an appropriate rating be? Then, figure out the function that will give you that data.

The "goodness" attribute seems a bit odd: it reads like a general rating, so it already is the overall score. Why calculate another one?

If you had time to work with this, I'd always suggest capturing the raw data and using it as the basis for more detailed analysis, but I don't think that's really relevant here.

策马西风 2024-08-06 01:49:04

Perhaps this is too general for you, but this type of problem can be approached using Conjoint Analysis. An R package for implementing this is bayesm.

那小子欠揍 2024-08-06 01:49:04

If you can write SQL, you could make a little database and write some queries. It should not be that difficult.

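e.g. select sum(score) * 1.0 / count(score) as finalscore, bakery, flavour from tables group by bakery, flavour

(The * 1.0 avoids integer division on databases that would otherwise truncate the average of integer scores.)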
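
A minimal way to try this kind of query without setting up a database server is Python's built-in sqlite3 module. The sketch below is only an illustration: the table name ratings, its columns, and the sample rows are all hypothetical, and avg(score) stands in for the sum/count expression.

```python
import sqlite3


def average_scores(rows):
    """Return (bakery, flavour, average score) tuples, best average first.

    rows is an iterable of (bakery, flavour, score) tuples.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "create table ratings (bakery text, flavour text, score integer)"
        )
        conn.executemany("insert into ratings values (?, ?, ?)", rows)
        cursor = conn.execute(
            "select bakery, flavour, avg(score) as finalscore "
            "from ratings group by bakery, flavour "
            "order by finalscore desc"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up ratings for illustration only.
SAMPLE_ROWS = [
    ("alpha", "vanilla", 4),
    ("alpha", "vanilla", 5),
    ("alpha", "chocolate", 5),
    ("beta", "vanilla", 3),
    ("beta", "vanilla", 2),
]

if __name__ == "__main__":
    for bakery, flavour, score in average_scores(SAMPLE_ROWS):
        print(bakery, flavour, score)
```

Missing ratings can simply be left out of the table (or stored as NULL, which avg ignores), so no sentinel value like -1 is needed.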
