Normalizing achievements across multiple sources

Posted 2024-07-27 00:15:55

I'm looking for a good algorithm recommendation.

I have Users and Achievements. Users create Achievements and then give them to other Users. Associated with each Achievement is a point value that its creator specifies. A User's total points is the sum of the point values of all their achievements.

Basically:

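from dataclasses import dataclass, field

@dataclass
class Achievement:
    owner: str   # alias of the user who created the achievement
    points: int  # point value chosen by the owner

@dataclass
class User:
    achievements: list = field(default_factory=list)  # Achievements received

    def points(self):
        # A user's total is the sum of the point values of their achievements.
        return sum(a.points for a in self.achievements)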

OK, so this system is obviously very gameable: you can create many accounts and give tons of achievements to each other. I'm trying to reduce that a little by scaling the point values to something different from what the user specified.

1. Assume all users are honest but gauge difficulty differently. How should I normalize the point values? For example, if one user gives 5 points for every easy achievement and another gives 10, how can I normalize them to one value? The goal would be a distribution where points are proportional to difficulty.
2. If one user isn't very good at judging point values, how can I figure out difficulty based on the number of users that have earned the achievement?
3. Assume that Users can mostly be partitioned into disjoint groups, with one User giving achievements to a whole set of other ones. Does that help the previous two algorithms? For example, User A only gives achievements to Users whose IDs end in an odd number, and User B only gives achievements to Users whose IDs end in an even number.
4. If everyone is malicious, how close can I get to preventing users from hyper-inflating their point values?

Note: The quality of a giving user is not in any way related to how many achievements they have received. Many givers are just bots that haven't received anything themselves but automatically reward users for performing certain actions.

My current plan is something like this. Each giver has an allocation of 10 points per person who has received an achievement from them. If I have given out 10 achievements to 55 people in total, my allocation is 550. That allocation is then divided among the achievements based on the number of people who earned each one. If the achievements were earned by [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] people respectively, then the point values would be [50, 25, 16.6, 12.5, 10, 8.3, 7.1, 6.25, 5.5, 5].
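As a rough sketch of that plan: the listed values match 50 divided by the number of recipients, whereas splitting the 550 allocation equally over 10 achievements would give 55 per achievement instead. The post does not say which is intended, so the function below takes the per-achievement pool as a parameter. The name `achievement_values` and the default of 50 are assumptions made for illustration.

```python
def achievement_values(recipient_counts, pool_per_achievement=50.0):
    """Per-recipient value of each achievement: a fixed pool divided by
    the number of people who earned it (rarer means more valuable).

    pool_per_achievement=50 is an assumption inferred from the example
    values in the post, not something the post states.
    """
    return [pool_per_achievement / n for n in recipient_counts]


counts = list(range(1, 11))          # 1, 2, ..., 10 recipients
total_allocation = 10 * sum(counts)  # 10 points per recipient
values = achievement_values(counts)
```

With an inverse split like this, every achievement hands out the same total number of points across its recipients, so a giver cannot raise their overall payout by declaring larger values.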

Feedback on problems with my approach, as well as alternative recommendations, is welcome and appreciated. Also, post any other cases you can think of that I've missed, and I'll add them to the list. Thanks!

杯别 2024-08-03 00:15:55

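I think that in your system, as on Stack Overflow, Digg, Slashdot, and so on, your basic goal is to

Identify honest users,
Promote their actions.

In general, we identify honest users by their behavior: accounts that have existed on the site for a long time and have been vetted by other users and by you. Stack Overflow uses reputation scores; Slashdot uses karma points.

Once you have identified these honest users, you can weight their votes in proportion to their reputation score: the more honest a user is, the more we trust the achievements they give.

So you could give a new account an initial score of 10. That user can then give out as many achievements as they want, but their combined real value will be 10 (scaled proportionally, as you suggested). That is, if a new user gives out 100 achievements, all with the same point value, each one is worth 0.1 points, because the giver's score is 10. Then, as that user receives achievements from other users, their score goes up.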
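A minimal sketch of that scaling, assuming the giver's score is split in proportion to the point values they declared. The function name `scaled_values` is hypothetical.

```python
def scaled_values(giver_score, declared_points):
    """Rescale a giver's declared point values so that together they
    add up to the giver's own score."""
    total = sum(declared_points)
    if total == 0:
        # Nothing declared (or all zeros): nothing to distribute.
        return [0.0 for _ in declared_points]
    return [giver_score * p / total for p in declared_points]


# A new user with a score of 10 gives out 100 equally valued achievements.
new_user_values = scaled_values(10, [5] * 100)
```

Here each achievement ends up worth 0.1 regardless of the value the giver declared, which is the behavior described above.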

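Basically, I'm suggesting that you use PageRank, but instead of ranking web pages you rank users, and instead of hyperlinks the links are the achievements a user has given to others.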
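One way this could look is a plain power-iteration PageRank over a "giver to receiver" graph. This is a sketch under assumptions: the function name `user_rank`, the damping factor of 0.85, and the sample users are illustrative and not part of the answer.

```python
def user_rank(gifts, damping=0.85, iterations=50):
    """PageRank by power iteration.

    gifts maps each giver to the list of users they gave achievements to
    (one entry per achievement). Rank flows from giver to receiver.
    """
    users = set(gifts) | {r for rs in gifts.values() for r in rs}
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in users}
        for giver in users:
            receivers = gifts.get(giver, [])
            if receivers:
                # Split the giver's rank equally over the achievements given.
                share = damping * rank[giver] / len(receivers)
                for r in receivers:
                    new_rank[r] += share
            else:
                # A user who gave nothing spreads their rank over everyone.
                share = damping * rank[giver] / n
                for u in users:
                    new_rank[u] += share
        rank = new_rank
    return rank


ranks = user_rank({"alice": ["bob", "carol"], "bob": ["carol"], "carol": ["alice"]})
```

Note that under this scheme a giver who never receives anything, such as the bots mentioned in the question, keeps only the baseline rank, so the achievements they give would carry little weight unless they are seeded with a higher score.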

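This is one way to approach the problem. There are many others; it depends on your specific needs. Auctions are always fun: you could have everyone bid on an achievement before it is actually earned, so that the community sets its price (points). You would need to limit the amount of "money" people have.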

不甘平庸 2024-08-03 00:15:55

I've been struggling with this type of problem on my own site. If you have a large corpus of existing data you can use as a baseline, score normalization seems pretty effective. First get the mean value and standard deviation for the user's created achievements:

SELECT AVG(Points) AS user_average, 
STDDEV_POP(Points) AS user_stddev
FROM Achievements WHERE Owner = X

Use these values to calculate a context-free "z-score":

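// A user whose achievements all share one value has $user_stddev = 0; use a z-score of 0 in that case.
$zscore = ($user_stddev > 0) ? ($rating - $user_average) / $user_stddev : 0;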

Get the mean and standard deviation for all achievements:

SELECT AVG(Points) AS all_average, 
STDDEV_POP(Points) AS all_stddev 
FROM Achievements

Use these values to create a normalized "t-score":

$tscore = $all_average + ($all_stddev * $zscore);

Then use the t-score as your internal representation of an achievement's value. YMMV. :)
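The same two steps can be sketched in Python with the standard `statistics` module, where `pstdev` is the population standard deviation that `STDDEV_POP` computes. The function name `normalize` and the sample data are assumptions for illustration, and the zero-deviation fallback is a guard added here rather than part of the answer.

```python
from statistics import mean, pstdev


def normalize(points, owner_points, all_points):
    """Map one achievement's declared points onto the site-wide scale.

    owner_points: declared values of all achievements by the same owner.
    all_points:   declared values of every achievement on the site.
    """
    owner_sd = pstdev(owner_points)
    # An owner who always uses the same value has no spread; treat as average.
    z = (points - mean(owner_points)) / owner_sd if owner_sd > 0 else 0.0
    return mean(all_points) + pstdev(all_points) * z


generous = [50, 60, 70]  # a user who hands out large values
strict = [5, 6, 7]       # a user who hands out small values
everything = generous + strict

hard_by_generous = normalize(70, generous, everything)
hard_by_strict = normalize(7, strict, everything)
```

In this example the hardest achievement from each user gets the same normalized value, even though the declared values differ by a factor of ten.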

五里雾 2024-08-03 00:15:55

Correct, $rating is input and $tscore is the normalized output.

Ideally, everyone would assign points for their achievements on an identical scale. One point for stupid or trivial achievements, ten points for modest achievements, 50 points for truly epic achievements, or whatever. But people have very different behavior when it comes to assigning scores. Some will be very generous, and make every achievement worth the max. Others will be strict and accurate, adhering carefully to the scale as it relates to the difficulty of the achievement. Others may think it's dumb that people worry about points, and assign the minimum value for all the achievements they create.

Normalization attempts to handle these individual abnormalities and fit everyone's ratings to the same scale. It's like what they do with the judges' scores in the Olympics. You don't "blindly trust" the value a user assigned to an achievement, but it's something you want to account for if it's part of the system. Otherwise you could presumably just hard-code the point value of achievements and limit how often they can be created, and it sounds like that would curb the worst abuse. But the score is useful because, after normalization, you can figure out what the achievement would be worth if it had been created by a stereotypically average user. That makes it difficult for people to "game" the system, because the further they get from the average value and distribution for achievements, the more their own values get normalized back towards the baseline.

I should mention that I am not a professionally trained programmer, and I have never taken a statistics class or any higher math. Due to my own limitations of understanding, perhaps I'm not the best person to be explaining this. But I have been struggling with a similar problem on my own site (user-to-user ratings), and after trying numerous approaches this one seems like the most promising. Most of the inspiration for the implementation came from http://www.ericdigests.org/2003-4/score-normilization.html so you might like to read that as well.
