How to recommend the next achievement
Short version:
I have a setup similar to StackOverflow's. Users earn achievements. I have many more achievements than SO, let's say on the order of 10k, and each user has hundreds of achievements. Now, how would you recommend the next achievement for a user to try for?
Long version:
The objects are modeled like this in Django (showing only the important parts):
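```python
from django.db import models

class User(models.Model):
    pass  # profile model; other fields omitted

class Alias(models.Model):
    # A profile can own several aliases (the view reads user.alias_set),
    # so the foreign key lives on Alias.
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    achievements = models.ManyToManyField('Achievement', through='Achiever')

class Achievement(models.Model):
    points = models.IntegerField()

class Achiever(models.Model):
    achievement = models.ForeignKey(Achievement, on_delete=models.CASCADE)
    alias = models.ForeignKey(Alias, on_delete=models.CASCADE)
    count = models.IntegerField(default=1)
```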
My algorithm is simply to find every other user who shares at least one achievement with the logged-in user, then go through all of their achievements and sort them by number of occurrences:
```python
def recommended(request):
    user = request.user.get_profile()

    # The final response: achievement -> number of times it appears
    r = {}

    # Get all the achievements the user's aliases have received,
    # in a set so they aren't double counted.
    # select_related() does not follow many-to-many fields;
    # prefetch_related() is the call for that.
    achievements = set()
    for alias in user.alias_set.prefetch_related('achievements'):
        achievements.update(alias.achievements.all())

    # Find all other aliases that have gotten at least one of the
    # same achievements as the user
    otherAliases = set()
    for ach in achievements:
        otherAliases.update(ach.alias_set.all())

    # Find other achievements the other users have gotten in addition to
    # the shared ones, and count the number of times each achievement appears
    for otherAlias in otherAliases:
        for otherAch in otherAlias.achievements.all():
            r[otherAch] = r.get(otherAch, 0) + 1

    # Remove all the achievements that the user has already gotten
    # (pop with a default so a missing key doesn't raise KeyError)
    for ach in achievements:
        r.pop(ach, None)

    # Sort by number of times the achievements have been received, most first.
    # A key function works on Python 2 and 3; a cmp function is Python 2 only.
    r = sorted(r.items(), key=lambda item: item[1], reverse=True)

    # Put in the template for showing on the screen
    template_values = {}
    template_values['achievements'] = r
```
But it takes FOREVER to run, and it always returns the whole list, which isn't needed. A user only needs the top few achievements to go after.
So I welcome recommendations on other algorithms and/or code improvements. I'll give you an achievement in my system for coming up with the recommendation algorithm :)
Answers (2):
One way to decide which achievements to recommend is to see how many of your users already have each achievement and recommend the popular ones. Once a user has achieved those, you go down the list and recommend slightly less popular ones. However, this makes the naive assumption that everyone wants to go for popular achievements. It might make popular achievements even more popular and less popular ones, well... less popular. The consolation is that this doesn't use many resources and is likely to run very fast. (Just keep a list of achievements plus the number of times each has been achieved.)
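A minimal sketch of the popularity approach in plain Python. It assumes the data is available as a dict mapping each user to the set of achievement ids they hold; that shape and the function names are hypothetical, not part of the Django models above. In practice you would keep the counter up to date as achievements are awarded rather than rebuilding it on every request.

```python
from collections import Counter
from itertools import islice


def achievement_popularity(achievements_by_user):
    """Count how many users hold each achievement."""
    popularity = Counter()
    for earned in achievements_by_user.values():
        # set() so an achievement earned several times counts once per user
        popularity.update(set(earned))
    return popularity


def recommend_popular(popularity, earned, n=5):
    """Return up to n of the most popular achievements the user lacks."""
    # most_common() with no argument lists every achievement, most held first
    unearned = (ach for ach, _count in popularity.most_common() if ach not in earned)
    return list(islice(unearned, n))


if __name__ == "__main__":
    # Hypothetical data: user -> set of achievement ids
    data = {"ann": {"a", "b"}, "bob": {"a", "c"}, "cat": {"a", "b", "d"}}
    popularity = achievement_popularity(data)
    print(recommend_popular(popularity, data["bob"], n=1))
```

Because the ranking depends only on global counts, the ordered list can be computed once and then filtered per user, which is why this approach is cheap.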
Another method (which tries to guess which achievements the user is likely to go after, based on the achievements he already has) is to use a machine learning algorithm. I think the k-nearest neighbor algorithm would perform quite well here. Select a threshold and just output everything that scores above it. I don't know whether this will run faster than what you already have, but you only need to run the recommendation engine once each time the user earns a new achievement, store the top (let's say) five, and output those whenever a recommendation is needed.
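A minimal sketch of the k-nearest-neighbour idea. The answer does not say how to measure "nearest", so this assumes Jaccard similarity between two users' achievement sets; each of the k most similar users then votes for the achievements it has and the current user lacks, weighted by similarity. The dict-of-sets input, the function names, and the defaults for k, threshold, and n are illustrative assumptions.

```python
import heapq


def jaccard(a, b):
    """Similarity of two achievement sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_knn(user, achievements_by_user, k=10, threshold=0.0, n=5):
    """Recommend up to n achievements the user lacks, using the k nearest users."""
    mine = achievements_by_user[user]
    others = [u for u in achievements_by_user if u != user]
    # The k users whose achievement sets are most similar to this user's
    similarity = {u: jaccard(mine, achievements_by_user[u]) for u in others}
    neighbours = heapq.nlargest(k, others, key=similarity.get)
    # Each neighbour votes for achievements it has and the user lacks,
    # weighted by how similar it is to the user
    scores = {}
    for u in neighbours:
        for ach in achievements_by_user[u] - mine:
            scores[ach] = scores.get(ach, 0.0) + similarity[u]
    # Keep scores above the threshold, then the n highest of those
    above = [(score, ach) for ach, score in scores.items() if score > threshold]
    return [ach for _score, ach in heapq.nlargest(n, above)]


if __name__ == "__main__":
    # Hypothetical data: user -> set of achievement ids
    data = {"me": {"a", "b"}, "u1": {"a", "b", "c"}, "u2": {"a", "d"}, "u3": {"x", "y"}}
    print(recommend_knn("me", data, k=2))
```

This scans every user on each call, so it fits the answer's suggestion of running it only when a user earns a new achievement and caching the top few results.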
I hope this helps. =)
I would suggest that you do the first three steps (achievements, otherAliases, count) as one single SQL statement. Right now you are issuing a lot of queries and summarising thousands of rows in Python, which is a task you should delegate to the DB. For example, the code
Does thousands of huge queries.
Instead, you can do this in SQL by joining Achiever on itself where the alias ids differ and the achievement ids are the same. You then group by the other alias's id and run a count.
In the query below, table "B" holds other users' achievements and "Achiever" holds ours. If another user shares an achievement with us, they appear once in "B" for each achievement they share. We then group those rows by alias_id and count how many times each alias appears, so you get a neat (alias_id, count) table out.
Very very rough code (no SQL available here)
If that works the way I think it will, you will get a table of other user aliases, along with the number of achievements they share with the current user.
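A runnable sketch of that first query, using Python's built-in sqlite3 module and a hypothetical table named achiever holding only alias_id and achievement_id (Django would name the real table after the app, e.g. <app_label>_achiever). It assumes the current user has a single alias; with several aliases you would filter mine.alias_id with IN (...) and also exclude the user's own aliases from b.

```python
import sqlite3

# For each other alias, count the achievements it shares with the current
# alias. "mine" is our rows of achiever, "b" is everyone else's rows.
SIMILAR_ALIASES_SQL = """
    SELECT b.alias_id, COUNT(*) AS shared
    FROM achiever AS mine
    JOIN achiever AS b
      ON b.achievement_id = mine.achievement_id
     AND b.alias_id <> mine.alias_id
    WHERE mine.alias_id = ?
    GROUP BY b.alias_id
    ORDER BY shared DESC, b.alias_id
"""


def similar_aliases(conn, alias_id):
    """Return (alias_id, shared_count) pairs, most similar first."""
    return conn.execute(SIMILAR_ALIASES_SQL, (alias_id,)).fetchall()


def demo_connection():
    """In-memory database with a few hypothetical (alias, achievement) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE achiever (alias_id INTEGER, achievement_id INTEGER)")
    conn.executemany(
        "INSERT INTO achiever VALUES (?, ?)",
        [(1, 10), (1, 11), (2, 10), (2, 11), (2, 12),
         (3, 10), (3, 13), (4, 12), (5, 11), (5, 12)],
    )
    return conn


if __name__ == "__main__":
    print(similar_aliases(demo_connection(), 1))
```

With the demo rows, alias 1 shares two achievements with alias 2 and one each with aliases 3 and 5, so this prints [(2, 2), (3, 1), (5, 1)]; alias 4 shares nothing and does not appear. All the counting happens in one query instead of one query per achievement.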
The next step is an SQL statement that uses the one above as an "inner select"; call it users. Join that with your Achievement table and with the Achiever rows for the current user. You might want to ignore all but the top 10 users who are most similar to the current user.
I don't have time to write up a good query right now, but look at your DB's LEFT OUTER JOIN, joining on achievement_id between the nominated 10 users and the current user, which sets the current user's side to NULL when there is no match. Then filter to only the rows where it came up NULL (achievements the current user hasn't earned).
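A sketch of that second step, under the same kind of assumptions (SQLite via Python's sqlite3, a hypothetical achiever table with alias_id and achievement_id, one alias per user). The inner select keeps the 10 most similar aliases, the LEFT JOIN against the current alias's own rows leaves NULL where an achievement is missing, and the IS NULL filter keeps only those; LIMIT returns just the top few instead of the whole list. Joining the result to the Achievement table for names or points is left out.

```python
import sqlite3

RECOMMEND_SQL = """
    SELECT theirs.achievement_id, COUNT(*) AS votes
    FROM (
        -- inner select: the 10 aliases sharing the most achievements with us
        SELECT b.alias_id
        FROM achiever AS mine
        JOIN achiever AS b
          ON b.achievement_id = mine.achievement_id
         AND b.alias_id <> mine.alias_id
        WHERE mine.alias_id = :me
        GROUP BY b.alias_id
        ORDER BY COUNT(*) DESC, b.alias_id
        LIMIT 10
    ) AS users
    JOIN achiever AS theirs ON theirs.alias_id = users.alias_id
    -- have.achievement_id is NULL when we do not hold that achievement
    LEFT JOIN achiever AS have
      ON have.achievement_id = theirs.achievement_id
     AND have.alias_id = :me
    WHERE have.achievement_id IS NULL
    GROUP BY theirs.achievement_id
    ORDER BY votes DESC, theirs.achievement_id
    LIMIT :n
"""


def recommend(conn, alias_id, n=5):
    """Return up to n (achievement_id, votes) pairs the alias hasn't earned."""
    return conn.execute(RECOMMEND_SQL, {"me": alias_id, "n": n}).fetchall()


def demo_connection():
    """In-memory database with a few hypothetical (alias, achievement) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE achiever (alias_id INTEGER, achievement_id INTEGER)")
    conn.executemany(
        "INSERT INTO achiever VALUES (?, ?)",
        [(1, 10), (1, 11), (2, 10), (2, 11), (2, 12),
         (3, 10), (3, 13), (4, 12), (5, 11), (5, 12)],
    )
    return conn


if __name__ == "__main__":
    print(recommend(demo_connection(), 1))
```

For alias 1 the similar aliases are 2, 3 and 5; achievement 12 is held by two of them and 13 by one, so the result is [(12, 2), (13, 1)]. Alias 4 also holds 12 but shares nothing with alias 1, so its row does not add a vote.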