Need inspiration: selecting large amounts of data for a highscore
I need some inspiration for a solution...
We are running an online game with around 80.000 active users - we are hoping to expand this and are therefore setting a target of achieving up to 1-500.000 users.
The game includes a highscore for all the users, which is based on a large set of data. This data needs to be processed in code to calculate the values for each user.
After the values are calculated we need to rank the users, and write the data to a highscore table.
My problem is that in order to generate a highscore for 500.000 users we need to load data from the database in the order of 25-30.000.000 rows, totalling around 1.5-2 GB of raw data. Also, in order to rank the values we need to have the total set of values.
Also we need to generate the highscore as often as possible - preferably every 30 minutes.
Now we could just use brute force - load the 30 million records every 30 minutes, calculate the values, rank them, and write them into the database - but I'm worried about the strain this would put on the database, the application server and the network - and whether it's even possible.
I'm thinking the solution might be to break up the problem somehow, but I can't see how. So I'm seeking some inspiration on possible alternative solutions based on this information:
- We need a complete highscore of all ~500.000 teams - we can't (won't unless absolutely necessary) shard it.
- I'm assuming that there is no way to rank users without having a list of all users values.
- Calculating the value for each team has to be done in code - we can't do it in SQL alone.
- Our current method loads each user's data individually (3 calls to the database) to calculate the value - it takes around 20 minutes to load the data and generate the highscore for 25.000 users, which is too slow if this should scale to 500.000.
- I'm assuming that hardware size will not be an issue (within reasonable limits)
- We are already using memcached to store and retrieve cached data
Any suggestions, links to good articles about similar issues are welcome.
Interesting problem. In my experience, batch processes should only be used as a last resort. You are usually better off having your software calculate values as it inserts/updates the database with the new data. For your scenario, this would mean that it should run the score calculation code every time it inserts or updates any of the data that goes into calculating the team's score. Store the calculated value in the DB with the team's record. Put an index on the calculated value field. You can then ask the database to sort on that field and it will be relatively fast. Even with millions of records, it should be able to return the top n records in O(n) time or better. I don't think you'll even need a high scores table at all, since the query will be fast enough (unless you have some other need for the high scores table other than as a cache). This solution also gives you real-time results.
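A minimal sketch of this write-time approach, using SQLite as a stand-in for whatever database the game actually runs on (the `teams` table, its columns, and the `update_score` helper are all invented for illustration; the `ON CONFLICT` upsert assumes SQLite 3.24+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
# Index on the calculated score so top-n queries can walk the index
# instead of sorting the whole table.
conn.execute("CREATE INDEX idx_teams_score ON teams (score)")

def update_score(team_id, name, new_score):
    """Recalculate and store the score whenever the underlying data changes."""
    conn.execute(
        "INSERT INTO teams (id, name, score) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET score = excluded.score",
        (team_id, name, new_score),
    )

for i, s in enumerate([120, 450, 300, 450, 90]):
    update_score(i, f"team{i}", s)

# The "highscore table" becomes a plain indexed query.
top3 = conn.execute(
    "SELECT name, score FROM teams ORDER BY score DESC LIMIT 3"
).fetchall()
print(top3)
```

The key point is that the expensive per-team calculation happens once per write, spread over time, rather than 500.000 times every 30 minutes.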
Assuming that most of your 2GB of data is not changing that frequently you can calculate and cache (in db or elsewhere) the totals each day and then just add the difference based on new records provided since the last calculation.
In PostgreSQL you could cluster the table on the column that represents when the record was inserted and create an index on that column. You can then make calculations on recent data without having to scan the entire table.
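A rough sketch of the incremental idea, assuming each user's value is a running aggregate over event rows with monotonically increasing ids (the event shape and `apply_new_events` function are invented for illustration):

```python
from collections import defaultdict

# Cached totals from the last run, keyed by user id.
cached_totals = defaultdict(int)
last_seen_id = 0  # highest event row id already folded into the cache

def apply_new_events(events):
    """Fold only rows inserted since the last run into the cached totals."""
    global last_seen_id
    for row_id, user_id, value in events:
        if row_id > last_seen_id:
            cached_totals[user_id] += value
            last_seen_id = row_id
    return dict(cached_totals)

# First run processes everything; subsequent runs only touch the delta.
apply_new_events([(1, "a", 10), (2, "b", 5)])
totals = apply_new_events([(3, "a", 7)])  # only row 3 is new
print(totals)  # {'a': 17, 'b': 5}
```

This only works if the per-user value really is an incremental aggregate; if old rows can be updated in place, you would also need to handle corrections, which is where the daily full recalculation comes in.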
First and foremost:
One possible solution is:
Results are still going to take a while, but at least performance won't be impacted as much.
How about saving those scores in a database, and then simply query the database for the top scores (so that the computation is done on the server side, not on the client side.. and thus there is no need to move the millions of records).
It sounds pretty straightforward... unless I'm missing your point... let me know.
Calculate and store the score of each active team on a rolling basis. Once you've stored the score, you should be able to do the sorting/ordering/retrieval in the SQL. Why is this not an option?
It might prove fruitless, but I'd at least take a gander at the way sorting is done on a lower level and see if you can't manage to get some inspiration from it. You might be able to grab more manageable amounts of data for processing at a time.
Have you run tests to see whether or not your concerns with the data size are valid? On a mid-range server throwing around 2GB isn't too difficult if the software is optimized for it.
Seems to me this is clearly a job for caching, because you should be able to keep the half-million score records semi-local, if not in RAM. Every time you update data in the big DB, make the corresponding adjustment to the local score record.
Sorting the local score records should be trivial. (They are nearly in order to begin with.)
If you only need to know the top 100-or-so scores, then the sorting is even easier. All you have to do is scan the list and insertion-sort each element into a 100-element list. If the element is lower than the first element, which it is 99.98% of the time, you don't have to do anything.
Then run a big update from the whole DB once every day or so, just to eliminate any creeping inconsistencies.
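The top-100 insertion idea above can be sketched with a bounded min-heap, which gives the same effect as the manual insertion sort: most elements are rejected after a single comparison against the smallest retained score (function name and data are invented for illustration):

```python
import heapq

def top_n(scores, n=100):
    """Keep only the n largest (user, score) pairs seen so far."""
    heap = []  # min-heap of at most n (score, user) pairs
    for user, score in scores:
        if len(heap) < n:
            heapq.heappush(heap, (score, user))
        elif score > heap[0][0]:
            # Only elements beating the current cutoff pay the O(log n) cost;
            # everything else is a single comparison, as the answer describes.
            heapq.heapreplace(heap, (score, user))
    return sorted(heap, reverse=True)  # best first

scores = [(f"user{i}", i) for i in range(1000)]
best = top_n(scores, n=5)
print([s for s, _ in best])  # [999, 998, 997, 996, 995]
```

This is O(total log n) in the worst case, and close to O(total) when the input arrives in no particular order, since almost all elements fail the cutoff check.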