Comment post scalability: top n per user, 1 update, heavy reads
Here's the situation. Multi-million user website. Each user's page has a message section. Anyone can visit a user's page, where they can leave a message or view the last 100 messages.
Messages are short pieces of text with some extra metadata. Every message has to be stored permanently; the only thing that must be real-time fast is posting and reading messages (people use it as chat). A count of messages will be read very often to check for changes. Periodically it's OK to archive off old messages (beyond the most recent 100), but they must remain accessible.
Currently everything is in one big DB table, and contention between people reading the message lists and people posting new messages is becoming an issue.
If you had to re-architect the system, what storage mechanism / caching would you use? What computer science concepts can be applied here (e.g. collections, list access, etc.)?
Answers (2)
Some general thoughts, not particular to any specific technology:
Partition the data by user ID. The idea is that you can uniformly divide the user space into distinct partitions of roughly the same size, using an appropriate hash function to spread users across partitions. Ultimately, each partition belongs on a separate machine, but even as different tables/databases on the same machine this will eliminate some of the contention. Partitioning limits contention and opens the door to scaling "linearly" in the future; it also helps with load distribution and scale-out.
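Just as a rough sketch (the shard count and names here are mine, not from the question), the routing itself is trivial:

```python
import hashlib

# Assumed shard count; not part of the original question.
NUM_PARTITIONS = 16

def partition_for(user_id: str) -> int:
    """Map a user ID to a partition with a stable hash (not Python's
    per-process salted hash()), so every server routes the same user
    to the same shard."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# All reads and writes for a given user land on one partition, so
# contention is confined to users that happen to share a shard.
print(partition_for("alice"), partition_for("bob"))
```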
When picking a hashing function to partition the records, look for one that minimizes the number of records that will have to be moved should partitions be added/removed.
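Consistent hashing is the usual way to get that property. A minimal sketch, with made-up node names and an arbitrary number of virtual nodes per node:

```python
import bisect
import hashlib

class HashRing:
    """Tiny consistent-hash ring: adding or removing a node only remaps
    the keys in the affected arcs, instead of reshuffling almost
    everything the way plain `hash(key) % n` does."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _point(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node, vnodes=64):
        for i in range(vnodes):
            bisect.insort(self._ring, (self._point(f"{node}#{i}"), node))

    def node_for(self, key):
        points = [p for p, _ in self._ring]
        idx = bisect.bisect(points, self._point(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["db-1", "db-2", "db-3"])
before = {f"user{i}": ring.node_for(f"user{i}") for i in range(1000)}
ring.add("db-4")
moved = sum(ring.node_for(u) != node for u, node in before.items())
print(f"{moved} of 1000 users moved after adding a node")  # roughly a quarter
```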
Like many other applications, we can assume usage of the service follows a power-law curve: a few user pages account for most of the traffic, followed by a long tail. A caching scheme can take advantage of that, and the steeper the curve, the more effective caching will be. Given the short messages, if each page shows 100 messages and each message averages 100 bytes, a page is about 10 KB, so roughly 100,000 top pages fit in 1 GB of RAM cache. Those cached pages could be written lazily to the database. Out of 10 million users, caching the hottest 100,000 pages is in the right ballpark to make a difference.
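Here's a minimal sketch of such a cache, assuming an in-process LRU keyed by user ID; the class and constants are invented to match the back-of-the-envelope numbers above:

```python
from collections import OrderedDict

# Sizing from the estimate above: ~100 messages x ~100 bytes = ~10 KB
# per page, so about 100,000 hot pages fit in roughly 1 GB of RAM.
MAX_CACHED_PAGES = 100_000

class PageCache:
    """LRU cache of 'last 100 messages' pages, keyed by user ID.
    Writes hit the cache first and are flushed to the database lazily."""

    def __init__(self, capacity=MAX_CACHED_PAGES):
        self.capacity = capacity
        self.pages = OrderedDict()  # user_id -> newest-first list of messages
        self.dirty = set()          # user IDs with messages not yet persisted

    def read_page(self, user_id):
        page = self.pages.get(user_id)
        if page is not None:
            self.pages.move_to_end(user_id)  # mark as recently used
        return page                          # None means: fall back to the DB

    def append_message(self, user_id, message):
        page = self.pages.setdefault(user_id, [])
        page.insert(0, message)
        del page[100:]                       # keep only the visible 100
        self.pages.move_to_end(user_id)
        self.dirty.add(user_id)
        if len(self.pages) > self.capacity:
            evicted, _ = self.pages.popitem(last=False)
            # A real implementation must flush `evicted` first if it is
            # still dirty; omitted here for brevity.
            self.dirty.discard(evicted)
```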
Partition the web servers, possibly using the same hashing scheme. This lets you hold separate RAM caches without contention. The potential benefit is increasing the cache size as the number of users grows.
If appropriate for your environment, one approach for ensuring new messages are eventually written to the database is to place them in a persistent message queue, right after placing them in the RAM cache. The queue suffers no contention, and helps ensure messages are not lost upon machine failure.
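Something like this for the write path, with an in-memory queue standing in for whatever durable queue you'd actually run:

```python
import json
import queue

# Stand-ins for the real components; the queue would be something durable
# (a log-backed broker or on-disk queue), not an in-memory queue.Queue.
page_cache = {}                # user_id -> newest-first list of messages
persist_queue = queue.Queue()  # drained by a background writer into the DB

def post_message(user_id, message):
    """Write path: update the RAM cache so readers see the message
    immediately, then enqueue it for durable storage."""
    page = page_cache.setdefault(user_id, [])
    page.insert(0, message)
    del page[100:]
    persist_queue.put(json.dumps({"user": user_id, "msg": message}))

def db_writer_step(db_insert):
    """One iteration of the background writer. Since it is the only
    consumer doing inserts, readers served from the cache never contend
    with it on the messages table."""
    db_insert(json.loads(persist_queue.get()))

post_message("alice", "hi there")
db_writer_step(lambda row: print("persisting", row))
```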
One simple solution could be to denormalize your data and store pre-calculated aggregates in a separate table, e.g. a MESSAGE_COUNTS table with a column for the user ID and a column for that user's message count. Whenever the main messages table is updated, update the aggregate as well.
It's just shifting the bottleneck from one place to another, but it might move it somewhere that's less of a burden.
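For illustration, here's a small sketch of that idea against SQLite; only the MESSAGE_COUNTS name comes from the suggestion above, everything else is made up:

```python
import sqlite3

# Throwaway in-memory schema just to show the shape of the idea.
# The upsert syntax needs SQLite 3.24 or newer.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE MESSAGES (user_id TEXT, body TEXT);
    CREATE TABLE MESSAGE_COUNTS (user_id TEXT PRIMARY KEY, n INTEGER NOT NULL);
""")

def add_message(user_id, body):
    """Insert the message and bump the pre-computed count in one
    transaction, so the frequent 'anything new?' check becomes a
    single-row read instead of a COUNT(*) over the big table."""
    with db:
        db.execute("INSERT INTO MESSAGES VALUES (?, ?)", (user_id, body))
        db.execute(
            "INSERT INTO MESSAGE_COUNTS VALUES (?, 1) "
            "ON CONFLICT(user_id) DO UPDATE SET n = n + 1",
            (user_id,),
        )

add_message("alice", "hi")
add_message("alice", "hello again")
print(db.execute("SELECT n FROM MESSAGE_COUNTS WHERE user_id = ?",
                 ("alice",)).fetchone())  # (2,)
```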