可扩展的 MySQL 数据库,用于类似邮件的消息传递

发布于 2024-08-21 08:39:48 字数 1109 浏览 4 评论 0原文

假设我们有一个受欢迎的网站。我们需要在用户之间实现类似邮件的消息传递。 典型的解决方案是使用 2 个表:

Users (user_id)

Messages (message_id, sender_id (references user_id), receive_id (references user_id), subject, body )。

这种方法有两个明显的限制:

  1. 所有用户的所有消息都存储在一个表中,导致其负载很高并降低数据库的整体性能。
  2. 当某人需要同时向多个用户发送消息时,消息会被复制 (recipients_count) 次。

另一个解决方案使用 3 个表:

Users(user_id)

Sent_messages(sent_id, sender_id (references user_id), subject, body)

Received_messages(sent_id, receive_messages的receiver_id(引用user_id)、主题、正文)

主题和正文是从sent_messages的相应字段复制的。

这种方法通过将信息从一个表复制到另一个表来导致

  1. 数据库非规范化。
  2. 用户实际上可以删除发送/接收的消息,而无需将它们从接收者/发送者中删除。
  3. 消息占用大约 2 倍的空间
  4. 每个表的加载量大约减少 2 倍。

那么问题来了:

  1. 哪种设计更适合高负载和可扩展性? (我认为是第二个)
  2. 是否有另一种数据库设计可以处理高负载?它是什么?有什么限制?

谢谢!

PS 我知道在解决这些可扩展性问题之前,网站必须非常成功,但我想知道如果需要的话该怎么做。

更新

目前,对于第一个版本,我将使用 Daniel Vassallo 提出的设计。但如果以后一切顺利的话,设计就会改成第二种。感谢埃弗特减轻了我对此的担忧。

Assume we have a popular site. We need to implement mail-like messaging between users.
Typical solution is to use 2 tables:

Users (user_id)

Messages (message_id, sender_id (references user_id), receiver_id (references user_id), subject, body ).

This method has 2 significant limitations

  1. All messages of all users are stored in one table leading to it's high load and decreasing overall database performance.
  2. When someone needs to send message to several users simultaneously, the message gets copied (recipients_count) times.

The other solution uses 3 tables:

Users(user_id)

Sent_messages(sent_id, sender_id (references user_id), subject, body)

Received_messages(sent_id, receiver_id (references user_id), subject, body)

subject and body of received_messages are copied from corresponding fields of sent_messages.

This method leads to

  1. Denormalizing the database by copying information from one table to another
  2. Users can actually delete sent/received messages without removing them from the receivers/senders.
  3. Messages take approximately 2 times more space
  4. Each table is loaded approximately 2 times less.

So here go the questions:

  1. Which one of considered design is better for high load and scalability? (I think it's the second one)
  2. Is there another database design that can handle high load? What is it? What are the limitations?

Thanks!

P.S. I understand that before getting to these scalability issues the site has to be very successful, but I want to know what to do if I need to.

UPDATE

Currently for the first versions I'll be using design proposed by Daniel Vassallo. But if everything is OK in the future, the design will be changed to the second one. Thanks to Evert for allaying my apprehension about it.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

讽刺将军 2024-08-28 08:39:48

在将邮件发送给多个收件人的情况下,您可能希望避免多次复制邮件正文。这是您可能需要考虑的另一个选择:

  • 用户 (user_id)

  • 消息(message_id、sender_id、主题、
    身体)

  • received_messages(message_id、user_id、address_mode、已删除)

这个模型可能更twitter-与电子邮件类似,但它可能具有一些优点。

规则是:

  • 一条消息只能由一个用户发送,在每条消息的 sender_id 中引用。
  • 每个收件人都将在received_messages 表中定义。 address_mode 字段可以定义消息是直接发送给接收者,还是作为 CC,或者可能作为 BCC。该字段显然是可选的。
  • 收件人删除的邮件将在received_messages 表中标记已删除标志。
  • 转发和回复的消息需要使用新的 sender_id 重新创建。然后可以修改消息正文。

以下是一些优点:

  • 这比原始问题中提到的两个选项占用的空间更少,特别是当用户通常将消息发送给多个收件人时。
  • 由于消息永远不会重复,因此更容易缓存消息表。
  • 删除邮件的收件人不会删除该邮件发送给该用户的信息。它将在 receive_messages 表中简单地标记为“已删除”。
  • 而且您还会得到一个标准化模型。

对于大多数应用程序,如果您对上述模型使用乐观隔离级别,即使您期望以每秒几个的速率交换消息,也不应该出现性能问题。另一方面,如果您预计每秒有数百或数千条消息,那么可能确实需要考虑其他选项。

You may want to avoid copying the message body multiple times in the case where a message is sent to multiple recipients. Here is another option which you may want to consider:

  • users (user_id)

  • messages (message_id, sender_id, subject,
    body)

  • received_messages (message_id, user_id, address_mode, deleted)

This model may be more twitter-like than email-like, but it may come with some advantages.

The rules are that:

  • A message can only be sent by one user, referenced in the sender_id of each message.
  • Each recipient will be defined in the received_messages table. The address_mode field can define whether a message was sent to the recipient directly, or as a CC, or maybe as BCC. This field is obviously optional.
  • Deleted messages by recipients will mark the deleted flag in the received_messages table.
  • Forwarded and replied-to messages need to be recreated with a new sender_id. The message body can be then modified.

These are some of the advantages:

  • This takes less space than the two options mentioned in the original question, especially if users will typically send messages to multiple recipients.
  • Easier caching of the messages table, since messages are never duplicated.
  • The recipient deleting a message will not erase the information that the message was sent to this user. It will simply be marked as 'deleted' in the received_messages table.
  • And you also get a normalized model.

For most applications, if you use an optimistic isolation level with the above model, you should not have performance problems even if you are expecting messages to be exchanged at a rate of a few per second. If on the other hand you're expecting hundreds or thousands of messages per second, then it may really be the case to consider other options.

宛菡 2024-08-28 08:39:48

一般来说,数据库大小不会引起太多关注。速度更为重要。

因此,我很想选择第二个方案。就像您提到的那样,它使删除消息之类的事情变得更加容易,而且我很确定这是一种非常常见的方法。

In general database size will not be much of a concern. Speed is much more important.

Therefore, I would be tempted to go for option two. Just like you mentioned, it makes things like deleting messages a lot easier, and I'm pretty sure this is a very common way to do this.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文