慢更新与慢选择

发布于 2024-08-17 11:49:06 字数 1857 浏览 18 评论 0原文

这是一个关于权衡的问题。

想象一个社交网络。每个用户都有一个状态消息，他可以随时更改。每当他改变它时，他的所有朋友都会通过墙得到通知（就像在 Facebook 中一样）。

为了使这项工作成功。我们有 3 个表：Users(id, name)、FriendLists(userId、friendUserId)、Notifications(?)。

现在我们假设每个用户的朋友列表中有大约 50 个朋友。我面临着困境 - 如何实现通知表。

第一个选项

CREATE TABLE Notifications
(
toUserId bigint NOT NULL,
[identity] bigint IDENTITY(1,1) NOT NULL,
fromUserId bigint NOT NULL,
data varchar(256) NOT NULL,
CONSTRAINT [PK_Notifications] PRIMARY KEY CLUSTERED (toUserId, [identity])
)

发送通知：

-- Get all friends of @fromUserId.
WITH Friends AS
   (SELECT FriendLists.friendUserId
 FROM FriendLists
 WHERE userId = @fromUserId)
-- Send updates to all friends.
SELECT
 friendUserId as toUserId,
 @fromUserId as fromUserId,
 @data as data
INTO Notifications
FROM Friends

在这种情况下，对于每个状态更改，我们都会创建 50 条记录（假设有 50 个朋友）。这很糟糕。然而，好处是检索特定用户的通知非常快，因为我们在 toUserId 上有聚集索引。

第二个选项

CREATE TABLE Notifications
(
toUserId bigint NOT NULL,
[identity] bigint IDENTITY(1,1) NOT NULL,
fromUserId bigint NOT NULL,
data varchar(256) NOT NULL,
CONSTRAINT [PK_Notifications] PRIMARY KEY CLUSTERED ([identity])
)
CREATE NONCLUSTERED INDEX [IX_toUserId] ON Notifications (toUserId ASC)

发送通知：

-- Get all friends of @fromUserId.
WITH Friends AS
   (SELECT FriendLists.friendUserId
 FROM FriendLists
 WHERE userId = @fromUserId)
-- Send updates to all friends.
INSERT INTO Notifications(toUserId, fromUserId, data)
    VALUES(friendUserId, @fromUserId, @data)

这里我们只为每个状态更新插入一条记录。这很好。缺点是通知的检索会变慢，因为记录不是按 toUserId 聚集的。

两种方法的获取通知是相同的：

SELECT TOP(50) fromUserId, [identity], data
FROM Notifications
WHERE toUserId  = @toUserId

那么您对此有何看法？

原文

This is a question about tradeoffs.

Imagine a social network. Each user has a status message, that he can change anytime. Whenever he does change it, all his friends are notified through a wall (like in Facebook).

To make this work. We have 3 tables Users(id, name), FriendLists(userId, friendUserId), Notifications(?).

Now let's assume each user has approximately 50 friends in his friend list. I am faced with the dilemma - how to implement Notifications table.

1st option

CREATE TABLE Notifications
(
toUserId bigint NOT NULL,
[identity] bigint IDENTITY(1,1) NOT NULL,
fromUserId bigint NOT NULL,
data varchar(256) NOT NULL,
CONSTRAINT [PK_Notifications] PRIMARY KEY CLUSTERED (toUserId, [identity])
)

Send notifications:

-- Get all friends of @fromUserId.
WITH Friends AS
   (SELECT FriendLists.friendUserId
 FROM FriendLists
 WHERE userId = @fromUserId)
-- Send updates to all friends.
SELECT
 friendUserId as toUserId,
 @fromUserId as fromUserId,
 @data as data
INTO Notifications
FROM Friends

In this case, for each status change we create 50 records (assuming 50 friends). This is bad. However the good point is that to retrieve notifications for a specific user it is really fast, since we have a clustered index on the toUserId.

2nd option

CREATE TABLE Notifications
(
toUserId bigint NOT NULL,
[identity] bigint IDENTITY(1,1) NOT NULL,
fromUserId bigint NOT NULL,
data varchar(256) NOT NULL,
CONSTRAINT [PK_Notifications] PRIMARY KEY CLUSTERED ([identity])
)
CREATE NONCLUSTERED INDEX [IX_toUserId] ON Notifications (toUserId ASC)

Send notifications:

-- Get all friends of @fromUserId.
WITH Friends AS
   (SELECT FriendLists.friendUserId
 FROM FriendLists
 WHERE userId = @fromUserId)
-- Send updates to all friends.
INSERT INTO Notifications(toUserId, fromUserId, data)
    VALUES(friendUserId, @fromUserId, @data)

Here we only insert a single record per status update. This is good. The bad point is that the retrieval of the notifications is going to be slower, since records are not clustered by toUserId.

Getting notifications is same for both methods:

SELECT TOP(50) fromUserId, [identity], data
FROM Notifications
WHERE toUserId  = @toUserId

So what is your take on this?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

蓝礼 2024-08-24 11:49:06

首先，与写入相比，读取总是压倒性的，因为每堵“墙”被看到的次数比它被更新的次数要多得多。所以你最好加快阅读速度。

其次，这类大型社交网站固有的问题之一是数据的分布（分片、分区，没有一个数据库能够存储所有帐户、所有朋友、所有通知），这意味着当有新通知时如果被挂在墙上，则必须在其他服务器上通知好友。这意味着更新是异步的并且是基于消息传递的。

所以我肯定会选择一种针对阅读而优化的结构。

我建议您仔细阅读参与 Facebook 和 MySpace 等网站架构的各种人员所做的公开演示，例如这个是 Christa Stelzmuller 的。他们解释了设计中的许多思考和推理。

回复收藏 0 原文

趁年轻赶紧闹 2024-08-24 11:49:06

与 SELECT 相比，更新非常慢……慢了几个数量级。另外，随着站点的扩展，您将在内存中缓存所有提取，因此选择的速度将微不足道。

回复收藏 0 原文

寄离 2024-08-24 11:49:06

在这种情况下，在 (toUser,identity) 上创建聚集索引似乎是一个坏主意，因为聚集索引确实应该按升序插入。当然，SQL 会负责保持表排序，但这会带来很高的性能成本（这就是你问题的重点）。但一般来说，不建议提前知道不按特定顺序进行插入聚集索引。这是一个非常好的三个部分有关聚集索引建议的文章。

话虽如此，我还是坚持使用标识列作为聚集索引，并在 toUserId 上创建一个非聚集索引，也许还有一个日期时间列。通过包含日期时间列，您可以更有效地查询最近的数据。

关于更新缓慢，社交网站上的状态更新对于消息队列来说是一个完美的情况。这样，您就可以根据需要调整数据库以加快读取速度，并且如果它对写入性能产生影响，用户也不必遭受损失。从他们的角度来看，更新是即时的，尽管可能需要一些时间才能“坚持”。

对于非常大的数据库，我会听从 SQL 专家的意见，他们可以讨论分区策略（针对较新数据的更小、更易于管理的表，针对旧数据的更大/重度索引的表）和复制解决方案。

回复收藏 0 原文

~没有更多了~