Facebook database design?

Posted on 2024-07-24 02:57:13

I have always wondered how Facebook designed the friend <-> user relation.

I figure the user table is something like this:

user_email PK
user_id PK
password 

I figure there is also a table with the user's data (sex, age, etc.), connected via the user's email, I would assume.

How does it connect all the friends to this user?

Something like this?

user_id
friend_id_1
friend_id_2
friend_id_3
friend_id_N 

Probably not, because the number of users is unknown and will expand.

Comments (11)

ゝ杯具 2024-07-31 02:57:13

Keep a friend table that holds the UserID and the UserID of the friend (we will call it FriendID). Both columns would be foreign keys back to the User table.

Somewhat useful example:

Table Name: User
Columns:
    UserID PK
    EmailAddress
    Password
    Gender
    DOB
    Location

Table Name: Friends
Columns:
    UserID PK FK
    FriendID PK FK
    (This table features a composite primary key made up of the two foreign 
     keys, both pointing back to the user table. One ID will point to the
     logged in user, the other ID will point to the individual friend
     of that user)

Example Usage:

Table User
--------------
UserID EmailAddress Password Gender DOB      Location
------------------------------------------------------
1      [email protected]  bobbie   M      1/1/2009 New York City
2      [email protected]  jonathan M      2/2/2008 Los Angeles
3      [email protected]  joseph   M      1/2/2007 Pittsburgh

Table Friends
---------------
UserID FriendID
----------------
1      2
1      3
2      3

This shows that Bob is friends with both Jon and Joe, and that Jon is also friends with Joe. In this example we assume that friendship is always two-way, so you do not need rows such as (2,1) or (3,2), because they are already represented in the other direction. Where friendship or another relation is not implicitly two-way, you would also need those rows to record each direction explicitly.
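A minimal sketch of the two-table design above, using Python's built-in `sqlite3` so it runs anywhere. The `Name` column, the `CHECK (UserID < FriendID)` rule, and the `friends_of` helper are assumptions added for illustration; they are not part of the answer's schema. Because each friendship is stored only once, looking up a user's friends has to check both columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE User (
    UserID INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL          -- assumption: stands in for the other columns
);
CREATE TABLE Friends (
    UserID   INTEGER NOT NULL REFERENCES User(UserID),
    FriendID INTEGER NOT NULL REFERENCES User(UserID),
    PRIMARY KEY (UserID, FriendID),
    CHECK (UserID < FriendID)     -- assumption: store each pair once, smaller ID first
);
""")
conn.executemany("INSERT INTO User VALUES (?, ?)",
                 [(1, "Bob"), (2, "Jon"), (3, "Joe")])
conn.executemany("INSERT INTO Friends VALUES (?, ?)",
                 [(1, 2), (1, 3), (2, 3)])


def friends_of(user_id):
    """Return the IDs of all friends, whichever column the user appears in."""
    rows = conn.execute(
        """
        SELECT FriendID FROM Friends WHERE UserID = ?
        UNION
        SELECT UserID FROM Friends WHERE FriendID = ?
        ORDER BY 1
        """,
        (user_id, user_id),
    ).fetchall()
    return [row[0] for row in rows]


print(friends_of(2))
```

Storing each pair once halves the table size at the cost of the `UNION` in every lookup; storing both directions trades the other way.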

萌酱 2024-07-31 02:57:13

TL;DR:

They use a layered architecture, with cached graphs serving everything above the MySQL layer at the bottom of their stack.

Long Answer:

I did some research on this myself because I was curious how they handle their huge amount of data and search it quickly. I've seen people complain about custom-made social network scripts becoming slow as the user base grows. After I did some benchmarking myself with just 10k users and 2.5 million friend connections, without even bothering about group permissions, likes, or wall posts, it quickly turned out that this approach is flawed. So I spent some time searching the web for how to do it better and came across this official Facebook article:

I really recommend watching the presentation at the first link above before you continue reading. It's probably the best explanation you can find of how FB works behind the scenes.

The video and article tell you a few things:

  • They're using MySQL at the very bottom of their stack
  • Above the SQL DB there is the TAO layer, which contains at least two levels of caching and uses graphs to describe the connections.
  • I could not find anything on what software / DB they actually use for their cached graphs

Let's take a look at this; friend connections are at the top left:

[image: graph diagram]

Well, this is a graph. :) It doesn't tell you how to build it in SQL. There are several ways to do it, but this site has a good number of different approaches. Attention: consider that a relational DB is what it is. It is designed to store normalised data, not a graph structure, so it won't perform as well as a specialised graph database.

Also consider that you have to do more complex queries than just friends of friends, for example when you want to filter all locations around a given coordinate that you and your friends of friends like. A graph is the perfect solution here.

I can't tell you how to build it so that it will perform well but it clearly requires some trial and error and benchmarking.

Here is my disappointing test for just finding friends of friends:

DB Schema:

CREATE TABLE IF NOT EXISTS `friends` (
  `id` int(11) NOT NULL,
  `user_id` int(11) NOT NULL,
  `friend_id` int(11) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

Friends of Friends Query:

(
    select friend_id
    from friends
    where user_id = 1
) union (
    select distinct ff.friend_id
    from
        friends f
        join friends ff on ff.user_id = f.friend_id
    where f.user_id = 1
)

I really recommend creating some sample data with at least 10k user records, each with at least 250 friend connections, and then running this query. On my machine (i7 4770k, SSD, 16 GB RAM) the query took ~0.18 seconds. Maybe it can be optimized; I'm not a DB genius (suggestions are welcome). However, if this scales linearly, you're already at 1.8 seconds for just 100k users and 18 seconds for 1 million users.
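For anyone who wants to reproduce this kind of test, here is a rough sketch that generates random friend rows and times the friends-of-friends query. It uses Python's built-in `sqlite3` as a stand-in for MySQL, so the timings are not comparable with the ~0.18 seconds quoted above. The helper names `build_sample_db` and `time_query` and the small default sizes are assumptions for illustration:

```python
import random
import sqlite3
import time

FRIENDS_OF_FRIENDS = """
SELECT friend_id FROM friends WHERE user_id = :uid
UNION
SELECT ff.friend_id
FROM friends f
JOIN friends ff ON ff.user_id = f.friend_id
WHERE f.user_id = :uid
"""


def build_sample_db(num_users, friends_per_user, seed=0):
    """Create an in-memory friends table with random, distinct friend rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE friends ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL,"
        " friend_id INTEGER NOT NULL)"
    )
    rows = []
    for user_id in range(1, num_users + 1):
        # Sample from num_users - 1 slots, then shift past user_id to skip self.
        for friend_id in rng.sample(range(1, num_users), friends_per_user):
            if friend_id >= user_id:
                friend_id += 1
            rows.append((user_id, friend_id))
    conn.executemany(
        "INSERT INTO friends (user_id, friend_id) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def time_query(conn, user_id):
    """Run the friends-of-friends query; return (row count, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(FRIENDS_OF_FRIENDS, {"uid": user_id}).fetchall()
    return len(rows), time.perf_counter() - start


# Small sizes so the sketch runs quickly; raise them to 10_000 and 250
# to approximate the data volume described above.
conn = build_sample_db(num_users=1000, friends_per_user=25)
count, seconds = time_query(conn, 1)
print(f"{count} rows in {seconds:.4f} s")
```

Like the schema shown earlier, this table has no index on `user_id`; adding one and re-running the timing is probably the first experiment worth trying.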

This might still sound OK-ish for ~100k users, but consider that you have only fetched friends of friends and have not run any more complex query such as "show me only posts from friends of friends + do the permission check on whether I'm allowed to see each of them + do a subquery to check whether I liked any of them". You want the DB to check whether you already liked a post, or you'll have to do it in code. Also consider that this is not the only query you run and that you have more than one active user at the same time on a more or less popular site.

I think this answers the question of how Facebook designed their friend relationships quite well, but I'm sorry that I can't tell you how to implement it so that it runs fast. Implementing a social network is easy, but making sure it performs well clearly is not, IMHO.

I've started experimenting with OrientDB to do the graph-queries and mapping my edges to the underlying SQL DB. If I ever get it done I'll write an article about it.

How can I create a well performing social network site?

Update 2021-04-10: I'll probably never write the article ;) but here are a few bullet points on how you could try to scale it:

  • Use different read and write repositories
  • Build specific read repositories based on faster non-relational DB systems made for that purpose, and don't be afraid of denormalizing data. Write to a normalized DB but read from specialized views.
  • Use eventual consistency
  • Take a look at CQRS
  • For a social network, graph-based read repositories might also be a good idea.
  • Use Redis as a read repository in which you store whole serialized data sets

If you combine the points from the above list in a smart way, you can build a very well-performing system. The list is not a "todo" list; you'll still have to understand it, think it through, and adapt it! https://microservices.io/ is a nice site that covers a few of the topics I mentioned before.

What I do is store the events generated by aggregates and use projections and handlers to write to different DBs, as mentioned above. The cool thing about this is that I can rebuild my data as needed at any time.
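To make the event/projection idea concrete, here is a very small in-memory sketch. It is not the author's actual setup: the event classes, the `FriendListProjection` handler, and the `rebuild` function are hypothetical names, and a real system would persist the event log and write the read model to a separate store such as Redis or a graph DB.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FriendAdded:
    user_id: int
    friend_id: int


@dataclass(frozen=True)
class FriendRemoved:
    user_id: int
    friend_id: int


class FriendListProjection:
    """Read model: a denormalized friend set per user, kept in both directions."""

    def __init__(self):
        self.friends = defaultdict(set)

    def handle(self, event):
        if isinstance(event, FriendAdded):
            self.friends[event.user_id].add(event.friend_id)
            self.friends[event.friend_id].add(event.user_id)
        elif isinstance(event, FriendRemoved):
            self.friends[event.user_id].discard(event.friend_id)
            self.friends[event.friend_id].discard(event.user_id)


def rebuild(events):
    """Replay the full event log into a fresh projection."""
    projection = FriendListProjection()
    for event in events:
        projection.handle(event)
    return projection


# The event log is the source of truth; the read model can be thrown away
# and rebuilt from it at any time.
event_log = [
    FriendAdded(1, 2),
    FriendAdded(1, 3),
    FriendAdded(2, 3),
    FriendRemoved(1, 3),
]
read_model = rebuild(event_log)
print(sorted(read_model.friends[2]))
```

Reads hit only the projection, so a friend-list lookup is a single key access; the cost is that the read model lags the write side slightly, which is the eventual consistency mentioned in the list above.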

西瓜 2024-07-31 02:57:13

Have a look at the following database schema, reverse engineered by Anatoly Lubarsky:

Facebook Schema

倾听心声的旋律 2024-07-31 02:57:13

My best bet is that they created a graph structure. The nodes are users and "friendships" are edges.

Keep one table of users, keep another table of edges. Then you can keep data about the edges, like "day they became friends" and "approved status," etc.
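As a rough sketch of that idea, the edge table below carries a status and a date next to the two user IDs. The table and column names (`friendship_edges`, `status`, `since`) and the helper functions are assumptions for illustration; Python's built-in `sqlite3` is used only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE friendship_edges (
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    friend_id INTEGER NOT NULL REFERENCES users(user_id),
    status    TEXT NOT NULL CHECK (status IN ('pending', 'approved')),
    since     TEXT,               -- date the friendship was approved
    PRIMARY KEY (user_id, friend_id)
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "Bob"), (2, "Jon"), (3, "Joe")])
conn.executemany(
    "INSERT INTO friendship_edges VALUES (?, ?, ?, ?)",
    [(1, 2, "approved", "2009-01-01"), (1, 3, "pending", None)],
)


def approve(user_id, friend_id, date):
    """Mark a pending edge as approved and record the date."""
    with conn:
        conn.execute(
            "UPDATE friendship_edges SET status = 'approved', since = ? "
            "WHERE user_id = ? AND friend_id = ?",
            (date, user_id, friend_id),
        )


def approved_friends(user_id):
    """Return friend IDs reachable over approved edges starting at user_id."""
    rows = conn.execute(
        "SELECT friend_id FROM friendship_edges "
        "WHERE user_id = ? AND status = 'approved' ORDER BY friend_id",
        (user_id,),
    )
    return [row[0] for row in rows]


print(approved_friends(1))
```

Here the edges are directed (only outgoing edges of `user_id` are read); whether to store one row or two per friendship is a separate choice.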

夏の忆 2024-07-31 02:57:13

It's most likely a many-to-many relationship:

FriendList (table)

user_id -> users.user_id
friend_id -> users.user_id
friendVisibilityLevel

EDIT

The user table probably doesn't have user_email as a PK, possibly as a unique key though.

users (table)

user_id PK
user_email
password

人疚 2024-07-31 02:57:13

Take a look at these articles describing how LinkedIn and Digg are built:

There's also "Big Data: Viewpoints from the Facebook Data Team" that might be helpful:

http://developer.yahoo.net/blogs/theater/archives/2008/01/nextyahoonet_big_data_viewpoints_from_the_fac.html

Also, there's this article that talks about non-relational databases and how they're used by some companies:

http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php

You'll see that these companies are dealing with data warehouses, partitioned databases, data caching, and other higher-level concepts that most of us never deal with on a daily basis. Or at least, maybe we don't know that we do.

There are a lot of links in the first two articles that should give you some more insight.

UPDATE 10/20/2014

Murat Demirbas wrote a summary of:

  • TAO: Facebook's distributed data store for the social graph (ATC'13)
  • F4: Facebook's warm BLOB storage system (OSDI'14)

http://muratbuffalo.blogspot.com/2014/10/facebooks-software-architecture.html

HTH

思念绕指尖 2024-07-31 02:57:13

It's not possible to retrieve users' friend data from an RDBMS in constant time for a data set of more than half a billion, so Facebook implemented this using a hash database (NoSQL), and they open-sourced the database, called Cassandra.

So every user has their own key, with the friends' details in a queue. To learn how Cassandra works, look at this:

http://prasath.posterous.com/cassandra-55

栖竹 2024-07-31 02:57:13

It's a type of graph database:
http://components.neo4j.org/neo4j-examples/1.2-SNAPSHOT/social-network.html

It's not related to relational databases.

Google for graph databases.

看透却不说透 2024-07-31 02:57:13

You're looking for foreign keys. Basically, you can't have an array in a database unless it has its own table.


Example schema:

    Users Table
        userID PK
        other data
    Friends Table
        userID   -- FK to the Users table, representing the user that has a friend.
        friendID -- FK to the Users table, representing the user ID of the friend.

堇年纸鸢 2024-07-31 02:57:13

There is probably a table that stores the friend <-> user relation, say "frnd_list", with the fields 'user_id' and 'frnd_id'.

Whenever a user adds another user as a friend, two new rows are created.

For instance, suppose my id is 'deep9c' and I add a user having id 'akash3b' as my friend, then two new rows are created in table "frnd_list" with values ('deep9c','akash3b') and ('akash3b','deep9c').

Now, when showing the friends list to a particular user, a simple SQL query would do: "select frnd_id from frnd_list where user_id = ?"
where ? is the id of the logged-in user (stored as a session attribute).
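A minimal sketch of this two-rows-per-friendship approach, using Python's built-in `sqlite3`. The `frnd_list` table and its columns come from the description above; the composite primary key and the `add_friend` / `friend_list` helpers are assumptions added for illustration. Both rows are written in one transaction so the table never holds only half of a friendship:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE frnd_list ("
    " user_id TEXT NOT NULL,"
    " frnd_id TEXT NOT NULL,"
    " PRIMARY KEY (user_id, frnd_id))"
)


def add_friend(user_id, frnd_id):
    """Insert both directions of the friendship atomically."""
    with conn:  # commits on success, rolls back both rows on error
        conn.executemany(
            "INSERT INTO frnd_list (user_id, frnd_id) VALUES (?, ?)",
            [(user_id, frnd_id), (frnd_id, user_id)],
        )


def friend_list(user_id):
    """Return the friends of user_id with a single-column lookup."""
    rows = conn.execute(
        "SELECT frnd_id FROM frnd_list WHERE user_id = ? ORDER BY frnd_id",
        (user_id,),
    )
    return [row[0] for row in rows]


add_friend("deep9c", "akash3b")
print(friend_list("deep9c"), friend_list("akash3b"))
```

The lookup stays a simple equality filter on `user_id`, at the cost of storing twice as many rows as the one-row-per-pair design.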

怂人 2024-07-31 02:57:13

Regarding the performance of a many-to-many table: if you have two 32-bit ints linking user IDs, your basic data storage for 200,000,000 users averaging 200 friends apiece is just under 300 GiB (about 320 GB in decimal units), before any indexes or row overhead.

Obviously, you would need some partitioning and indexing and you're not going to keep that in memory for all users.
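The estimate can be checked with a few lines of arithmetic. This assumes one row per (user, friend) pair with exactly two 4-byte integers and ignores indexes and per-row storage overhead:

```python
users = 200_000_000
avg_friends = 200
bytes_per_row = 2 * 4  # two 32-bit integers: user ID and friend ID

total_bytes = users * avg_friends * bytes_per_row

print(f"{total_bytes / 10**9:.0f} GB (decimal)")
print(f"{total_bytes / 2**30:.1f} GiB (binary)")
```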
