Facebook "like" data structure
I've been wondering how Facebook manages the database design for all the different things that you can "like". If there were only one kind of thing to like, this would be simple: just a foreign key to what you like and a foreign key to who you are.
But there must be hundreds of different tables holding things that you can "like" on Facebook. How do they store the likes?
Comments (4)
If you want to represent this sort of structure in a relational database, then you need to use a hierarchy normally referred to as table inheritance. In table inheritance, you have a single table that defines a parent type, then child tables whose primary keys are also foreign keys back to the parent.
Using the Facebook example, you might have something like this:
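A minimal sketch of table inheritance, using Python's built-in `sqlite3` module so it can be run as-is. The table and column names (`likeable`, `photo`, `status`, `likes`) and the helper functions are hypothetical choices for illustration, not anything Facebook is known to use.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a parent table and two child tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Parent type: every likeable thing gets exactly one row here.
    CREATE TABLE likeable (id INTEGER PRIMARY KEY);

    -- Child tables: the primary key is also a foreign key to the parent.
    CREATE TABLE photo (
        id  INTEGER PRIMARY KEY REFERENCES likeable(id),
        url TEXT NOT NULL
    );
    CREATE TABLE status (
        id   INTEGER PRIMARY KEY REFERENCES likeable(id),
        body TEXT NOT NULL
    );

    -- A single likes table covers every child type through the parent key.
    CREATE TABLE likes (
        user_id     INTEGER NOT NULL REFERENCES users(id),
        likeable_id INTEGER NOT NULL REFERENCES likeable(id),
        PRIMARY KEY (user_id, likeable_id)
    );
    """)
    return conn


def add_photo(conn, url):
    """Insert the parent row first, then the child row that shares its id."""
    cur = conn.execute("INSERT INTO likeable DEFAULT VALUES")
    conn.execute("INSERT INTO photo (id, url) VALUES (?, ?)", (cur.lastrowid, url))
    return cur.lastrowid


def add_status(conn, body):
    cur = conn.execute("INSERT INTO likeable DEFAULT VALUES")
    conn.execute("INSERT INTO status (id, body) VALUES (?, ?)", (cur.lastrowid, body))
    return cur.lastrowid


def like(conn, user_id, likeable_id):
    conn.execute(
        "INSERT INTO likes (user_id, likeable_id) VALUES (?, ?)",
        (user_id, likeable_id),
    )


def count_likes(conn, likeable_id):
    row = conn.execute(
        "SELECT COUNT(*) FROM likes WHERE likeable_id = ?", (likeable_id,)
    ).fetchone()
    return row[0]
```

Because `likes.likeable_id` is a real foreign key, the database itself rejects a like that points at something that does not exist, whichever child table the target lives in.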
In the interest of completeness, it's worth noting that Facebook doesn't use an RDBMS for this sort of thing. They have opted for a NoSQL solution for this sort of storage. However, this is one way of storing such loosely coupled information within an RDBMS.
Facebook does not have traditional foreign keys and such, as they don't use relational databases for most of their data storage. Simply put, relational databases don't cut it for that.
However, they use several NoSQL-type data stores. The "Like" is most likely handled by a dedicated service, probably set up in an SOA-style manner throughout their infrastructure. This way the "Like" can be attached to basically anything they want it to be associated with, with vast scalability and no tightly coupled relational issues to deal with. Those are issues that Facebook can't really afford at the volume they operate.
They could also be using an AOP (Aspect-Oriented Programming) style processing mechanism to "attach" a "Like" to anything that may need one at page rendering time, but my impression is that it is done asynchronously via JavaScript against an SOA-style web service or some other delivery mechanism.
Either way, I'd love to hear how they have this set up from an architecture perspective myself. Considering their volume, even the simple "Like" button becomes a significant implementation of technology.
You can have a table with Id, ForeignId and Type. Type can be anything like Photo, Status, Event, etc. ForeignId would be the id of the record in the table named by Type. This works for both comments and likes. You only need one table for all likes, one for all comments, and the one I described.
Example:
Here, user with Id 111 likes the photo with Id 322.
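The row described above can be sketched as follows, again with Python's built-in `sqlite3`. This is one possible reading of the design: the name `entity` for the Id/ForeignId/Type table, the `likes` table layout, and the helper functions are all assumptions made for illustration.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- The Id / ForeignId / Type table: one row per thing that can be liked.
    -- foreign_id points into whichever table `type` names, so the database
    -- cannot enforce it with a real foreign key.
    CREATE TABLE entity (
        id         INTEGER PRIMARY KEY,
        foreign_id INTEGER NOT NULL,
        type       TEXT    NOT NULL,
        UNIQUE (type, foreign_id)
    );

    -- One table for all likes, whatever the target type is.
    CREATE TABLE likes (
        user_id   INTEGER NOT NULL,
        entity_id INTEGER NOT NULL REFERENCES entity(id),
        PRIMARY KEY (user_id, entity_id)
    );
    """)
    return conn


def entity_id_for(conn, type_, foreign_id):
    """Return the entity row id for (type, foreign_id), creating it if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO entity (foreign_id, type) VALUES (?, ?)",
        (foreign_id, type_),
    )
    row = conn.execute(
        "SELECT id FROM entity WHERE type = ? AND foreign_id = ?",
        (type_, foreign_id),
    ).fetchone()
    return row[0]


def like(conn, user_id, type_, foreign_id):
    """Record a like; liking the same thing twice is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO likes (user_id, entity_id) VALUES (?, ?)",
        (user_id, entity_id_for(conn, type_, foreign_id)),
    )


def liked_by(conn, user_id):
    """List (type, foreign_id) pairs for everything the user likes."""
    return conn.execute(
        """
        SELECT e.type, e.foreign_id
        FROM likes AS l
        JOIN entity AS e ON e.id = l.entity_id
        WHERE l.user_id = ?
        ORDER BY e.type, e.foreign_id
        """,
        (user_id,),
    ).fetchall()


conn = make_db()
like(conn, 111, "Photo", 322)  # user 111 likes photo 322
```

The trade-off compared with table inheritance is that `foreign_id` is only a convention: nothing stops a row from naming a photo that was deleted or never existed, so the application has to keep it consistent.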
Note: I assume you are using an RDBMS, but see Adron's answer. Facebook does not use an RDBMS for most of their data.
I'm pretty sure Facebook does not store "like" information in an RDBMS the way some others have suggested. With millions of users and possibly thousands of likes per item, we're looking at thousands of rows to join here, which would hurt performance.
The best approach here is to append all "likes" to a single row. For example, use a table with a user_like_id column of text datatype, and append the id of every user who liked the post to it. In this case, you only query one row and you get everything. This will be a lot faster than joining tables and getting counts.
EDIT: I haven't been on this site lately and I just discovered this answer has been downvoted. Well, here's an example post with a like count and the likers' avatars. This is my own design, in which I implemented what I'm talking about.
The two components here are 1) an XREF table and 2) a JSON object.
The likes are still stored in an XREF table. But at the same time, the data is appended to a JSON object that is stored in a text column on the post table.
Why did I store the likes info in a text column as JSON? So that there's no need to do a db lookup or join for the likes. If someone unlikes the post, the JSON object is simply updated.
Now, I don't know why this answer was downvoted by some users here. It provides quick data retrieval, and it is close to the NoSQL approach, which is how FB accesses data. In this case, there's no need for extra joins or lookups to get the likes info.
And here's the table that holds the likes. It's just a simple XREF mapping between the user and item tables.
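A minimal sketch of that combination, using Python's built-in `sqlite3` and `json` modules. The names `post`, `post_like`, and `likes_json` are hypothetical, and the cached JSON here holds only user ids (not avatars). One deliberate difference from the description above: instead of appending to the JSON in place, this sketch rebuilds it from the XREF table on every change, which keeps the two copies from drifting apart.

```python
import json
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE post (
        id         INTEGER PRIMARY KEY,
        body       TEXT NOT NULL,
        likes_json TEXT NOT NULL DEFAULT '[]'  -- denormalized copy for fast reads
    );

    -- XREF table: the source of truth for who liked what.
    CREATE TABLE post_like (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL REFERENCES post(id),
        PRIMARY KEY (user_id, post_id)
    );
    """)
    return conn


def _refresh_cache(conn, post_id):
    """Rebuild the JSON list of liker ids on the post row from the XREF table."""
    rows = conn.execute(
        "SELECT user_id FROM post_like WHERE post_id = ? ORDER BY user_id",
        (post_id,),
    ).fetchall()
    conn.execute(
        "UPDATE post SET likes_json = ? WHERE id = ?",
        (json.dumps([r[0] for r in rows]), post_id),
    )


def like(conn, user_id, post_id):
    with conn:  # XREF insert and cache update commit together or not at all
        conn.execute(
            "INSERT OR IGNORE INTO post_like (user_id, post_id) VALUES (?, ?)",
            (user_id, post_id),
        )
        _refresh_cache(conn, post_id)


def unlike(conn, user_id, post_id):
    with conn:
        conn.execute(
            "DELETE FROM post_like WHERE user_id = ? AND post_id = ?",
            (user_id, post_id),
        )
        _refresh_cache(conn, post_id)


def likes_for(conn, post_id):
    """Read the likes from the single post row, with no join."""
    row = conn.execute(
        "SELECT likes_json FROM post WHERE id = ?", (post_id,)
    ).fetchone()
    return json.loads(row[0])
```

Reads touch one row, as the answer intends; the cost moves to writes, which now update both the XREF table and the post row.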