Efficient way to store post shares in MongoDB on a social site?

Posted on 2024-12-26 19:18:07

As the title states, say you have a Posts collection. Each post has a userId (the author), and other users can share it. Posts also have tags, stored as an array of the ids of the tags they are categorized under. How should this be stored for quick retrieval?

Use case: You have connections. You see posts written by your connections and posts shared by your connections. Posts are ordered on the page by a "velocity" score. A shared post could either inherit and keep the velocity of the original, or live or die by its own velocity. Not sure which is best.

Options I've considered:

Post {id: uniquePostId, userId: authorId, shares: [userIds of those who shared], tagIds: [tagIds for post]}

Problem with this method: Mongo doesn't let you build a compound index over two array fields. So the query is slow as hell if you want to filter on both tagIds and shares. Indexing each array separately still results in almost a full collection scan.

Another option:

You duplicate the Post like so:

Post {id: uniquePostId, userId: user who authored or shared the post, original: {postId: the original postId, or null if this is it, userId: the author of the original post}}

Problems with this approach: Say you want to fetch 20 posts, so you query for posts whose userId is one of your connections. How do you deal with the same post being shared by several of your connections? It gets kind of ugly.
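One way the duplicate handling could look, as a minimal sketch in application code rather than a definitive answer. It assumes the rows have already been fetched sorted by velocity, and that each row follows the duplicated-post schema above. The `velocity` field, the sample data, and the function names are made up for illustration.

```javascript
// Hypothetical feed rows, already sorted by velocity (highest first).
// `id`, `userId` and `original` follow the duplicated-post schema above;
// `velocity` is an assumed field name.
function originalIdOf(post) {
  // A share points at its original; an original post stands for itself.
  return post.original && post.original.postId != null
    ? post.original.postId
    : post.id;
}

function dedupeFeed(sortedPosts, limit) {
  const seen = new Set();
  const page = [];
  for (const post of sortedPosts) {
    const key = originalIdOf(post);
    if (seen.has(key)) continue; // this post was already surfaced by someone else
    seen.add(key);
    page.push(post);
    if (page.length === limit) break;
  }
  return page;
}

const rows = [
  { id: "p3", userId: "bob", velocity: 9, original: { postId: "p1", userId: "alice" } },
  { id: "p1", userId: "alice", velocity: 8, original: { postId: null, userId: "alice" } },
  { id: "p4", userId: "carol", velocity: 7, original: { postId: "p1", userId: "alice" } },
  { id: "p2", userId: "dave", velocity: 5, original: { postId: null, userId: "dave" } },
];
console.log(dedupeFeed(rows, 20).map((p) => p.id)); // [ 'p3', 'p2' ]
```

The catch is that a query limited to 20 rows can collapse to fewer than 20 after deduplication, so the caller would have to over-fetch or page until the list is full.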

Other approaches I've read:

post: {
 shares_and_tags: [{type: share, id: 1}, {type: tag, id:4}, ...]
}

This seems to resolve the indexing problem, but I don't know enough about Mongo to judge the performance implications. I'm going to do some testing shortly, but figured I'd see if the community has any advice or experience. Thanks!
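For anyone who wants to try the merged-array idea, here is a rough sketch of the document shape, an index spec, and a filter. It only builds plain objects, so it runs without a database. The helper names are hypothetical, and the filter assumes the feed is "shared by one of my connections and carrying a given tag".

```javascript
// Convert a post with separate `shares` and `tagIds` arrays into the
// single-array shape. Field names come from the schemas above.
function toMergedShape(post) {
  const { shares = [], tagIds = [], ...rest } = post;
  return {
    ...rest,
    shares_and_tags: [
      ...shares.map((id) => ({ type: "share", id })),
      ...tagIds.map((id) => ({ type: "tag", id })),
    ],
  };
}

// Index spec that could be passed to createIndex. Both keys live inside the
// same array of subdocuments, so no parallel arrays are involved.
const indexSpec = { "shares_and_tags.type": 1, "shares_and_tags.id": 1 };

// Filter for "shared by any of connectionIds AND tagged with tagId".
// Each $elemMatch must be satisfied by some element of the array.
function feedFilter(connectionIds, tagId) {
  return {
    shares_and_tags: {
      $all: [
        { $elemMatch: { type: "share", id: { $in: connectionIds } } },
        { $elemMatch: { type: "tag", id: tagId } },
      ],
    },
  };
}

const merged = toMergedShape({ id: 10, userId: 1, shares: [2, 3], tagIds: [4] });
console.log(JSON.stringify(merged));
console.log(JSON.stringify(feedFilter([2, 7], 4)));
```

As far as I know, the planner will usually use the index for only one of the `$elemMatch` clauses and filter on the other, so it is worth checking `explain()` on real data before committing to this shape.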

安人多梦 2025-01-02 19:18:07

OK, given the discussion in the comments:

This is what a tweet from Twitter's streaming API looks like after it is saved in MongoDB. I've stripped some of the non-essential data out of the object to simplify the example:

{
    "_id" : ObjectId("4f2849353ac01aebf231408a"),
    "place" : null,
    "text" : "tweet text",
    "created_at" : "Tue Jan 31 20:04:05 +0000 2012",
    "retweet_count" : 0,
    "favorited" : false,
    "source" : "<a href=\"http://mobile.twitter.com\" rel=\"nofollow\">Mobile Web</a>",
    "in_reply_to_screen_name" : null,
    "in_reply_to_user_id" : null,
    "retweeted" : false,
    "in_reply_to_status_id" : null,
    "in_reply_to_status_id_str" : null,
    "id_str" : "123456767800304",
    "user" : {
    },
    "truncated" : false,
    "id" : NumberLong("1234567890"),
    "in_reply_to_user_id_str" : null,
    "entities" : {
        "hashtags" : [ ],
        "user_mentions" : [ ],
        "urls" : [ ]
    }
}

As you can see, each tweet is stored as a new tweet. If this were a retweet, it would have the retweeted flag set to true, and its top-level fields would reference the id of the post it was responding to as well as the user it was responding to.
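Applied to the posts in the question, the same idea might look like the sketch below: every share becomes its own document that points back at the original. This is only an illustration under assumptions. The `original` sub-document is borrowed from the question's second schema, while `createdAt`, `makeShare`, and the caller-supplied id are hypothetical.

```javascript
// Build a share as a new document that references the original post.
// `original` follows the question's second schema; `createdAt` and the
// caller-supplied `newId` are hypothetical.
function makeShare(post, sharerId, newId, now = new Date()) {
  // Sharing a share should still point at the root post and its author.
  const root =
    post.original && post.original.postId != null
      ? post.original
      : { postId: post.id, userId: post.userId };
  return {
    id: newId,
    userId: sharerId,
    original: { postId: root.postId, userId: root.userId },
    createdAt: now,
  };
}

const post = { id: "p1", userId: "alice", original: { postId: null, userId: "alice" } };
const share = makeShare(post, "bob", "p2");
const reshare = makeShare(share, "carol", "p3");
console.log(share.original.postId, reshare.original.postId); // p1 p1
```

Pointing every share at the root post, rather than at the intermediate share, keeps grouping by original post a single-field lookup.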
