哪个性能更好?

发布于 2024-10-09 07:23:01 字数 629 浏览 0 评论 0原文

我正在考虑在社交网络应用程序的上下文中帖子及其评论的数据库模式,并且我在想这两者中哪一个会提供更好的性能:

我将帖子的评论存储在“评论”表中以及“帖子”表中的帖子。 现在,我的评论表架构如下所示:

postId commentId postsBy Date CommentBody

因为为了检索帖子的评论,我需要搜索其 postId 的所有帖子匹配这个特定帖子的 postId,甚至我的 postId 也不能成为主键,因为 postId 在列中不是唯一的(因为单个帖子有几条评论),因此我在想是否可以将 postId 和 commentId 合并到一个单一的commentId(这成为主键)使用它也可以检索postId。这就是我的想法:

CommentId 将生成为 postId*100+i (其中 i 是帖子的第 i 个评论)

因此,为了检索帖子的评论(比如 postId=8452 )我会搜索带有 commentId(这将是主键)的所有帖子,位于 845200 和 845200 之间。 845299..而不是用postId=8452..搜索所有评论(当然这将评论的最大数量限制为100)。但这会带来任何性能提升吗?

I am thinking of database schema for post and its comments, in context of a social networking application and im wandering which of these two would give better performance:

I am storing comments of a post in "Comments" Table and posts in the "Posts" Table.
Now my schema for the comments table looks like this:

postId commentId postedBy Date CommentBody

Since in order to retrieve the comments of a post I would be required to search all posts whose postId matches postId of this specific post and even my postId could not become primary key since the postId would be non unique within the column(since several comments for a single post), therefore I was thinking if I could merge postId and commentId into one single commentId (this becomes primary key) using which postId could also be retrieved. This is how I am thinking:

CommentId would be generated as postId*100+i (where i is the ith comment on the post)

thus in order to retrieve comments for a post(say with postId=8452 ) I would search all posts with commentId(that would be primary key), lying between 845200 & 845299.. instead of searching all comments with postId=8452.. (of course this limits the maximum no of comments to 100). But will this lead to any performance gains?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

玻璃人 2024-10-16 07:23:01

这就是你要做的。加载具有代表性数据的数据库,其大小(例如)是您期望的两倍。

然后运行查询并针对两个版本的架构进行测试。

然后,这是好的一点,每 X 周使用新的最新数据重新测试一次,以确保情况没有改变。

这就是 DBA 的意义所在。除非您的数据永远不会改变,否则数据库优化不是“一劳永逸”的操作。唯一确定的方法是在代表性条件下进行测试。

其他一切都是猜测。有根据的猜测,不要误会我的意思,但我宁愿有一个确定性的答案,而不是任何人的猜测,特别是因为前者会适应变化。

我最喜欢的优化口号是“测量,不要猜测!”

Here's what you do. Load up a database with representative data at (for example) twice the size you ever expect it to get.

Then run your queries and test them against both versions of the schema.

Then, and this is the good bit, retest this every X weeks with new up-to-date data to ensure the situation hasn't changed.

That's what being a DBA is all about. Unless your data will never change, database optimisation is not a set-and-forget operation. And the only way to be sure is to test under representative conditions.

Everything else is guesswork. Educated guesswork, don't get me wrong, but I'd rather have a deterministic answer in preference to anyone's guess, especially since the former will adapt to changes.

My favorite optimisation mantra is "Measure, don't guess!"

清风无影 2024-10-16 07:23:01

我建议:

  • 在注释中使用带有复合键的双表结构,以获得索引中的最佳唯一性。

  • 每篇文章 100 条评论是一个不好的限制,可能会打击您的后背。

  • 不要使用不同的表格来存储有关视频/图片等的评论。

  • 如果评论量很大,请添加评论存档表并移动旧评论
    那里。大多数请求的评论(最新)将有一个更小、更高效的表。

  • 请勿将 blob(图片和视频)保存在不同的分区上,而不是数据库中。数据库将更小,文件级别的碎片也更少。

问候,
/吨

I'd recommend:

  • Use two-table structure with composite key in comments for best uniquness in index.

  • 100 comments per article is a bad limition that may hit you in the back.

  • Dont use different tables for comments regarding video/pictures etc.

  • If huge amounts of comments, add an comment-archive table and move old comments
    there. Most requested comments (newest) will have a smaller and more efficient table.

  • Do save blobs (pictures and videos) on different partition and not in db. Db will be smaller and less fragmented at file level.

regards,
/t

驱逐舰岛风号 2024-10-16 07:23:01

如果你想要获得很大的容量,你应该制作一个表“帖子”和一个表“评论”,以便获得更小的表:)。并且不要忘记对它们使用索引和分区。

If you gonna get big volume you should make a table Post and a table Comments in order to have smaller table :). And don't forget to use index and partitions on them.

把回忆走一遍 2024-10-16 07:23:01

使用复合键。或者,如果您使用的框架仅允许单列键,则 postId 上的二级索引

Use a composite key. Or, if you're using some framework that only allows single-column keys, a secondary index on postId

一世旳自豪 2024-10-16 07:23:01

如果 CommendId 不唯一,您可以在 (postId, CommentID) 上创建复合 PRIMARY KEY

CREATE TABLE Comment
        (
        postId INT NOT NULL,
        commentId INT NOT NULL,
        …,
        PRIMARY KEY (postId, commentId)
        )

如果您的表是 MyISAM< /code>,您可以将 commentId 标记为 AUTO_INCRMENT,这将为它分配一个每个帖子 UNIQUE 递增值。

如果是唯一的,您可以在CommentId上创建一个PRIMARY KEY,并在(PostId, CommentId)上创建一个二级索引:

CREATE TABLE Comment
        (
        commentId INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        postId INT NOT NULL,
        …,
        KEY (postId, commentId)
        )

If CommendId is not unique, you can create a composite PRIMARY KEY on (postId, CommentID):

CREATE TABLE Comment
        (
        postId INT NOT NULL,
        commentId INT NOT NULL,
        …,
        PRIMARY KEY (postId, commentId)
        )

If your table is MyISAM, you can mark commentId as AUTO_INCREMENT, which will assign it with a per-post UNIQUE incrementing value.

If it is unique, you can create a PRIMARY KEY on CommentId and create a secondary index on (PostId, CommentId):

CREATE TABLE Comment
        (
        commentId INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        postId INT NOT NULL,
        …,
        KEY (postId, commentId)
        )
违心° 2024-10-16 07:23:01

CommentId 将生成为 postId*100+i(其中 i 是帖子的第 i 条评论)

因此,为了检索帖子的评论(例如 postId=8452 ),我将搜索带有 commentId(这将是主键)的所有帖子,位于 845200 和 845200 之间。 845299.. 而不是使用 postId=8452.. 搜索所有评论(当然这将评论的最大数量限制为 100)。但这会带来任何性能提升吗?

与基于 postId 外键列的查询相比,这可能会带来更差的性能,但唯一确定的方法是尝试这两种技术(如 paxdiablo 建议的那样)并测量性能。

CommentId would be generated as postId*100+i (where i is the ith comment on the post)

thus inorder to retrieve comments for a post(say with postId=8452 ) I would search all posts with commentId(that would be primary key), lying between 845200 & 845299.. instead of searching all comments with postId=8452.. (ofcourse this limits the maximum no of comments to 100). But will this lead to any performance gains ??

This will likely give much worse performance than a query based on a postId foreign key column, but the only way to be sure is to try both techniques (as suggested by paxdiablo) and measure the performance.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文