Which one performs better?
I am thinking about the database schema for posts and their comments in the context of a social networking application, and I'm wondering which of these two designs would give better performance.
I am storing the comments of a post in a "Comments" table and the posts in a "Posts" table.
Right now my schema for the comments table looks like this:
postId commentId postedBy Date CommentBody
To retrieve the comments of a post, I have to search for all comments whose postId matches the postId of that specific post. postId also cannot be the primary key, since it is not unique within the column (a single post has several comments). So I was wondering whether I could merge postId and commentId into a single commentId (which becomes the primary key) from which the postId can also be recovered. This is what I have in mind:
commentId would be generated as postId*100+i (where i is the ith comment on the post)
So to retrieve the comments for a post (say postId=8452), I would search for all comments whose commentId (the primary key) lies between 845200 and 845299, instead of searching for all comments with postId=8452. (Of course, this limits the maximum number of comments per post to 100.) But will this lead to any performance gain?
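To make the proposed encoding concrete, here is a minimal Python sketch of the arithmetic described above. The function names and the `COMMENTS_PER_POST` constant are made up for illustration and are not part of any real schema.

```python
COMMENTS_PER_POST = 100  # the cap implied by commentId = postId*100 + i


def encode_comment_id(post_id: int, i: int) -> int:
    """Pack postId and the per-post comment index i into one commentId."""
    if not 0 <= i < COMMENTS_PER_POST:
        # The 101st comment would spill into the next post's id range.
        raise ValueError("comment index out of range for this encoding")
    return post_id * COMMENTS_PER_POST + i


def comment_id_range(post_id: int) -> tuple[int, int]:
    """Inclusive commentId range that holds all comments of one post."""
    low = post_id * COMMENTS_PER_POST
    return low, low + COMMENTS_PER_POST - 1


def decode_post_id(comment_id: int) -> int:
    """Recover the postId from a packed commentId."""
    return comment_id // COMMENTS_PER_POST
```

For postId=8452 the range works out to 845200 through 845299, and any attempt to store a comment with i = 100 has to be rejected, which is where the hard limit comes from.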
Comments (6)
Here's what you do. Load up a database with representative data at (for example) twice the size you ever expect it to get.
Then run your queries and test them against both versions of the schema.
Then, and this is the good bit, retest every X weeks with new, up-to-date data to ensure the situation hasn't changed.
That's what being a DBA is all about. Unless your data will never change, database optimisation is not a set-and-forget operation. And the only way to be sure is to test under representative conditions.
Everything else is guesswork. Educated guesswork, don't get me wrong, but I'd rather have a deterministic answer in preference to anyone's guess, especially since the former will adapt to changes.
My favorite optimisation mantra is "Measure, don't guess!"
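As a rough illustration of what such a test could look like, the sketch below loads both schema variants into an in-memory SQLite database and times the two lookups. SQLite is only a stand-in here; the table names, row counts, and helper functions are assumptions, and the numbers it prints say nothing about how your actual database engine behaves with your actual data.

```python
import sqlite3
import time


def build(conn, n_posts=2000, per_post=50):
    """Create both schema variants and fill them with the same comments."""
    conn.executescript("""
        CREATE TABLE comments_fk (
            commentId INTEGER PRIMARY KEY,
            postId    INTEGER NOT NULL,
            body      TEXT
        );
        CREATE INDEX idx_comments_fk_post ON comments_fk (postId);
        CREATE TABLE comments_packed (
            commentId INTEGER PRIMARY KEY,  -- packed as postId*100 + i
            body      TEXT
        );
    """)
    fk_rows, packed_rows = [], []
    for post_id in range(1, n_posts + 1):
        for i in range(per_post):
            fk_rows.append((post_id, "x"))
            packed_rows.append((post_id * 100 + i, "x"))
    conn.executemany(
        "INSERT INTO comments_fk (postId, body) VALUES (?, ?)", fk_rows)
    conn.executemany(
        "INSERT INTO comments_packed (commentId, body) VALUES (?, ?)",
        packed_rows)
    conn.commit()


def time_fk(conn, post_ids):
    """Time lookups by an indexed postId column."""
    start, rows = time.perf_counter(), 0
    for pid in post_ids:
        rows += len(conn.execute(
            "SELECT body FROM comments_fk WHERE postId = ?",
            (pid,)).fetchall())
    return time.perf_counter() - start, rows


def time_packed(conn, post_ids):
    """Time range scans over the packed primary key."""
    start, rows = time.perf_counter(), 0
    for pid in post_ids:
        rows += len(conn.execute(
            "SELECT body FROM comments_packed "
            "WHERE commentId BETWEEN ? AND ?",
            (pid * 100, pid * 100 + 99)).fetchall())
    return time.perf_counter() - start, rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    sample = list(range(1, 2001, 7))
    print("postId index :", time_fk(conn, sample))
    print("packed key   :", time_packed(conn, sample))
```

Both functions should return the same row count for the same sample of posts; only the timings are the interesting part, and those need to be re-measured on the real engine as the data grows.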
I'd recommend:
Use a two-table structure with a composite key in comments for the best uniqueness in the index.
100 comments per article is a bad limitation that may come back to bite you.
Don't use different tables for comments on videos, pictures, etc.
If you have huge amounts of comments, add a comment-archive table and move old comments there. The most requested comments (the newest) will then live in a smaller and more efficient table.
Do save blobs (pictures and videos) on a different partition and not in the db. The db will be smaller and less fragmented at file level.
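The archive suggestion could be sketched roughly as follows, using Python's sqlite3 module as a stand-in database. The table names, the `postedAt` column, and the `archive_old_comments` helper are assumptions made for this example, and dates are stored as ISO strings so that they compare correctly as text.

```python
import sqlite3

SCHEMA = """
CREATE TABLE comments (
    postId    INTEGER NOT NULL,
    commentId INTEGER NOT NULL,
    postedBy  TEXT,
    postedAt  TEXT,   -- ISO date string, e.g. '2024-03-01'
    body      TEXT,
    PRIMARY KEY (postId, commentId)
);
CREATE TABLE comments_archive (
    postId    INTEGER NOT NULL,
    commentId INTEGER NOT NULL,
    postedBy  TEXT,
    postedAt  TEXT,
    body      TEXT,
    PRIMARY KEY (postId, commentId)
);
"""


def archive_old_comments(conn, cutoff):
    """Move comments older than cutoff into the archive table."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO comments_archive "
            "SELECT * FROM comments WHERE postedAt < ?", (cutoff,))
        cur = conn.execute(
            "DELETE FROM comments WHERE postedAt < ?", (cutoff,))
    return cur.rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO comments VALUES (?, ?, ?, ?, ?)",
        [(1, 1, "ann", "2022-05-01", "old"),
         (1, 2, "bob", "2023-11-30", "old too"),
         (1, 3, "cat", "2024-06-15", "recent")])
    conn.commit()
    moved = archive_old_comments(conn, "2024-01-01")
    live = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    kept = conn.execute(
        "SELECT COUNT(*) FROM comments_archive").fetchone()[0]
    return moved, live, kept
```

Running the copy and the delete in one transaction keeps a comment from being lost or duplicated if the job fails halfway. Pages that show old comments would then have to query the archive table as well.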
regards,
/t
If you're going to get a big volume, you should make a Posts table and a Comments table in order to have smaller tables :). And don't forget to use indexes and partitions on them.
Use a composite key. Or, if you're using a framework that only allows single-column keys, a secondary index on postId.
If commentId is not unique, you can create a composite PRIMARY KEY on (postId, commentId).
If your table is MyISAM, you can mark commentId as AUTO_INCREMENT, which will assign it a per-post UNIQUE incrementing value.
If it is unique, you can create a PRIMARY KEY on commentId and create a secondary index on (postId, commentId).
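A minimal sketch of the composite-key variant, using Python's sqlite3 module as a stand-in since it runs anywhere. SQLite has no per-group AUTO_INCREMENT, so the hypothetical `add_comment` helper below imitates the per-post numbering by taking MAX(commentId) + 1 within the post; this is an illustration only and is not safe with concurrent writers unless extra locking is added.

```python
import sqlite3

SCHEMA = """
CREATE TABLE comments (
    postId    INTEGER NOT NULL,
    commentId INTEGER NOT NULL,   -- numbered 1, 2, 3, ... within each post
    postedBy  TEXT,
    body      TEXT,
    PRIMARY KEY (postId, commentId)
);
"""


def add_comment(conn, post_id, posted_by, body):
    """Insert a comment with the next free commentId for this post."""
    with conn:
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(commentId), 0) + 1 "
            "FROM comments WHERE postId = ?", (post_id,)).fetchone()
        conn.execute(
            "INSERT INTO comments (postId, commentId, postedBy, body) "
            "VALUES (?, ?, ?, ?)", (post_id, next_id, posted_by, body))
    return next_id


def comments_for(conn, post_id):
    """Fetch one post's comments; postId is the leading key column."""
    return conn.execute(
        "SELECT commentId, body FROM comments "
        "WHERE postId = ? ORDER BY commentId", (post_id,)).fetchall()
```

Because postId is the leading column of the primary key, the lookup by postId can use the key's index directly, and there is no built-in cap on the number of comments per post.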
This will likely give much worse performance than a query based on a postId foreign key column, but the only way to be sure is to try both techniques (as suggested by paxdiablo) and measure the performance.