High-performance wiki schema

Posted on 2024-07-26 11:16:27 · 904 words · 3 views · 0 comments

I'm using MS SQL Server 2005.

What's the best schema for a wiki-like system, where users edit/revise a submission and the system keeps track of these submissions?

Let's say we're building a simple wiki-based system. It will keep track of each revision, plus the views and latest activity of each revision. On other screens, the system will list "Latest Submissions" and "Most Viewed", and offer search by title.

My current schema (and I know it's bad) uses a single table. When I need to see the "Latest Submissions", I sort by "LatestActivity", group by "DocumentTitle", then take the first N records. I assume a lot of grouping (especially grouping on nvarchar) is bad news. For listing the most viewed, I do the same: sort by views, group by name, take the first N records. Most of the time I will also be doing a "WHERE DocumentName LIKE '%QUERY-HERE%'".
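To make the single-table approach concrete, here is a minimal sketch of the "Latest Submissions" query against a hypothetical Version 1 table. It uses Python's built-in sqlite3 module as a stand-in for SQL Server 2005, so `LIMIT` plays the role of `TOP` and `TEXT` the role of `NVARCHAR`. The real layout is only in the linked diagram; the `WikiDocumentId` and `Body` columns and the sample rows are assumptions.

```python
import sqlite3

# Hypothetical single-table "Version 1" layout: one row per revision;
# rows that share a DocumentTitle are revisions of the same page.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE WikiDocument (
        WikiDocumentId INTEGER PRIMARY KEY,
        DocumentTitle  TEXT NOT NULL,
        Body           TEXT NOT NULL,
        Views          INTEGER NOT NULL DEFAULT 0,
        LatestActivity TEXT NOT NULL  -- ISO-8601 timestamp, sorts as text
    )
""")
conn.executemany(
    "INSERT INTO WikiDocument (DocumentTitle, Body, Views, LatestActivity) "
    "VALUES (?, ?, ?, ?)",
    [
        ("SQL Server tips", "v1", 10, "2009-06-01T10:00"),
        ("SQL Server tips", "v2", 3, "2009-06-03T09:00"),
        ("Wiki schema", "v1", 7, "2009-06-02T12:00"),
    ],
)


def latest_submissions(conn, query, n):
    """Group revisions by title, keep the newest activity per title,
    filter with a substring search, and return the first n titles."""
    return conn.execute(
        """
        SELECT DocumentTitle, MAX(LatestActivity) AS LastActivity
        FROM WikiDocument
        WHERE DocumentTitle LIKE '%' || ? || '%'
        GROUP BY DocumentTitle
        ORDER BY LastActivity DESC
        LIMIT ?
        """,
        (query, n),
    ).fetchall()


print(latest_submissions(conn, "i", 10))
```

Every list screen in this layout has to collapse many revision rows back into one row per title, which is exactly the grouping the question is worried about.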

My current schema is "Version 1" in this diagram: http://www.anaimi.com/junk/schemaquestion.png

I assume this is not acceptable, so I'm trying to come up with another, more performant design. How does Version 2 sound to you? In Version 2 I get the advantage of grouping on WikiHeadId, which is a number; I'm assuming grouping over a number is better than grouping over nvarchar.
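A minimal sketch of what Version 2 could look like, again with sqlite3 standing in for SQL Server: the title is stored once in `WikiHead`, each revision is a `WikiBody` row, and the grouping runs on the integer `WikiHeadId`. `WikiHeadId` and `WikiBodyVersion` follow the names used in the answer below; the other columns and the sample data are hypothetical.

```python
import sqlite3

# Hypothetical "Version 2" layout: one WikiHead row per page,
# one WikiBody row per revision pointing back to its head.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE WikiHead (
        WikiHeadId    INTEGER PRIMARY KEY,
        DocumentTitle TEXT NOT NULL UNIQUE
    );
    CREATE TABLE WikiBody (
        WikiBodyId      INTEGER PRIMARY KEY,
        WikiHeadId      INTEGER NOT NULL REFERENCES WikiHead(WikiHeadId),
        WikiBodyVersion INTEGER NOT NULL,
        Body            TEXT NOT NULL,
        Views           INTEGER NOT NULL DEFAULT 0,
        LastUpdated     TEXT NOT NULL,
        UNIQUE (WikiHeadId, WikiBodyVersion)
    );
""")
conn.executemany("INSERT INTO WikiHead (WikiHeadId, DocumentTitle) VALUES (?, ?)",
                 [(1, "SQL Server tips"), (2, "Wiki schema")])
conn.executemany(
    "INSERT INTO WikiBody (WikiHeadId, WikiBodyVersion, Body, Views, LastUpdated) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "v1", 10, "2009-06-01T10:00"),
     (1, 2, "v2", 3, "2009-06-03T09:00"),
     (2, 1, "v1", 7, "2009-06-02T12:00")],
)


def latest_submissions(conn, n):
    """Group on the integer WikiHeadId, then join WikiHead once per
    group to fetch the title for display."""
    return conn.execute(
        """
        SELECT h.DocumentTitle, g.LastActivity
        FROM (SELECT WikiHeadId, MAX(LastUpdated) AS LastActivity
              FROM WikiBody
              GROUP BY WikiHeadId) AS g
        INNER JOIN WikiHead AS h ON h.WikiHeadId = g.WikiHeadId
        ORDER BY g.LastActivity DESC
        LIMIT ?
        """,
        (n,),
    ).fetchall()


print(latest_submissions(conn, 10))
```

The title is now only touched for display (and for title search on `WikiHead`), never as a grouping key.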

Or the extreme case, Version 3, where I do no grouping at all, but which has several disadvantages, such as duplicated values that I have to maintain in code.
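The Version 3 trade-off can be sketched like this (sqlite3 stand-in; all columns beyond `WikiHeadId`/`WikiBodyVersion` and the helper names are hypothetical). `WikiHead` carries duplicated `LatestActivity` and `Views` values, so the list screens need no `GROUP BY`, but every write has to refresh the copies. Here that happens in the same transaction as the revision insert.

```python
import sqlite3

# Hypothetical "Version 3" layout: the head row duplicates values that
# could otherwise be derived from WikiBody, so list screens need no GROUP BY.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE WikiHead (
        WikiHeadId     INTEGER PRIMARY KEY,
        DocumentTitle  TEXT NOT NULL UNIQUE,
        LatestActivity TEXT,                       -- copy of newest WikiBody.LastUpdated
        Views          INTEGER NOT NULL DEFAULT 0  -- duplicated view counter
    );
    CREATE TABLE WikiBody (
        WikiBodyId      INTEGER PRIMARY KEY,
        WikiHeadId      INTEGER NOT NULL REFERENCES WikiHead(WikiHeadId),
        WikiBodyVersion INTEGER NOT NULL,
        Body            TEXT NOT NULL,
        LastUpdated     TEXT NOT NULL,
        UNIQUE (WikiHeadId, WikiBodyVersion)
    );
""")


def save_revision(conn, head_id, body, when):
    """Insert a new revision and refresh the duplicated head column in
    the same transaction, so the copy cannot drift on failure."""
    with conn:  # commits on success, rolls back if an exception escapes
        (next_version,) = conn.execute(
            "SELECT COALESCE(MAX(WikiBodyVersion), 0) + 1 FROM WikiBody "
            "WHERE WikiHeadId = ?", (head_id,)).fetchone()
        conn.execute(
            "INSERT INTO WikiBody (WikiHeadId, WikiBodyVersion, Body, LastUpdated) "
            "VALUES (?, ?, ?, ?)", (head_id, next_version, body, when))
        conn.execute("UPDATE WikiHead SET LatestActivity = ? WHERE WikiHeadId = ?",
                     (when, head_id))
    return next_version


def record_view(conn, head_id):
    """Bump the duplicated counter; application code must remember this."""
    with conn:
        conn.execute("UPDATE WikiHead SET Views = Views + 1 WHERE WikiHeadId = ?",
                     (head_id,))


def latest_submissions(conn, n):
    # One row per page already exists, so a plain ORDER BY is enough.
    return conn.execute("SELECT DocumentTitle, LatestActivity FROM WikiHead "
                        "ORDER BY LatestActivity DESC LIMIT ?", (n,)).fetchall()


def most_viewed(conn, n):
    return conn.execute("SELECT DocumentTitle, Views FROM WikiHead "
                        "ORDER BY Views DESC LIMIT ?", (n,)).fetchall()


with conn:
    conn.executemany("INSERT INTO WikiHead (WikiHeadId, DocumentTitle) VALUES (?, ?)",
                     [(1, "SQL Server tips"), (2, "Wiki schema")])
save_revision(conn, 1, "v1", "2009-06-01T10:00")
save_revision(conn, 1, "v2", "2009-06-03T09:00")
save_revision(conn, 2, "v1", "2009-06-02T12:00")
for _ in range(3):
    record_view(conn, 2)
record_view(conn, 1)
print(latest_submissions(conn, 10))
print(most_viewed(conn, 10))
```

The reads get simpler, but correctness now depends on every write path going through helpers like `save_revision` and `record_view`.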

Or is there a better/known schema for such systems?

Thanks.

(Moved from ServerFault; I think it's a development question more than an IT question.)

Comments (2)

清浅ˋ旧时光 2024-08-02 11:16:27

First (and out of curiosity), how does the current schema indicate which version is current? Do you just have multiple 'WikiDocument' entries with the same DocumentTitle?

I'm also not clear on why you need a 'LastActivity' at the version level. I don't see how 'LastActivity' fits with the concept of a 'version': in most wikis, versions are write-once. If you modify a version, you're creating a new version, so a last-updated value on a version is meaningless; it's really just 'DateCreated'.
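One way to make the write-once rule explicit is sketched below, with SQLite as a stand-in: an edit appends a new `WikiBody` row whose `DateCreated` is filled by a default, and a trigger (hypothetical name `WikiBody_NoUpdate`) rejects in-place updates. SQL Server would enforce this differently, so treat this only as an illustration of the idea.

```python
import sqlite3

# Hypothetical write-once revision table; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE WikiBody (
        WikiBodyId      INTEGER PRIMARY KEY,
        WikiHeadId      INTEGER NOT NULL,
        WikiBodyVersion INTEGER NOT NULL,
        Body            TEXT NOT NULL,
        DateCreated     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        UNIQUE (WikiHeadId, WikiBodyVersion)
    );
    -- Revisions are write-once: refuse to modify an existing row.
    CREATE TRIGGER WikiBody_NoUpdate BEFORE UPDATE ON WikiBody
    BEGIN
        SELECT RAISE(ABORT, 'revisions are write-once; insert a new version');
    END;
""")


def edit_page(conn, head_id, new_body):
    """An edit never changes an existing row; it appends version N+1.
    MAX over no rows yields NULL, so the first version becomes 1."""
    with conn:
        conn.execute(
            """
            INSERT INTO WikiBody (WikiHeadId, WikiBodyVersion, Body)
            SELECT ?, COALESCE(MAX(WikiBodyVersion), 0) + 1, ?
            FROM WikiBody
            WHERE WikiHeadId = ?
            """,
            (head_id, new_body, head_id),
        )


edit_page(conn, 1, "first draft")
edit_page(conn, 1, "second draft")

rejected = False
try:
    with conn:
        conn.execute("UPDATE WikiBody SET Body = 'oops' WHERE WikiBodyVersion = 1")
except sqlite3.DatabaseError as exc:  # the trigger aborts the UPDATE
    rejected = True
    print("rejected:", exc)

print(conn.execute(
    "SELECT WikiBodyVersion, Body FROM WikiBody ORDER BY WikiBodyVersion").fetchall())
```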

Really, the 'natural' schema for your design is #2. Personally, I'm a bit of a fan of the old DB axiom 'normalize until it hurts, then denormalize until it works'. #2 is a cleaner, nicer design (simple, with no duplication), and if you have no urgent reason to denormalize to version 3, I wouldn't bother.

Ultimately, it comes down to this: are you worrying about a 'more performant' design because you've observed performance problems, or because you hypothetically might have some? There's no real reason #2 shouldn't perform well. Grouping isn't necessarily bad news in SQL Server. In fact, if there's an appropriate covering index for the query, it can perform extremely well, because the engine can navigate to a particular level in the index to find the grouped values and then use the remaining columns of the index to compute the MIN/MAX/whatever. Grouping by NVARCHAR isn't particularly bad either; if it's not observed to be a problem, don't fret about it, though (non-binary) collations can make it a little tricky. And in version 2, wherever you need to GROUP BY, you can do it by WikiHeadId, right?
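To see the covering-index effect in a database you can run anywhere, the sketch below uses SQLite's `EXPLAIN QUERY PLAN`. SQLite's planner is not SQL Server's, so this is an analogy, and the index name is hypothetical. With an index on `(WikiHeadId, WikiBodyVersion)`, the per-page `MAX` can be answered by scanning the index in order, without reading the table rows; the plan detail should mention `USING COVERING INDEX`, though the exact wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE WikiBody (
        WikiBodyId      INTEGER PRIMARY KEY,
        WikiHeadId      INTEGER NOT NULL,
        WikiBodyVersion INTEGER NOT NULL,
        Body            TEXT NOT NULL,
        Views           INTEGER NOT NULL DEFAULT 0
    );
    -- Leading column matches the GROUP BY; the second column feeds MAX().
    CREATE INDEX IX_WikiBody_Head_Version ON WikiBody (WikiHeadId, WikiBodyVersion);
""")
conn.executemany(
    "INSERT INTO WikiBody (WikiHeadId, WikiBodyVersion, Body) VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 2, "b"), (2, 1, "c"), (2, 2, "d"), (2, 3, "e")],
)

LATEST_PER_PAGE = """
    SELECT WikiHeadId, MAX(WikiBodyVersion) AS Latest
    FROM WikiBody
    GROUP BY WikiHeadId
"""

# EXPLAIN QUERY PLAN rows have four columns; the last is the readable detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + LATEST_PER_PAGE)]
for step in plan:
    print(step)

latest = conn.execute(LATEST_PER_PAGE + " ORDER BY WikiHeadId").fetchall()
print(latest)
```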

One thing that may make life easier, if you do a lot of operations on the current version (as I assume you would), is to add an FK back from the head table to the body table indicating the current version. As #2 stands now, if you want to view the current versions with the highest number of hits, the query might be:

SELECT TOP ...
FROM WikiHead
INNER JOIN 
  -- latest version number per page
  (SELECT WikiHeadId, MAX(WikiBodyVersion) /* or LastUpdated? */ AS Latest 
   FROM WikiBody GROUP BY WikiHeadId) AS LatestVersions
  ON LatestVersions.WikiHeadId = WikiHead.WikiHeadId
INNER JOIN WikiBody ON 
  (LatestVersions.WikiHeadId = WikiBody.WikiHeadId)
  AND (WikiBody.WikiBodyVersion = LatestVersions.Latest)
ORDER BY 
  Views DESC

or alternatively

...
INNER JOIN WikiBody ON 
  (WikiHead.WikiHeadId = WikiBody.WikiHeadId)
  -- correlated subquery: highest version for this page
  AND (WikiBody.WikiBodyVersion = 
    (SELECT MAX(b2.WikiBodyVersion) FROM WikiBody AS b2 WHERE b2.WikiHeadId = WikiHead.WikiHeadId))
...

both of which are icky. If the WikiHead keeps a pointer to the current version, it's just

...    
INNER JOIN WikiBody ON 
  (WikiHead.WikiHeadId = WikiBody.WikiHeadId)
  AND (WikiHead.Latest = WikiBody.WikiBodyVersion)
...

or whatever, which may be a useful denormalization just because it makes your life easier, not for performance.
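The two styles can be compared side by side in a small sketch (sqlite3 stand-in; `save_revision` and the sample data are hypothetical). The head's `Latest` column is moved in the same transaction that inserts the revision, and the grouped query and the pointer query return the same rows. To keep the example short, `Latest` is not declared as a foreign key here; in SQL Server you would add the FK from the head table back to the body table as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE WikiHead (
        WikiHeadId    INTEGER PRIMARY KEY,
        DocumentTitle TEXT NOT NULL UNIQUE,
        Latest        INTEGER  -- denormalized pointer to the current WikiBodyVersion
    );
    CREATE TABLE WikiBody (
        WikiHeadId      INTEGER NOT NULL REFERENCES WikiHead(WikiHeadId),
        WikiBodyVersion INTEGER NOT NULL,
        Body            TEXT NOT NULL,
        Views           INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (WikiHeadId, WikiBodyVersion)
    );
""")


def save_revision(conn, head_id, body):
    """Append a revision and move the head's Latest pointer atomically."""
    with conn:
        (version,) = conn.execute(
            "SELECT COALESCE(MAX(WikiBodyVersion), 0) + 1 FROM WikiBody "
            "WHERE WikiHeadId = ?", (head_id,)).fetchone()
        conn.execute("INSERT INTO WikiBody (WikiHeadId, WikiBodyVersion, Body) "
                     "VALUES (?, ?, ?)", (head_id, version, body))
        conn.execute("UPDATE WikiHead SET Latest = ? WHERE WikiHeadId = ?",
                     (version, head_id))


# Without the pointer: derive each page's latest version with GROUP BY.
GROUPED = """
    SELECT WikiHead.DocumentTitle, WikiBody.Views
    FROM WikiHead
    INNER JOIN (SELECT WikiHeadId, MAX(WikiBodyVersion) AS Latest
                FROM WikiBody GROUP BY WikiHeadId) AS LatestVersions
      ON LatestVersions.WikiHeadId = WikiHead.WikiHeadId
    INNER JOIN WikiBody
      ON WikiBody.WikiHeadId = LatestVersions.WikiHeadId
     AND WikiBody.WikiBodyVersion = LatestVersions.Latest
    ORDER BY WikiBody.Views DESC
    LIMIT ?
"""

# With the pointer: a plain two-table join.
POINTER = """
    SELECT WikiHead.DocumentTitle, WikiBody.Views
    FROM WikiHead
    INNER JOIN WikiBody
      ON WikiBody.WikiHeadId = WikiHead.WikiHeadId
     AND WikiBody.WikiBodyVersion = WikiHead.Latest
    ORDER BY WikiBody.Views DESC
    LIMIT ?
"""

with conn:
    conn.executemany("INSERT INTO WikiHead (WikiHeadId, DocumentTitle) VALUES (?, ?)",
                     [(1, "SQL Server tips"), (2, "Wiki schema")])
save_revision(conn, 1, "v1")
save_revision(conn, 1, "v2")
save_revision(conn, 2, "v1")
with conn:
    conn.execute("UPDATE WikiBody SET Views = 5 WHERE WikiHeadId = 1 AND WikiBodyVersion = 2")
    conn.execute("UPDATE WikiBody SET Views = 9 WHERE WikiHeadId = 2")

print(conn.execute(POINTER, (10,)).fetchall())
print(conn.execute(GROUPED, (10,)).fetchall())
```

Both queries ignore the older version 1 of the first page; the pointer version simply gets there without the derived table.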

凶凌 2024-08-02 11:16:27

Check this out.

It's the database schema for mediawiki, what wikipedia is based on.

It looks pretty well documented and would be an interesting read for you.

From this page.
