构建一个汇总另一个表数据的表 - 优化建议?

发布于 2024-11-27 06:49:36 字数 1340 浏览 3 评论 0原文

我正在尝试建立一个表格来总结我们的 Wiki 中的页面引用某些宏的次数。

我构建了一个名为 @currentpages 的临时表,其中有 55k 行,如下所示:

DECLARE @currentpages table(
ContentID       NUMERIC(19,0) NOT NULL PRIMARY KEY,
PageTitle       VARCHAR(255) NULL,
SpaceKey            VARCHAR(255) NULL,
OriginalAuthor  VARCHAR(255) NULL,
LastChangedBy   VARCHAR(255) NULL,
LastChangedDt   VARCHAR(10) NULL,
ContentBody         TEXT NULL); 

另一个表如下所示:(

DECLARE @usage table(
SpaceKey        VARCHAR(255) NOT NULL PRIMARY KEY,
Macro1      NUMERIC(19,0) NULL,
Macro2      NUMERIC(19,0) NULL,
Macro3              NUMERIC(19,0) NULL);

我简化了上面的内容;它实际上有大约 40 列,如 Macro1、Macro2 等)我'我试图计算有多少个对这些不同宏的引用(在@currentpages中)。

我通过为每个可能的 SpaceKey 值创建一行来初始化 @usage 表,将所有宏“x”计数器清零:

INSERT INTO @usage (SpaceKey, Macro1, Macro2, Macro3)
SELECT S.spacekey, 0, 0, 0
FROM spaces     S
ORDER BY S.spacekey

然后我运行几个 UPDATE 语句中的第一个来识别每个宏引用:

UPDATE @usage
SET U.AdvancedSearch = C.Counter
FROM @usage                 U
INNER JOIN (SELECT SpaceKey, COUNT(*) AS Counter 
    FROM @currentpages 
    WHERE ContentBody LIKE '%{search-%' GROUP BY SpaceKey) C 
        ON U.SpaceKey = C.SpaceKey

这似乎工作正常,但它运行很长时间。有没有更有效的方法来完成我想做的事情?

该数据库位于 SQL Server 2005 上。

非常感谢, 贝特西

I am trying to build a table that summarizes how many times pages in our Wiki references certain macros.

I have built a temporary table called @currentpages that with 55k rows that looks like this:

DECLARE @currentpages table(
ContentID       NUMERIC(19,0) NOT NULL PRIMARY KEY,
PageTitle       VARCHAR(255) NULL,
SpaceKey            VARCHAR(255) NULL,
OriginalAuthor  VARCHAR(255) NULL,
LastChangedBy   VARCHAR(255) NULL,
LastChangedDt   VARCHAR(10) NULL,
ContentBody         TEXT NULL); 

and another table that looks like this:

DECLARE @usage table(
SpaceKey        VARCHAR(255) NOT NULL PRIMARY KEY,
Macro1      NUMERIC(19,0) NULL,
Macro2      NUMERIC(19,0) NULL,
Macro3              NUMERIC(19,0) NULL);

(I've simplified the above; it actually has about 40 columns like Macro1, Macro2, etc.) I'm trying to get counts of how many references there are (in @currentpages) to these various macros.

I initialize the @usage table by creating a row for each possible SpaceKey value, zeroing out all of the Macro"x" counters:

INSERT INTO @usage (SpaceKey, Macro1, Macro2, Macro3)
SELECT S.spacekey, 0, 0, 0
FROM spaces     S
ORDER BY S.spacekey

Then I run the first of several UPDATE statements to identify each macro reference:

UPDATE @usage
SET U.AdvancedSearch = C.Counter
FROM @usage                 U
INNER JOIN (SELECT SpaceKey, COUNT(*) AS Counter 
    FROM @currentpages 
    WHERE ContentBody LIKE '%{search-%' GROUP BY SpaceKey) C 
        ON U.SpaceKey = C.SpaceKey

This appears to work fine, but it runs a very long time. Is there a more efficient way to do what I'm trying to do?

This database is on SQL Server 2005.

Many thanks,
Betsy

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

话少情深 2024-12-04 06:49:36
  1. 调查全文搜索。 (全文搜索 @ MSDN
    我对此没有任何经验,因此无法提供建议。

  2. 不要为每个宏查询一次@CurrentPages,而是一次获取所有宏计数。像这样的东西:

    INSERT INTO @usage (SpaceKey, Macro1, Macro2, Macro3)
    SELECT S.空格键
        , count(ContentBody LIKE '%Search Macro 1%' 时的情况 
            然后 1 else null end) 作为 Macro1_Count
        , count(ContentBody LIKE '%Search Macro 2%' 时的情况 
            然后 1 else null end) 作为 Macro2_Count
        , count(ContentBody LIKE '%Search Macro 3%' 时的情况 
            然后 1 else null end) 作为 Macro=3Count
    从空间 S
    左外连接 @CurrentPages C ON C.SpaceKey = S.SpaceKey
    按 S.spacekey 分组
    
  1. Investigate full text search. (Full Text Search @ MSDN)
    I don't have any experience with it, so can not offer advice.

  2. Instead of querying @CurrentPages once per macro, get all the macro counts at once. Something like:

    INSERT INTO @usage (SpaceKey, Macro1, Macro2, Macro3)
    SELECT S.spacekey
        , count(case when ContentBody LIKE '%Search Macro 1%' 
            then 1 else null end) as Macro1_Count
        , count(case when ContentBody LIKE '%Search Macro 2%' 
            then 1 else null end) as Macro2_Count
        , count(case when ContentBody LIKE '%Search Macro 3%' 
            then 1 else null end) as Macro=3Count
    FROM spaces S
    LEFT OUTER JOIN @CurrentPages C ON C.SpaceKey = S.SpaceKey
    GROUP BY S.spacekey
    
擦肩而过的背影 2024-12-04 06:49:36

不要为每个宏创建一列,而是创建一个像这样的表。

DECLARE @macrotype table(
    type      int          NOT NULL PRIMARY KEY
  , MacroName varchar(100) NOT NULL 
  , mask      varchar(100) NOT NULL 
)

DECLARE @usage table(
  SpaceKey    VARCHAR(255) NOT NULL PRIMARY KEY,
, MacroType   int NOT NULL 
, MacroCount  int NOT NULL 
)

一旦你有了它,你就可以通过运行来填充它。

  INSERT INTO @macrotype VALUES (1, 'search', '%{search-%')
                              , (2, 'yadayada', '%{yadayada-%')
                              , ...

  DECLARE @c int, @m varchar(100)
  DECLARE c CURSOR READ_ONLY FOR SELECT type, mask FROM @macrotype
  OPEN c

  FETCH NEXT FROM c INTO @t, @m

  WHILE @@FETCH_STATUS = 0 
    BEGIN
      INSERT INTO @usage (SpaceKey, MacroType, MacroCount)
      SELECT SpaceKey
           , @t
           , COUNT(*) AS Counter 
        FROM @currentpages
       WHERE ContentBody LIKE @m
    GROUP BY SpaceKey
    FETCH NEXT FROM c INTO @t, @m
  END
  CLOSE c
  DEALLOCATE c

在你的故事结束时,你可以查询@usage表,以你想要的任何形式进行计数。

  SELECT MacroName, count = count(*)
    FROM @usage u
    JOIN @macrotype m
      ON u.MacroType - m.type
GROUP BY MacroName

对于 55k 条记录,这将运行几分钟。

Don't make a column per macro, but make a table like this.

DECLARE @macrotype table(
    type      int          NOT NULL PRIMARY KEY
  , MacroName varchar(100) NOT NULL 
  , mask      varchar(100) NOT NULL 
)

DECLARE @usage table(
  SpaceKey    VARCHAR(255) NOT NULL PRIMARY KEY,
, MacroType   int NOT NULL 
, MacroCount  int NOT NULL 
)

Once you have that you can populate it by running

  INSERT INTO @macrotype VALUES (1, 'search', '%{search-%')
                              , (2, 'yadayada', '%{yadayada-%')
                              , ...

  DECLARE @c int, @m varchar(100)
  DECLARE c CURSOR READ_ONLY FOR SELECT type, mask FROM @macrotype
  OPEN c

  FETCH NEXT FROM c INTO @t, @m

  WHILE @@FETCH_STATUS = 0 
    BEGIN
      INSERT INTO @usage (SpaceKey, MacroType, MacroCount)
      SELECT SpaceKey
           , @t
           , COUNT(*) AS Counter 
        FROM @currentpages
       WHERE ContentBody LIKE @m
    GROUP BY SpaceKey
    FETCH NEXT FROM c INTO @t, @m
  END
  CLOSE c
  DEALLOCATE c

At the end of your story you can query the @usage table to make counts in whatever form you wish.

  SELECT MacroName, count = count(*)
    FROM @usage u
    JOIN @macrotype m
      ON u.MacroType - m.type
GROUP BY MacroName

On 55k records this will run for a few minutes.

帝王念 2024-12-04 06:49:36

我想任何性能影响都将来自包含 LIKE 子句的子查询,因为它将进行全表扫描。我真的不知道还有什么会减慢速度太多。

您可以通过仅运行子查询并将返回所需的时间与某一列的整个更新所需的时间进行比较来进行测试。

I would imagine any performance hits are going to come from the subquery containing the LIKE clause since it's going to be doing a full table scan. I don't really see what else would slow this down too much.

You could test that by running just the sub-query and comparing the time it takes to return to the time the entire update takes for one of your columns.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文