将一系列数字分成大致相等的总数

发布于 2024-10-01 21:31:43 字数 475 浏览 9 评论 0原文

我知道我的问题可能没有“完美”的解决方案（这听起来像是背包或装箱问题的变体），但这是我的场景：

我想将 SQL 数据库表列表分为n（假设是 7）个大小大致相同的堆（这样我就可以在整个星期内大致平均地分配一些维护任务）。

假设我有 100 个表（可能更高或更低，但不可能高于 5000），大小范围从 1 到 10,000,000（当然，较大的表不太常见）。

我最初的想法是按字母顺序（伪随机）对表进行排序，然后从头开始遍历，当总数超过 Sum(Size)/7 时移动到下一组。对于某些数据库，这可能工作得很好，但如果两个巨大的表彼此相邻，那么这会导致非常不平等的组。（这并不像听起来那么不可能，考虑两个巨大的表，Account_History 和 Account_History_Archive）。

是否有任何普遍接受的技术可以通过各种源数据给出“良好”的结果？我倾向于采用更简单的技术，而不是更精确的分组（如果某些日子的维护运行时间比其他日子稍长，那也没什么大不了的）。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

淡水深流 2024-10-08 21:31:43

如何按大小对表进行排序，然后对于每个表，将其放入当前总行数最小的那一天？这意味着最大的 7 张桌子将首先分布在几天内。然后第八大的将与前 7 个中最小的一起使用，依此类推。您将继续用安排的最少工作量来度过这一天。

小参考表最终的结果可能没有多大区别。

你可以发明这种做法不好的场景，但我希望它在实践中能够发挥作用，而不会太复杂。

回复收藏 0 原文

单调的奢华 2024-10-08 21:31:43

仅供参考，这就是我的做法。我想将“桶”放入持久表中，并且每两周“重新计算”一次。否则，我担心如果我每天计算这些桶，一张表可能会从一个桶跳到下一个桶。但是，我想经常重新计算模式和结构。 DDL 修改。这是那个片段。

-------------------------------------------------------------------------------------
--Get the total table size (by rows)
-------------------------------------------------------------------------------------


if object_id('tempdb..#Space') is not null
drop table #Space

SELECT 
    TableName = t.NAME,
    Schem = s.name,
    Pages = sum(a.total_pages),
    Grp = row_number() over (order by sum(a.total_pages) desc)
INTO #Space
FROM 
    sys.tables t
INNER JOIN      
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN 
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN 
    sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN 
    sys.schemas s ON t.schema_id = s.schema_id
WHERE 
    t.NAME NOT LIKE 'dt%' 
    AND t.is_ms_shipped = 0
    AND i.OBJECT_ID > 255 
GROUP BY 
    t.Name, s.name


-------------------------------------------------------------------------------------
--split the tables into 7 buckets by:
    --updating the Grp to the Grp with the lowest cumulative sum of all members by
    --ordering by the current cumulative sum of all members
-------------------------------------------------------------------------------------

declare @ct int = 8


while @ct <= (select max(Grp) from #Space)
begin

    update S
    set Grp = (select top 1 Grp from #Space where Grp < 8 order by sum(Pages) over (partition by Grp) asc)
    from #Space S
    where S.Grp = @ct

    set @ct = @ct + 1

end


insert into AdminTools..TableSpace (TableName
                                    ,Schem
                                    ,Pages
                                    ,Grp
                                    ,GrpPages
                                    ,LoadDate)
select 
    TableName
    ,Schem
    ,Pages
    ,Grp
    ,GrpPages = sum(Pages) over (partition by Grp)
    ,LoadDate = getdate()
from #Space
end

Just for reference, this is how I went about this. I wanted to put the "Buckets" into a persisted table and only "recompute" them every 2 weeks. Otherwise, I feared that if I computed these buckets every day, a table could jump from one bucket to the next. But, I wanted to recompute every so often for schema & DDL modifications. Here's that snippet.

-------------------------------------------------------------------------------------
--Get the total table size (by rows)
-------------------------------------------------------------------------------------


if object_id('tempdb..#Space') is not null
drop table #Space

SELECT 
    TableName = t.NAME,
    Schem = s.name,
    Pages = sum(a.total_pages),
    Grp = row_number() over (order by sum(a.total_pages) desc)
INTO #Space
FROM 
    sys.tables t
INNER JOIN      
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN 
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN 
    sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN 
    sys.schemas s ON t.schema_id = s.schema_id
WHERE 
    t.NAME NOT LIKE 'dt%' 
    AND t.is_ms_shipped = 0
    AND i.OBJECT_ID > 255 
GROUP BY 
    t.Name, s.name


-------------------------------------------------------------------------------------
--split the tables into 7 buckets by:
    --updating the Grp to the Grp with the lowest cumulative sum of all members by
    --ordering by the current cumulative sum of all members
-------------------------------------------------------------------------------------

declare @ct int = 8


while @ct <= (select max(Grp) from #Space)
begin

    update S
    set Grp = (select top 1 Grp from #Space where Grp < 8 order by sum(Pages) over (partition by Grp) asc)
    from #Space S
    where S.Grp = @ct

    set @ct = @ct + 1

end


insert into AdminTools..TableSpace (TableName
                                    ,Schem
                                    ,Pages
                                    ,Grp
                                    ,GrpPages
                                    ,LoadDate)
select 
    TableName
    ,Schem
    ,Pages
    ,Grp
    ,GrpPages = sum(Pages) over (partition by Grp)
    ,LoadDate = getdate()
from #Space
end

回复收藏 0 原文