PowerShell command or script to sort directories evenly by amount of data

Posted on 2025-01-19 21:08:55

I am in the process of moving my file server from on-prem up to an Azure file share/VM.

I've got everything in place and ready to move, but I want to split my files up and move them in 4 roughly equal-sized batches.

**The Question**

Say I have 100 directories full of data on my file server, named Dir1 - Dir100.

What command would help me figure out something like the following?

Dir1 - Dir30 == 25% of total amount of data

Dir31 - Dir65 == 25% of total amount of data

Dir66 - Dir90 == 25% of total amount of data

Dir91 - Dir100 == 25% of total amount of data

Does this make sense?

I know how to get the total data size or the number of files, but I can't figure out whether what I'm trying to do is possible, or how to do it. I have been messing around, but I'm not even getting close.

Comments (1)

彩虹直至黑白 2025-01-26 21:08:55

Okay, so I did it both ways. You didn't post anything of your own, so you'll have to adapt what I have here.

First, create a list of your directories. The two partition fields, SequentialPartition and BalancedPartition, will hold the group each directory is assigned to. Based on your comment, you'll want the sequential partition.

# Set the number of partitions you need to divide into
$PartitonCount = 4

# I'm making my own list.
# DirectoryName is the FullName of the directory, while Size is the total size of the folder
$Directories = @'
DirectoryName,Size,SequentialPartition,BalancedPartition
Dir001,667117278790,,
Dir002,292429698039,,
Dir003,886665781748,,
Dir004,49665832174,,
Dir005,34041573768,,
Dir006,320236552339,,
Dir007,747674470078,,
Dir008,375284137393,,
Dir009,549754879999,,
Dir010,327528841615,,
Dir011,1079085662940,,
Dir012,1051279115201,,
Dir013,198772106622,,
Dir014,124437323951,,
Dir015,342261556929,,
Dir016,844330660560,,
Dir017,888294129196,,
Dir018,774656795794,,
Dir019,360019686543,,
Dir020,412884330229,,
'@ | ConvertFrom-Csv
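
To build the same kind of list from a real server instead of typing it in, something along these lines may work. It is only a sketch and not part of the original answer: the function name Get-DirectorySizeList and the example root path are hypothetical, and it assumes each top-level folder under the root is one unit to migrate.

```powershell
function Get-DirectorySizeList {
    param(
        [Parameter(Mandatory)]
        [string]$Root   # hypothetical parameter: the folder that contains Dir1..Dir100
    )

    Get-ChildItem -LiteralPath $Root -Directory | ForEach-Object {
        # Sum the length of every file below this directory, recursively
        $measure = Get-ChildItem -LiteralPath $_.FullName -File -Recurse -Force -ErrorAction SilentlyContinue |
            Measure-Object -Property Length -Sum

        [pscustomobject]@{
            DirectoryName       = $_.FullName
            # An empty directory yields no Sum; [int64]$null evaluates to 0
            Size                = [int64]$measure.Sum
            SequentialPartition = $null
            BalancedPartition   = $null
        }
    }
}

# Example call (hypothetical path):
# $Directories = Get-DirectorySizeList -Root 'D:\Shares'
```

Because Size is stored as an Int64 here rather than as a CSV string, it sorts and adds numerically without extra casts. Files that cannot be read are skipped silently, so the reported sizes may be slightly low on folders with restricted permissions.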

Now, for the sequential partitioning:

# Determine the total size of all directories
$TotalSize = $Directories | Measure-Object -Property Size -Sum | Select-Object -ExpandProperty Sum

# This is the threshold for sequential balancing
$PartitionSizeThreshold = $TotalSize / $PartitonCount

# Initialize the partition size and partition ID
[int64]$CurrentPartitionSize = 0
$CurrentPartition = 0
foreach ($D in $Directories) {
    # Assign the directory to the current partition
    $D.SequentialPartition = $CurrentPartition

    # Add the current directory's size to the current partition's size
    $CurrentPartitionSize += $D.Size

    # If the current partition's size is over the threshold, go to the next empty partition
    if ($CurrentPartitionSize -gt $PartitionSizeThreshold) {
        $CurrentPartition++
        $CurrentPartitionSize = 0
    }
}

# Results
$Directories

You can see how well balanced it is with this:

# Here is the breakdown of the sequential partitions as a fraction of the total
$Directories |
    Group-Object -Property SequentialPartition |
    ForEach-Object {
        ($_.Group | Measure-Object -Property Size -Sum | Select-Object -ExpandProperty Sum) / $TotalSize
    }

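The last partition is likely to be pretty poorly balanced: each earlier partition is only closed after it has gone over the threshold, so the last one gets whatever is left.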
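
One way that may reduce that skew while still keeping the directories in order is to compare a running total against cumulative boundaries (1/N, 2/N, ... of the total) instead of resetting a counter. This is a swapped-in variant, not the method from the answer above: the function name Set-SequentialPartition and the midpoint rule are my own assumptions, and it expects objects that already have Size and SequentialPartition properties.

```powershell
function Set-SequentialPartition {
    param(
        [Parameter(Mandatory)] [object[]]$Directories,
        [ValidateRange(1, 1024)] [int]$PartitionCount = 4
    )

    # Sizes imported from CSV are strings, so cast before doing arithmetic
    [double]$total = 0
    foreach ($d in $Directories) { $total += [int64]$d.Size }

    [double]$running = 0
    foreach ($d in $Directories) {
        $size = [int64]$d.Size
        $index = 0
        if ($total -gt 0) {
            # Where the midpoint of this directory falls, as a fraction of the total
            $fraction = ($running + $size / 2) / $total
            # Map the fraction to a partition index, capped at the last partition
            $index = [math]::Min($PartitionCount - 1, [int][math]::Floor($fraction * $PartitionCount))
        }
        $d.SequentialPartition = $index
        $running += $size
    }
}

# Example call with the list built earlier:
# Set-SequentialPartition -Directories $Directories -PartitionCount 4
```

The running total never decreases, so the partition index never goes backwards and each partition remains a contiguous range of directories. A single very large directory can still make one partition much bigger than the rest.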


On the other hand, if it needs to be as balanced as possible:

# Create the partition size array with a zero for each partition size
$BalancedPartitionSizes = 1..$PartitonCount | ForEach-Object { 0 }

# Sort largest to smallest to assign the largest directories first
# (Size comes out of ConvertFrom-Csv as a string, so cast it to sort numerically)
$Directories | Sort-Object -Property { [int64]$_.Size } -Descending |
    ForEach-Object {
        # Determine which index of the array has the smallest size
        $SmallestPartition = $BalancedPartitionSizes | Sort-Object | Select-Object -First 1 | ForEach-Object { [Array]::IndexOf($BalancedPartitionSizes, $_) }

        # Add the directory to the smallest partition
        $BalancedPartitionSizes[$SmallestPartition] += [int64]$_.Size
        $_.BalancedPartition = $SmallestPartition
    }

# Results
$Directories | Sort-Object -Property BalancedPartition

And you can see how evenly it balanced things with this:

# Here is the breakdown of the balanced partitions as a fraction of the total
$Directories |
    Group-Object -Property BalancedPartition |
    ForEach-Object {
        ($_.Group | Measure-Object -Property Size -Sum | Select-Object -ExpandProperty Sum) / $TotalSize
    }

This largest-first approach is much more likely to give you evenly sized groups, though it's certainly not perfect.
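
If bare fractions are hard to read, a small helper could label each partition with its directory count, size, and share of the total. This is a sketch only: the function name Get-PartitionSummary and its output column names are hypothetical, and it assumes the objects carry a numeric (or numeric-string) Size property plus the partition property you name.

```powershell
function Get-PartitionSummary {
    param(
        [Parameter(Mandatory)] [object[]]$Directories,
        # Name of the property holding the partition ID,
        # for example SequentialPartition or BalancedPartition
        [Parameter(Mandatory)] [string]$PartitionProperty
    )

    $total = ($Directories | Measure-Object -Property Size -Sum).Sum

    $Directories | Group-Object -Property $PartitionProperty | ForEach-Object {
        $sum = ($_.Group | Measure-Object -Property Size -Sum).Sum

        # Avoid dividing by zero when every directory is empty
        $percent = 0
        if ($total -gt 0) { $percent = [math]::Round(100 * $sum / $total, 1) }

        [pscustomobject]@{
            Partition   = $_.Name
            Directories = $_.Count
            SizeGB      = [math]::Round($sum / 1GB, 2)
            Percent     = $percent
        }
    }
}

# Example call:
# Get-PartitionSummary -Directories $Directories -PartitionProperty BalancedPartition
```

Running it once per partition property puts the sequential and balanced results side by side for comparison before you commit to a batch plan.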
