I'm parsing RoboCopy logs for millions of files. How can I make my code run faster?

Posted on 2025-01-27 16:43:54

New to StackOverflow, I'll do my best to post correctly :)

Hoping someone can help me get my code running faster.

The code runs against RoboCopy migration logs from a massive DFS server migration (20 DFS servers being migrated).

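The code first captures the source/destination of the log in question and then looks for the 'Newer', 'Older', 'New File' and 'Extra File' entries/rows. It then checks whether each of these files exists on each side, what attributes it has, and runs a DFSR hash check against both sides (as the files are now being replicated via DFSR).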
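One way to cut the file I/O, sketched here under assumptions: read each log once with `switch -Regex -File`, which streams the file line by line, instead of loading it with `Get-Content` and then running several `Select-String` passes over it. `Read-RoboLog` is a hypothetical name, and the regular expressions assume the column layout of the sample log in this post (leading whitespace, an optional `*`, the row type, then a UNC path); real logs may need the patterns adjusted.

```powershell
# Hypothetical helper: parse one RoboCopy log in a single streaming pass.
function Read-RoboLog {
    param([Parameter(Mandatory)][string]$Path)

    $source = $null
    $dest   = $null

    # switch -File reads the log line by line, so the whole file is never held in memory.
    # switch -Regex is case-insensitive by default, so 'EXTRA File' also matches 'Extra File'.
    switch -Regex -File $Path {
        '^\s*Source\s*:\s*(.+?)\s*$' { $source = $Matches[1]; continue }
        '^\s*Dest\s*:\s*(.+?)\s*$'   { $dest   = $Matches[1]; continue }
        '^\s+\*?(Newer|Older|New File|EXTRA File)\s.*?(\\\\\S.*?)\s*$' {
            # Skip rows whose path ends in desktop.ini, as the original script excludes them.
            if ($Matches[2] -like '*\desktop.ini') { continue }
            [PSCustomObject]@{
                ErrorType  = $Matches[1]
                FilePath   = $Matches[2]
                RoboSource = $source
                RoboDest   = $dest
            }
        }
    }
}
```

Because the function emits one object per row as it reads, it can be piped straight into the checking and export stages without building a large array first.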

The main concern is whether the hashes match between source and destination and whether the temporary attribute is set.
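For the attribute part, a single `Get-Item -Force` call per file could replace the separate `Test-Path` and `Get-ChildItem -Hidden` calls (note that `-Hidden` returns *only* hidden items, so ordinary files come back empty). A minimal sketch; `Get-FileFacts` is a hypothetical helper, not an existing cmdlet. It checks the Temporary flag (0x100, the same value the script tests with `-band`) through `FileAttributes.HasFlag`.

```powershell
# Hypothetical helper: one filesystem call per file instead of Test-Path + Get-ChildItem.
function Get-FileFacts {
    param([Parameter(Mandatory)][string]$Path)

    # -Force includes hidden/system files; -LiteralPath stops [ ] in names being read as wildcards.
    $item = Get-Item -LiteralPath $Path -Force -ErrorAction SilentlyContinue
    if (-not $item) { return $null }   # missing or unreachable file

    [PSCustomObject]@{
        Path          = $item.FullName
        TempAttribute = $item.Attributes.HasFlag([System.IO.FileAttributes]::Temporary)  # 0x100
        AllAttributes = $item.Attributes.ToString()
        LastModified  = $item.LastWriteTime
        SizeBytes     = $item.Length
    }
}
```

A `$null` result doubles as "does not exist", so each side needs one round trip over the network instead of two.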

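The problem I am having is that millions of files are logged under these types (the migration was gargantuan), so the script takes forever to run. On top of that, the client will not open the ports needed for PSRemoting/Invoke-Command.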

At present I am running my code without multi-threading, with a copy on each of the DFS servers looking at its own logs, but it is still slow.

I have been looking at running a foreach -parallel over each log row (not the loop over the log files), but:

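  1. With so much data in each log/loop, my understanding is that I have to write it out rather than keep it in a PSCustomObject, otherwise I would run out of RAM?
  2. I don't really understand how to use mutexes so that multiple threads can write to the same CSV.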
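On point 2: if several threads (or several copies of the script on the same machine) must append to one file, a named `System.Threading.Mutex` can serialise the writes. A minimal sketch, assuming each writer appends already-formatted CSV lines; `Add-LineLocked` and the mutex name `RoboLogCsvWriter` are hypothetical. A mutex belongs to the thread that acquired it, so `WaitOne` and `ReleaseMutex` must run on the same thread, which holds inside a single function call like this one.

```powershell
# Hypothetical helper: append lines to a shared file while holding a named mutex,
# so concurrent writers (threads or processes on the same machine) cannot interleave.
function Add-LineLocked {
    param(
        # Use a full path: .NET resolves relative paths against the process directory, not $PWD.
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string[]]$Line,
        # Assumption: every writer uses the same mutex name.
        [string]$MutexName = 'RoboLogCsvWriter'
    )
    $mutex = [System.Threading.Mutex]::new($false, $MutexName)
    try {
        [void]$mutex.WaitOne()   # block until no other writer holds the lock
        try {
            [System.IO.File]::AppendAllLines($Path, $Line)
        }
        finally {
            $mutex.ReleaseMutex()   # always release, even if the write throws
        }
    }
    finally {
        $mutex.Dispose()
    }
}
```

Every lock round trip costs time, so with millions of rows it may well be cheaper to avoid shared writes altogether and keep a single writer.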

Can someone please advise me on the above two points, and maybe give me some more ideas on how to optimise things?
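On point 1 and on other ideas: rows need neither be collected in memory nor be written from the worker threads. In PowerShell 7+, `ForEach-Object -Parallel` passes each result back to the calling pipeline as it is produced, so one `Export-Csv` at the end of the pipeline can do all the writing while memory stays roughly flat and no mutex is needed. The sketch assumes PowerShell 7; `Export-RowChecks` is hypothetical, and the local `Test-Path` is only a stand-in for the real per-file work. Whether `Get-DfsrFileHash` can be loaded inside the parallel runspaces on your servers is an assumption you would need to verify. Output order is not preserved.

```powershell
# Sketch (PowerShell 7+): stream rows -> parallel checks -> one CSV writer.
#   Stage 1 (main thread): pull candidate UNC paths out of each log, line by line.
#   Stage 2 (worker runspaces): the slow per-file work runs concurrently.
#   Stage 3 (main thread): Export-Csv is the only code touching the file, so no lock is needed.
function Export-RowChecks {
    param(
        [Parameter(Mandatory)][string[]]$LogPath,
        [Parameter(Mandatory)][string]$CsvPath,
        [int]$ThrottleLimit = 8   # assumption: tune to the server's cores and I/O
    )

    & {
        foreach ($log in $LogPath) {
            switch -Regex -File $log {
                '^\s+\*?(?:Newer|Older|New File|EXTRA File)\s.*?(\\\\\S.*?)\s*$' { $Matches[1] }
            }
        }
    } |
    ForEach-Object -ThrottleLimit $ThrottleLimit -Parallel {
        # Placeholder for Test-Path / Get-Item / Get-DfsrFileHash in the real script.
        [PSCustomObject]@{
            FilePath = $_
            Exists   = Test-Path -LiteralPath $_
        }
    } |
    Export-Csv -LiteralPath $CsvPath -NoTypeInformation
}
```

Because the per-thread work only returns an object, the existing hash and attribute logic could move into the `-Parallel` block largely unchanged, with the two `Export-Csv -Append` calls removed from it.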

My full code is below:

#Get Start Time
$ReportStartTime = (Get-Date).ToString('yyy-MM-dd_HH-mm-ss')

If(!(test-path "C:\Temp\MasterReport_$ReportStartTime\")){
                    new-item -type directory -path "C:\Temp\MasterReport_$ReportStartTime\" | Out-Null
                    }

"Script Started:$ReportStartTime" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"

#Get Logs from folder (Recursive)
$Logs = Try{
            Get-ChildItem -path 'C:\Temp\RoboCopyLogs\*\*.log' -Recurse -ErrorAction Stop | Select FullName 
            }
            catch{
                $_.Exception >> "C:\Temp\MasterReport_$ReportStartTime\Errors_$ReportStartTime.log"
                }
            
#Initialise Log Counters
$NumberOfFiles = 0
$DesktopFile = 0
$ProcessedFiles = 0
$Totalsize = 0

#Count Logs
$Logs | foreach {

        $SourceLog = $_
        #Get Logfile
        $Log = Get-Content $SourceLog.FullName


        #Get Log rows for required Error Types and begin loop
         $Log | Select-String -Pattern '(^\t+ +(Newer|Older|New File|Extra File))' `
        |foreach {    
                $NumberOfFiles=$NumberOfFiles+1

                If($_ | Select-String -pattern 'Desktop.ini' -SimpleMatch){
                    $DesktopFile=$DesktopFile+1  
                }           
        }
}

$Expected = $NumberOfFiles - $DesktopFile

"Total Files To Check  = $NumberOfFiles" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Total Files Excluded  = $DesktopFile" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Total Files To Ingest = $Expected" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"

$Main = (Get-Date).ToString('yyy-MM-dd_HH-mm-ss')
"Main Script:$Main" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"


$Logs | foreach {

        $SourceLog = $_
        #Get Logfile
        $Log = Get-Content $SourceLog.FullName

        #Collect Source and Destination
        $S = $Log | Select-String -Pattern 'Source :'
        $D = $Log | Select-String -Pattern 'Dest :' 

        $SourceLocation = $S -replace '\s+Source : ',''
        $DestLocation = $D -replace '\s+Dest : ',''


        #Get Log rows for required Error Types and begin loop
        $Log | Select-String -Pattern '(^\t+ +(Newer|Older|New File|Extra File))' | Select-String -pattern 'Desktop.ini' -SimpleMatch -NotMatch `
        |foreach {
#This loop could be a foreach -parallel???

                #Check Percent Completed                
                If($ProcessedFiles -gt 0){   # '-gt', not '>': inside an expression '>' redirects output to a file named 0
                $PercentComplete=[Math]::Ceiling(($ProcessedFiles/$Expected)*100)
               
                    If($PercentComplete -match ('([0-9]0)')){
                        "$($PercentComplete)% Completed" > "C:\Temp\MasterReport_$ReportStartTime\PercentComplete.Log" 
                        ($ProcessedFiles/$Expected)*100               
                    }
                }

                #Count Logs Processed
                $ProcessedFiles=$ProcessedFiles+1  
                
                #Populate FilePath
                $FilePath = $_ -Replace '.*(?=\\\\)', ''

                #Populate Error type
                $RoboErrorRaw = $_ -replace '\s+','|'
                $RoboError = $RoboErrorRaw.split("|")[1]

                #Check if file path relates to Source or the Destination and set path variables
                if($FilePath -like "$SourceLocation*"){

                    $SourceFilePath = $FilePath
                    $DestFilePath = $FilePath.replace($SourceLocation,$DestLocation)
                    
                    }
                    Elseif($FilePath -like "$DestLocation*"){
                        $DestFilePath = $FilePath
                        $SourceFilePath = $FilePath.replace($DestLocation,$SourceLocation)
                        }
                        Else{
                            $DestFilepath = "Could Not Resolve UNC to Source or Destination"
                        }
                   
                #Check if file exists at source and destination
                Try{
                    $IsAtPartner = Test-Path $DestFilePath -ErrorAction Stop
                    }
                        catch{
                        $IsAtPartner = $_.Exception
                        }

                Try{
                    $IsAtSource = Test-path $SourceFilePath -ErrorAction Stop
                    }
                        catch{
                        $IsAtSource = $_.Exception
                        }
                    
                
                If($IsAtSource){   
                        #Get the file details
                        Try{
                        $SourceFileDetails = Get-Item -LiteralPath $SourceFilePath -Force -ErrorAction Stop   # -Hidden would return only hidden files
                        }
                        catch{ 
                            $SourceFileDetails = 'Failed'   
                            }

                        if($SourceFileDetails -ne 'Failed'){
                            #Check has temp attribute
                            if((($SourceFileDetails).Attributes -band 0x100) -eq 0x100){
                                $TempAttribute = "Yes"
                                }
                                Else{
                                    $TempAttribute = "No"
                                    } 
                                #Get attributes and last modified
                                Try{
                                        $AllAttributes = ($SourceFileDetails).Attributes
                                    }
                                    catch{
                                        $AllAttributes = $_.Exception
                                    }
                    
                                Try{
                                    $Modified = ($SourceFileDetails).LastWriteTime.ToString()  
                                    }
                                    catch{
                                        $Modified = $_.Exception
                                    } 
                        }
                     }
  
                #Check if .bak file
                if($filePath -match '\.bak$'){
                $Bakfile = "Yes"  
                }
                    Else{
                        $Bakfile = "No"
                    }
       
                #Get Hashes
                If($IsAtPartner -and $IsAtSource){
                       $HashSource = (Get-DfsrFileHash -Path $SourceFilepath).FileHash
                       $HashDest = (Get-DfsrFileHash -Path $DestFilepath).FileHash
                    }
                      ElseIf(!$IsAtSource -and !$IsAtPartner){
                                $HashSource = 'File Does not Exist at Source'
                                $HashDest = 'File Does not Exist At Partner'
                                }  
                                ElseIf(!$IsAtPartner){
                                        $HashSource = (Get-DfsrFileHash -Path $SourceFilepath).FileHash
                                        $HashDest = 'File Does not Exist At Partner'
                                    }
                                    ElseIf(!$IsAtSource){
                                            $HashSource = 'File Does not Exist at Source'
                                            $HashDest = (Get-DfsrFileHash -Path $DestFilepath).FileHash
                                            }
                                            Else{
                                                    $HashSource = 'ERROR'
                                                    $HashDest = 'ERROR'
                                                }

                #Compare Valid Hashes
                If($HashSource -eq $HashDest){
                    $HashMatch = 'Yes'                    
                    }
                    Else{
                        $HashMatch = 'No'
                    }
                       
                #Check Filesize where hashes do not match
                If($HashMatch -eq 'No'){
                $FileSizeMB = ($SourceFileDetails).length/1MB
                }

                #Create output object
                $Obj = [PSCustomObject]@{
                                        ErrorType = $RoboError
                                        FilePath = $SourceFilePath
                                        PartnerUNC = $DestFilePath
                                        IsAtSource = $IsAtSource
                                        IsAtDestination = $IsAtPartner
                                        BakFile = $Bakfile 
                                        TempAttribute = $TempAttribute
                                        LastModified = $Modified
                                        AllAttributes = $AllAttributes                                                
                                        HashSource = $HashSource
                                        HashDest = $HashDest
                                        HashMatch = $HashMatch                                        
                                        RoboSource = $SourceLocation
                                        RoboDest = $DestLocation
                                        FileSizeMB = $FileSizeMB
                                        SourceLog = $SourceLog.FullName
                                        } 

                $Source = $SourceLocation.split('\\')[2]
                $Destination = $DestLocation.split('\\')[2]

                if(!(test-path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)")){
                    new-item -type directory -path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)"  | Out-Null
                    }              

                #export to csv
                $obj |  Export-Csv -Path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)\RoboCopyLogChecks_$ReportStartTime.csv" -NoTypeInformation -Append
                $obj |  Export-Csv -Path "C:\Temp\MasterReport_$ReportStartTime\RoboCopyLogChecks_$ReportStartTime.csv" -NoTypeInformation -Append
                
                #Increment total size of data
                If($HashMatch -eq "No"){
                    $Totalsize = $Totalsize + $SourceFileDetails.Length
                }
                
                clear-variable -name RoboError,SourceFilePath,DestFilePath,IsAtSource,IsAtPartner,Bakfile,TempAttribute,Modified,AllAttributes,HashSource,HashDest,HashMatch,FileSizeMB,Source,Destination -ErrorAction SilentlyContinue
                
                if($SourceFileDetails){
                    Remove-Variable -name SourceFileDetails
                }                
            }
}
$Completion = (Get-Date).ToString('yyy-MM-dd_HH-mm-ss')
"Script Completed:$Completion Excluded Processed = $DesktopFile ,Total Processed = $ProcessedFiles" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Files without Matching Hashes amount to $($Totalsize/1GB)GB" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"

Here is some example log data (it can be put in C:\Temp\RoboCopyLogs\Logs\ to run with the above code):

-------------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows                              
-------------------------------------------------------------------------------

  Started : 24 April 2022 17:29:57
   Source : \\Test01\
     Dest : \\Test02\

    Files : *.*
        
Exc Files : ~*.*
        *.TMP
        
 Exc Dirs : \\Test01\DfsrPrivate
        
  Options : *.* /FFT /TS /L /S /E /DCOPY:DA /COPY:DAT /PURGE /MIR /B /NP /XJD /MT:8 /R:0 /W:0 

------------------------------------------------------------------------------

        Newer              30720 2021/07/20 14:49:36    \\Test01\Test2121.xls
        Older             651776 2020/10/25 21:49:32    \\Test01\testppt.ppt
        Older              94720 2019/06/10 11:46:03    \\Test01\Thumbs.db
      *EXTRA File          1.7 m 2020/09/17 10:36:57    \\Test02\months.jpg
      *EXTRA File          1.8 m 2020/09/17 10:36:57    \\Test02\happy.jpg
        New File            6421 2020/10/26 10:32:43    \\Test01\26-10-20.pdf
        New File            6321 2020/10/26 10:32:43    \\Test01\Testing20.pdf
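A further cost in the script is calling `Export-Csv -Append` twice for every row, which reopens both CSV files for each file checked. A hedged alternative, sketched here: open each file once with a `StreamWriter` and write the same row to both. `Export-CsvTwice` is a hypothetical name; it relies on `ConvertTo-Csv` emitting a header line followed by the data line for a single object, and it assumes every object has the same properties. For brevity there is no cleanup if the pipeline is stopped early.

```powershell
# Hypothetical helper: write the same stream of objects to two CSV files,
# opening each file once instead of calling Export-Csv -Append twice per row.
function Export-CsvTwice {
    param(
        [Parameter(Mandatory, ValueFromPipeline)][psobject]$InputObject,
        [Parameter(Mandatory)][string]$PathA,   # full paths recommended
        [Parameter(Mandatory)][string]$PathB
    )
    begin {
        # $false = overwrite rather than append
        $writerA = [System.IO.StreamWriter]::new($PathA, $false)
        $writerB = [System.IO.StreamWriter]::new($PathB, $false)
        $headerWritten = $false
    }
    process {
        # For one object, ConvertTo-Csv yields the header line, then the data line.
        $lines = @($InputObject | ConvertTo-Csv -NoTypeInformation)
        $start = if ($headerWritten) { 1 } else { 0 }
        for ($i = $start; $i -lt $lines.Count; $i++) {
            $writerA.WriteLine($lines[$i])
            $writerB.WriteLine($lines[$i])
        }
        $headerWritten = $true
    }
    end {
        # Dispose flushes buffered output and closes both files.
        $writerA.Dispose()
        $writerB.Dispose()
    }
}
```

In the original loop this would replace the two `Export-Csv -Append` lines, with `$obj` emitted to the pipeline and the whole loop piped into `Export-CsvTwice` once per server/master pair.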
