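I am parsing RoboCopy logs covering millions of files. How can I make my code run faster?
New to StackOverflow, I'll do my best to post correctly :)
Hoping someone can help me get my code running faster.
The code is run against RoboCopy migration logs from a massive DFS server migration (20 DFS servers being migrated).
The code first captures the source/destination of the log in question and then looks for the 'Newer', 'Older', 'New File' and 'Extra File' entries/rows. It then checks whether these files exist on each side, what attributes they have, and does a DFSR hash check against both sides (as the files are now being replicated via DFSR).
The main concern is whether the hashes match for source and destination and whether the temporary attribute is set.
The problem I am having is that there are millions of files logged under these types (the migration was gargantuan), so the script is taking forever to run. To add to this, the client will not allow ports for PSRemoting/Invoke-Command.
At present I am running my code without multi-threading, with a copy on each of the DFS servers looking at its respective logs, but it is still slow.
I have been looking at running a foreach -parallel over each log row (not the loop over log files), but:
- With so much data within each log/loop, my understanding is that I have to write it out rather than keep it in a PSCustomObject? Otherwise I would run out of RAM?
- I don't really understand how to use mutexes to allow multiple writers to the CSV.
Can someone please advise me on the above two points? And maybe give me some more ideas on what I can do to optimise things?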
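For the mutex point, one possible shape is sketched below. It is only an illustration under stated assumptions: the helper name `Add-CsvRowLocked`, the mutex name, the demo path and the demo rows are all hypothetical and not part of the script. The idea is that each parallel worker buffers a modest batch of result objects and then appends the batch while holding a named mutex, so only one writer touches the CSV at a time and no worker has to keep millions of rows in memory.

```powershell
# Hypothetical helper (not part of the original script): append a batch of
# rows to a shared CSV while holding a named mutex, so writers take turns.
function Add-CsvRowLocked {
    param(
        [Parameter(Mandatory)] [object[]] $Rows,
        [Parameter(Mandatory)] [string]   $Path,
        [string] $MutexName = 'RoboCopyLogChecksCsv'   # assumed name; any unique string works
    )
    $mutex = [System.Threading.Mutex]::new($false, $MutexName)
    $held  = $false
    try {
        try {
            # Blocks until no other thread or process holds the mutex.
            $held = $mutex.WaitOne()
        }
        catch [System.Threading.AbandonedMutexException] {
            # A previous owner exited without releasing; ownership still passes to us.
            $held = $true
        }
        $Rows | Export-Csv -Path $Path -NoTypeInformation -Append
    }
    finally {
        if ($held) { $mutex.ReleaseMutex() }
        $mutex.Dispose()
    }
}

# Demo with made-up rows and a temp file; a real worker would flush every few hundred rows.
$csv = Join-Path ([System.IO.Path]::GetTempPath()) 'RoboCopyLogChecks_demo.csv'
Remove-Item -Path $csv -ErrorAction SilentlyContinue
$batch = 1..3 | ForEach-Object {
    [PSCustomObject]@{ FilePath = "\\Test01\file$_.txt"; HashMatch = 'Yes' }
}
Add-CsvRowLocked -Rows $batch -Path $csv
```

Flushing in batches means the lock is taken rarely, which matters more than the lock itself when there are millions of rows. An alternative that avoids locking altogether would be to let each worker write its own CSV and merge the files afterwards.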
My full code is below.
#Get Start Time
$ReportStartTime = (Get-Date).ToString('yyy-MM-dd_HH-mm-ss')
If(!(test-path "C:\Temp\MasterReport_$ReportStartTime\")){
new-item -type directory -path "C:\Temp\MasterReport_$ReportStartTime\" | Out-Null
}
"Script Started:$ReportStartTime" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
#Get Logs from folder (Recursive)
$Logs = Try{
Get-ChildItem -path 'C:\Temp\RoboCopyLogs\*\*.log' -Recurse -ErrorAction Stop | Select FullName
}
catch{
$_.Exception >> "C:\Temp\MasterReport_$ReportStartTime\Errors_$ReportStartTime.log"
}
#Initialise Log Counters
$NumberOfFiles = 0
$DesktopFile = 0
$ProcessedFiles = 0
$Totalsize = 0
#Count Logs
$Logs | foreach {
$SourceLog = $_
#Get Logfile
$Log = Get-Content $SourceLog.FullName
#Get Log rows for required Error Types and begin loop
$Log | Select-String -Pattern '(^\t+ +(Newer|Older|New File|Extra File))' `
|foreach {
$NumberOfFiles=$NumberOfFiles+1
If($_ | Select-String -pattern 'Desktop.ini' -SimpleMatch){
$DesktopFile=$DesktopFile+1
}
}
}
$Expected = $NumberOfFiles - $DesktopFile
"Total Files To Check = $NumberOfFiles" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Total Files Excluded = $DesktopFile" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Total Files To Ingest = $Expected" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
$Main = (Get-Date).ToString('yyy-MM-dd_HH-mm-ss')
"Main Script:$Main" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
$Logs | foreach {
$SourceLog = $_
#Get Logfile
$Log = Get-Content $SourceLog.FullName
#Collect Source and Destination
$S = $Log | Select-String -Pattern 'Source :'
$D = $Log | Select-String -Pattern 'Dest :'
$SourceLocation = $S -replace '\s+Source : ',''
$DestLocation = $D -replace '\s+Dest : ',''
#Get Log rows for required Error Types and begin loop
$Log | Select-String -Pattern '(^\t+ +(Newer|Older|New File|Extra File))' | Select-String -pattern 'Desktop.ini' -SimpleMatch -NotMatch `
|foreach {
#This loop could be a foreach -parallel???
#Check Percent Completed
If($ProcessedFiles -gt 0){
$PercentComplete=[Math]::Ceiling(($ProcessedFiles/$Expected)*100)
If($PercentComplete -match ('([0-9]0)')){
"$($PercentComplete)% Completed" > "C:\Temp\MasterReport_$ReportStartTime\PercentComplete.Log"
($ProcessedFiles/$Expected)*100
}
}
#Count Logs Processed
$ProcessedFiles=$ProcessedFiles+1
#Populate FilePath
$FilePath = $_ -Replace '.*(?=\\\\)', ''
#Populate Error type
$RoboErrorRaw = $_ -replace '\s+','|'
$RoboError = $RoboErrorRaw.split("|")[1]
#Check if file path relates to Source or the Destination and set path variables
if($FilePath -like "$SourceLocation*"){
$SourceFilePath = $FilePath
$DestFilePath = $FilePath.replace($SourceLocation,$DestLocation)
}
Elseif($FilePath -like "$DestLocation*"){
$DestFilePath = $FilePath
$SourceFilePath = $FilePath.replace($DestLocation,$SourceLocation)
$IsAtPartner = Test-Path $SourceFilePath
}
Else{
$DestFilepath = "Could Not Resolve UNC to Source or Destination"
}
#Check if file exists at source and destination
Try{
$IsAtPartner = Test-Path $DestFilePath -ErrorAction Stop
}
catch{
$IsAtPartner = $_.Exception
}
Try{
$IsAtSource = Test-path $SourceFilePath -ErrorAction Stop
}
catch{
$IsAtSource = $_.Exception
}
If($IsAtSource){
#Get the file details
Try{
$SourceFileDetails = Get-Item -LiteralPath $SourceFilePath -Force -ErrorAction Stop
}
catch{
$SourceFileDetails = 'Failed'
}
if($SourceFileDetails -ne 'Failed'){
#Check has temp attribute
if((($SourceFileDetails).Attributes -band 0x100) -eq 0x100){
$TempAttribute = "Yes"
}
Else{
$TempAttribute = "No"
}
#Get attributes and last modified
Try{
$AllAttributes = ($SourceFileDetails).Attributes
}
catch{
$AllAttributes = $_.Exception
}
Try{
$Modified = ($SourceFileDetails).LastWriteTime.ToString()
}
catch{
$Modified = $_.Exception
}
}
}
#Check if .bak file
if($filePath -match '\.bak$'){
$Bakfile = "Yes"
}
Else{
$Bakfile = "No"
}
#Get Hashes
If($IsAtPartner -and $IsAtSource){
$HashSource = (Get-DfsrFileHash -Path $SourceFilepath).FileHash
$HashDest = (Get-DfsrFileHash -Path $DestFilepath).FileHash
}
ElseIf(!$IsAtSource -and !$IsAtPartner){
$HashSource = 'File Does not Exist at Source'
$HashDest = 'File Does not Exist At Partner'
}
ElseIf(!$IsAtPartner){
$HashSource = (Get-DfsrFileHash -Path $SourceFilepath).FileHash
$HashDest = 'File Does not Exist At Partner'
}
ElseIf(!$IsAtSource){
$HashSource = 'File Does not Exist at Source'
$HashDest = (Get-DfsrFileHash -Path $DestFilepath).FileHash
}
Else{
$HashSource = 'ERROR'
$HashDest = 'ERROR'
}
#Compare Valid Hashes
If($HashSource -eq $HashDest){
$HashMatch = 'Yes'
}
Else{
$HashMatch = 'No'
}
#Check Filesize where hashes do not match
If($HashMatch -eq 'No'){
$FileSizeMB = ($SourceFileDetails).length/1MB
}
#Create output object
$Obj = [PSCustomObject]@{
ErrorType = $RoboError
FilePath = $SourceFilePath
PartnerUNC = $DestFilePath
IsAtSource = $IsAtSource
IsAtDestination = $IsAtPartner
BakFile = $Bakfile
TempAttribute = $TempAttribute
LastModified = $Modified
AllAttributes = $AllAttributes
HashSource = $HashSource
HashDest = $HashDest
HashMatch = $HashMatch
RoboSource = $SourceLocation
RoboDest = $DestLocation
FileSizeMB = $FileSizeMB
SourceLog = $SourceLog.FullName
}
$Source = $SourceLocation.split('\\')[2]
$Destination = $DestLocation.split('\\')[2]
if(!(test-path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)")){
new-item -type directory -path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)" | Out-Null
}
#export to csv
$obj | Export-Csv -Path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)\RoboCopyLogChecks_$ReportStartTime.csv" -NoTypeInformation -Append
$obj | Export-Csv -Path "C:\Temp\MasterReport_$ReportStartTime\RoboCopyLogChecks_$ReportStartTime.csv" -NoTypeInformation -Append
#Increment total size of data
If($HashMatch -eq "No"){
$Totalsize = $Totalsize + $SourceFileDetails.Length
}
clear-variable -name RoboError,SourceFilePath,DestFilePath,IsAtSource,IsAtPartner,Bakfile,TempAttribute,Modified,AllAttributes,HashSource,HashDest,HashMatch,FileSizeMB,Source,Destination
if($SourceFileDetails){
Remove-Variable -name SourceFileDetails
}
}
}
$Completion = (Get-Date).ToString('yyy-MM-dd_HH-mm-ss')
"Script Completed:$Completion Excluded Processed = $DesktopFile ,Total Processed = $ProcessedFiles" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Files without Matching Hashes amount to $($Totalsize/1GB)GB" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
Here is some example log data (could be put in C:\Temp\RoboCopyLogs\Logs\ to run with the above code)
-------------------------------------------------------------------------------
ROBOCOPY :: Robust File Copy for Windows
-------------------------------------------------------------------------------
Started : 24 April 2022 17:29:57
Source : \\Test01\
Dest : \\Test02\
Files : *.*
Exc Files : ~*.*
*.TMP
Exc Dirs : \\Test01\DfsrPrivate
Options : *.* /FFT /TS /L /S /E /DCOPY:DA /COPY:DAT /PURGE /MIR /B /NP /XJD /MT:8 /R:0 /W:0
------------------------------------------------------------------------------
Newer 30720 2021/07/20 14:49:36 \\Test01\Test2121.xls
Older 651776 2020/10/25 21:49:32 \\Test01\testppt.ppt
Older 94720 2019/06/10 11:46:03 \\Test01\Thumbs.db
*EXTRA File 1.7 m 2020/09/17 10:36:57 \\Test02\months.jpg
*EXTRA File 1.8 m 2020/09/17 10:36:57 \\Test02\happy.jpg
New File 6421 2020/10/26 10:32:43 \\Test01\26-10-20.pdf
New File 6321 2020/10/26 10:32:43 \\Test01\Testing20.pdf
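Rows like the ones above can also be read as a stream instead of loading each log with Get-Content and filtering it several times. The following is a minimal sketch, not the script's actual method: the function name `Read-RoboCopyRow`, the regular expression, the temp paths and the `Desktop.ini` demo row are assumptions made for illustration. It relies on `switch -Regex -File`, which reads the file line by line, and it pipes the resulting objects straight into a single `Export-Csv` call so that rows are written as they are produced rather than collected first.

```powershell
# Hypothetical helper: emit one object per Newer/Older/New File/EXTRA File row.
function Read-RoboCopyRow {
    param([Parameter(Mandatory)] [string] $Path)
    # Assumed row layout: type, size (optionally "1.7 m"), timestamp, UNC path.
    $rowPattern = '^\s*\*?(?<Type>Newer|Older|New File|EXTRA File)\s+' +
                  '(?<Size>\d+(?:\.\d+)?(?:\s[kmgt])?)\s+' +
                  '(?<Stamp>\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2})\s+' +
                  '(?<FilePath>\\\\.+)$'
    # switch -File reads the log one line at a time; matching is case-insensitive.
    switch -Regex -File $Path {
        $rowPattern {
            if ($Matches.FilePath -notlike '*Desktop.ini*') {
                [PSCustomObject]@{
                    ErrorType = $Matches.Type
                    FilePath  = $Matches.FilePath
                }
            }
        }
    }
}

# Demo: three rows taken from the sample log plus one invented Desktop.ini row.
$demoLog = Join-Path ([System.IO.Path]::GetTempPath()) 'RoboCopyDemo.log'
$demoCsv = Join-Path ([System.IO.Path]::GetTempPath()) 'RoboCopyDemo.csv'
@(
    ' Source : \\Test01\'
    ' Newer 30720 2021/07/20 14:49:36 \\Test01\Test2121.xls'
    ' *EXTRA File 1.7 m 2020/09/17 10:36:57 \\Test02\months.jpg'
    ' New File 6421 2020/10/26 10:32:43 \\Test01\26-10-20.pdf'
    ' Older 402 2019/06/10 11:46:03 \\Test01\Desktop.ini'
) | Set-Content -Path $demoLog

# One pipeline, one Export-Csv call: rows are written as they are produced.
Read-RoboCopyRow -Path $demoLog | Export-Csv -Path $demoCsv -NoTypeInformation
```

Compared with calling `Export-Csv -Append` once per row, a single pipeline opens the output file only once, which is likely to matter at this volume. The existence, attribute and hash checks would slot in where the object is built.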