PowerShell processing a large file slowly
I have a PowerShell program that reads a file and looks for accounts in another table and then updates and writes out a new file with the extra information. It actually passes the file twice. Once to accumulate totals and then again to build the output file. (Whether it's a table or a file pass, I cannot sort the data because it has to be kept in original order so that transactions stay in balance.)
I've tried loading the file into an array and passing that twice, and I've tried reading the file twice without building the array. Processing the file twice is orders of magnitude faster, and that's not counting how long it takes to load the file into the array; this is just my loop time.
When I run with a file that is about 12 MB with 84,000 rows, it runs fairly quickly. The whole thing takes about 5 minutes (Processing the file, not using a table).
When I process a larger file (1.2 GB with 7 million records), it takes far longer. The table-based version estimates it will take over 2 days. Running it by reading the file twice takes about 8 hours.
I know 7 million records is a lot, but 8 hours is a REALLY long time on a Windows server with 256 GB of RAM and way more processors than PowerShell can even use. (Even the 5 minutes to process 84,000 records seems like an eternity.)
When I was building the table, PowerShell would consume over 30 GB of RAM. It seems that when PowerShell has a large memory footprint, the performance drops drastically.
I've read a lot about PowerShell performance. I'm not building strings with +=, and I'm not piping lots of things into other things. This really is very basic: read a CSV into an array, then read a file, and for each record in the file scan the array; when an item from the file is found in the array, do some simple calculations and move on. (The array isn't sorted because I'm actually searching on two different fields in it, and both contain duplicate values, so I have to scan the whole thing each time. But the array has fewer than 400 records in it.)
So when I do the math, processing 7 million records with each one scanning 400 records does sound like a lot, and it is, but still: 8 hours on a high-end Windows server? Is this really how slow PowerShell is?
Thank you to any and all who have read this far and have suggestions or advice.
-Todd
EDIT: Here is some of the code. The table that is read into memory is loaded with this command:
$ICList = Import-Csv -Path $Path2 -encoding Default
It has 31 sets of buckets used to accumulate totals for each day of the month (2 totals and 1 percentage per day).
Here are the headers and a sample record:
ICKey,ICAC,OffsetAC,OffsetBank,Type,Last,Total,offtotal,pct,Total01,offtotal01,pct01,Total02,offtotal02,pct02,Total03,offtotal03,pct03,Total04,offtotal04,pct04,Total05,offtotal05,pct05,Total06,offtotal06,pct06,Total07,offtotal07,pct07,Total08,offtotal08,pct08,Total09,offtotal09,pct09,Total10,offtotal10,pct10,Total11,offtotal11,pct11,Total12,offtotal12,pct12,Total13,offtotal13,pct13,Total14,offtotal14,pct14,Total15,offtotal15,pct15,Total16,offtotal16,pct16,Total17,offtotal17,pct17,Total18,offtotal18,pct18,Total19,offtotal19,pct19,Total20,offtotal20,pct20,Total21,offtotal21,pct21,Total22,offtotal22,pct22,Total23,offtotal23,pct23,Total24,offtotal24,pct24,Total25,offtotal25,pct25,Total26,offtotal26,pct26,Total27,offtotal27,pct27,Total28,offtotal28,pct28,Total29,offtotal29,pct29,Total30,offtotal30,pct30,Total31,offtotal31,pct31
123456XX0012,123456,987654,XX0012,12m,no,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
The file that is read in the next loop looks like this:
001,Company_Reference_ID,XX0012,USD,ACTUALS,2020-12-31,Conversion,,,,00000003,,,000000005376026.81,,,,,,123456,00100217,,,
The loop that accumulates the totals looks like this:
$reader = [System.IO.File]::OpenText($path)
$line = $reader.ReadLine() # skip the header line
try {
    while ($null -ne ($line = $reader.ReadLine())) {
        $item = $line.Split(",")
        $thisDate = $item[5].Substring(8, 2) # AccountingDate
        $ttot = 'Total' + $thisDate
        $toff = 'Offtotal' + $thisDate
        foreach ($item2 in $ICList) {
            $itco = $item[2]   # CompanyReferenceID
            $itkey = $item[19] # ExternalCode_Account
            if ($item2.Type -eq 'm2m' -and $itco -ne 'LE0001') { $itco = 'LE0004' }
            if ($itkey -eq '109495') { $itkey = '109479' }
            if ($itkey -eq '172574') { $itkey = '172697' }
            if ($itkey -eq '172718') { $itkey = '172590' }
            $thisrecord = $itkey + $itco
            if ($thisrecord -eq $item2.ICKey) {
                $item2.$ttot = [double]$item2.$ttot + [double]$item[14] - [double]$item[13] # .CreditAmount - .DebitAmount
            }
            if ($itkey -eq $item2.offsetac) {
                $item2.$toff = [double]$item2.$toff + [double]$item[14] - [double]$item[13] # .CreditAmount - .DebitAmount
            }
        }
        $count++
    }
}
catch {
    Write-Host $_.Exception.Message
}
finally {
    $reader.Close()
}
The above loop takes the bulk of the time.
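One thing worth noting while reading the loop above: $itco, $itkey, and the three account remaps depend only on $item, yet they are recomputed on every one of the ~400 inner iterations. A hedged sketch of the same loop body with that work hoisted out (same field indexes as above; only the m2m company override genuinely varies per $ICList row):

```powershell
# Hoisted: these depend only on the current line, not on $item2,
# so compute them once per line instead of once per $ICList row.
$itco  = $item[2]   # CompanyReferenceID
$itkey = $item[19]  # ExternalCode_Account
switch ($itkey) {   # account remaps, applied once per line
    '109495' { $itkey = '109479' }
    '172574' { $itkey = '172697' }
    '172718' { $itkey = '172590' }
}

foreach ($item2 in $ICList) {
    # Only the Type-dependent company override still belongs inside the loop
    $co = if ($item2.Type -eq 'm2m' -and $itco -ne 'LE0001') { 'LE0004' } else { $itco }
    if (($itkey + $co) -eq $item2.ICKey) {
        $item2.$ttot = [double]$item2.$ttot + [double]$item[14] - [double]$item[13]
    }
    if ($itkey -eq $item2.offsetac) {
        $item2.$toff = [double]$item2.$toff + [double]$item[14] - [double]$item[13]
    }
}
```

The casts stay inside the match branches, as in the original, so lines with empty amount fields behave the same as before.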
I then have a little loop that spins through the $ICList table to calculate percentages:
foreach ($item2 in $ICList) {
    foreach ($day in $days) {
        $ttot = 'Total' + '{0:d2}' -f $day
        $toff = 'Offtotal' + '{0:d2}' -f $day
        $tpct = 'Pct' + '{0:d2}' -f $day
        $item2.$tpct = ([double]$item2.$ttot) / ([double]$item2.$toff)
    }
}
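One caveat with this loop: because both operands are [double], a day whose offset total is 0 silently yields Infinity (or NaN for 0/0) rather than raising an error, and that value flows into the second pass. A defensive variant, under the assumption that "no offset activity" should behave like pct = 1, which the second pass treats as "no split":

```powershell
foreach ($item2 in $ICList) {
    foreach ($day in $days) {
        $ttot = 'Total{0:d2}'    -f $day
        $toff = 'Offtotal{0:d2}' -f $day
        $tpct = 'Pct{0:d2}'      -f $day
        $item2.$tpct = if ([double]$item2.$toff -ne 0) {
            [double]$item2.$ttot / [double]$item2.$toff
        }
        else {
            1  # assumption: no offset activity means no split (pct -eq 1 is skipped later)
        }
    }
}
```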
And then I read the file again and write it out with extra records to split a single amount across multiple companies, adding a new column with the company at the end. This part actually runs faster than the first loop. (Even though it's doing a lot more processing.)
$first = 1
$count = 1
$reader = [System.IO.File]::OpenText($path)
$outputFileName = [io.path]::ChangeExtension($path, "txt")
try {
    $outputFileName = [io.path]::GetFileNameWithoutExtension($path) + '-' + $date + ".fix.csv"
    $outputFileName = [io.path]::Combine([io.path]::GetDirectoryName($path), $outputFileName)
    $writer = [System.IO.StreamWriter] $outputFileName
    $line = $reader.ReadLine()
    $writer.WriteLine($line) # copy the header line through
    while ($null -ne ($line = $reader.ReadLine())) {
        $item = $line.Split(",")
        if ($first -eq 1) {
            $lastjourn = $item[0] # .JournalKey
            $first = 999
        }
        if ($lastjourn -ne $item[0]) { # .JournalKey
            $count = 1
        }
        $offsetbank = $null
        $chgCR = $item[14] # .CreditAmount
        $chgDB = $item[13] # .DebitAmount
        foreach ($item2 in $ICList) {
            $itkey = $item[19] # .ExternalCode_Account
            if ($itkey -eq $item2.offsetac) {
                $newCR = $null
                $newDB = $null
                $offsetbank = $item2.offsetbank
                $tpct = 'Pct' + $item[5].Substring(8, 2) # AccountingDate
                if ($item2.$tpct -ne 1) {
                    if ('-' + $item[14] + '-' -ne '--') { # .CreditAmount is non-empty
                        $newCR = [math]::Round([double]$item[14] * $item2.$tpct, 2)
                        if ($item2.last -eq 'no') {
                            $chgCR = [math]::Round([double]$chgCR - [double]$newCR, 2)
                        }
                    }
                    if ('-' + $item[13] + '-' -ne '--') { # .DebitAmount is non-empty
                        $thispct = $item2.$tpct * -1
                        $newDB = [math]::Round([double]$item[13] * $thispct, 2)
                        if ($item2.last -eq 'no') {
                            $chgDB = [math]::Round([double]$chgDB - [double]$newDB, 2)
                        }
                    }
                    if ($item[19] -eq '202884' -or $item[19] -eq '103886') {
                    }
                    if (('-' + $newDB + '-' -ne '--' -and $newDB -ne 0) -or ('-' + $newCR + '-' -ne '--' -and $newCR -ne 0)) {
                        if ($item2.last -eq 'yes') {
                            $item[14] = [double]$chgCR # .CreditAmount
                            $item[13] = [double]$chgDB # .DebitAmount
                            $offsetbank = $item2.offsetbank
                        }
                        else {
                            $tmpcr = [double]$newCR
                            $tmpdb = [double]$newDB
                            $xyzd = 0
                            $xyzc = 0
                            if ($tmpcr -eq 0) {
                                $xyzd = [double]$tmpdb.ToString("0.00")
                                $writer.WriteLine("{0},{1},{2},{3},{4},{5},{6},{7},{8},{9},{10},{11},{12},{13},{14},{15},{16},{17},{18},{19},{20},{21},{22},{23}",
                                    $item[0], $item[1], $item[2], $item[3], $item[4], $item[5], $item[6], $item[7], $item[8], $item[9],
                                    $count, $item[11], $item[12], [math]::Abs($xyzd).ToString("0.00"), $null, $item[15], $item[16], $item[17], $item[18], $item[19],
                                    $item[20], $item[21], 'INTERCOMPANY_AFFILIATE_ID_EXTRA', $item2.offsetbank)
                            }
                            if ($tmpdb -eq 0) {
                                $xyzc = [double]$tmpcr.ToString("0.00")
                                $writer.WriteLine("{0},{1},{2},{3},{4},{5},{6},{7},{8},{9},{10},{11},{12},{13},{14},{15},{16},{17},{18},{19},{20},{21},{22},{23}",
                                    $item[0], $item[1], $item[2], $item[3], $item[4], $item[5], $item[6], $item[7], $item[8], $item[9],
                                    $count, $item[11], $item[12], $null, [math]::Abs($xyzc).ToString("0.00"), $item[15], $item[16], $item[17], $item[18], $item[19],
                                    $item[20], $item[21], 'INTERCOMPANY_AFFILIATE_ID_EXTRA', $item2.offsetbank)
                            }
                            $count++
                        }
                    }
                    else {
                        if ($item2.last -eq 'yes') {
                            $item[14] = [double]$chgCR # .CreditAmount
                            $item[13] = [double]$chgDB # .DebitAmount
                            $offsetbank = $item2.offsetbank
                        }
                    }
                }
            }
        }
        $tmpcr = [double]$item[14]
        $tmpdb = [double]$item[13]
        $xyzd = 0
        $xyzc = 0
        if ($tmpcr -eq 0) {
            $xyzd = [double]$tmpdb.ToString("0.00")
            $writer.WriteLine("{0},{1},{2},{3},{4},{5},{6},{7},{8},{9},{10},{11},{12},{13},{14},{15},{16},{17},{18},{19},{20},{21},{22},{23}",
                $item[0], $item[1], $item[2], $item[3], $item[4], $item[5], $item[6], $item[7], $item[8], $item[9],
                $count, $item[11], $item[12], $xyzd.ToString("0.00"), $null, $item[15], $item[16], $item[17], $item[18], $item[19],
                $item[20], $item[21], 'INTERCOMPANY_AFFILIATE_ID', $offsetbank)
        }
        if ($tmpdb -eq 0) {
            $xyzc = [double]$tmpcr.ToString("0.00")
            $writer.WriteLine("{0},{1},{2},{3},{4},{5},{6},{7},{8},{9},{10},{11},{12},{13},{14},{15},{16},{17},{18},{19},{20},{21},{22},{23}",
                $item[0], $item[1], $item[2], $item[3], $item[4], $item[5], $item[6], $item[7], $item[8], $item[9],
                $count, $item[11], $item[12], $null, $xyzc.ToString("0.00"), $item[15], $item[16], $item[17], $item[18], $item[19],
                $item[20], $item[21], 'INTERCOMPANY_AFFILIATE_ID', $offsetbank)
        }
        if ($tmpdb -ne 0 -and $tmpcr -ne 0) {
            $xyzc = [double]$tmpcr.ToString("0.00")
            $writer.WriteLine("{0},{1},{2},{3},{4},{5},{6},{7},{8},{9},{10},{11},{12},{13},{14},{15},{16},{17},{18},{19},{20},{21},{22},{23}",
                $item[0], $item[1], $item[2], $item[3], $item[4], $item[5], $item[6], $item[7], $item[8], $item[9],
                $count, $item[11], $item[12], $null, $xyzc.ToString("0.00"), $item[15], $item[16], $item[17], $item[18], $item[19],
                $item[20], $item[21], 'INTERCOMPANY_AFFILIATE_ID', $offsetbank)
        }
        $count = $count + 1
        $lastjourn = $item[0]
    }
}
catch {
    Write-Log -Message $_.Exception.Message -Path $scriptlog -Colorf Black -Colorb Yellow
}
finally {
    $reader.Close()
    $writer.Close()
}
Here is a log output showing timestamps for each section.
2022-06-13 13:47:01 INFO: -------------------------------------------------------------------
2022-06-13 13:47:01 INFO: Program Begins
2022-06-13 13:47:01 INFO: -------------------------------------------------------------------
2022-06-13 13:47:01 INFO: Loading C:\ICAccounts.csv into internal table
2022-06-13 13:47:01 INFO: Reading from C:\JAN21.end.CSV to accumulate totals
2022-06-13 13:47:01 INFO: Counting records in C:\JAN21.end.CSV
2022-06-13 13:47:03 INFO: record count is: 84853
2022-06-13 13:47:03 INFO: Reading from C:JAN21.end.CSV to accumulate totals
2022-06-13 13:50:34 INFO: Calculating Percentages in internal table
2022-06-13 13:50:35 INFO: Reading from C:\JAN21.end.CSV again to calculate totals...
2022-06-13 13:50:35 INFO: ...and writing records to C:\JAN21.end-06132022-134701.fix.csv
2022-06-13 13:52:12 INFO: Process Complete
2022-06-13 13:52:12 INFO: Script Elapsed Time (HH:MM:SS) 00:05:10
First loop took ~3.5 minutes.
Second loop took ~1.5 minutes.
EDIT2:
@zett42, this sounds like an intriguing idea. However, I've never mixed C# and PS before. I tried to replace the string code in the first while loop with something like this, but it cannot see the PS variables. How can I get C# and PS to both see the strings and the array?
$id = Get-Random
$code = @"
using System;
namespace HelloWorld
{
    public class Program$ID
    {
        public static void Main()
        {
            itco = item[2];
            itkey = item[19];
            if (item2.Type == "m2m" && itco != "LE0001")
            {
                itco = "LE0004";
            }
            if (itkey == "109495")
            {
                itkey = "109479";
            }
            if (itkey == "172574")
            {
                itkey = "172697";
            }
            if (itkey == "172718")
            {
                itkey = "172590";
            }
            thisrecord = itkey + itco;
            if (item2.icac == "202884" || item2.icac == "103886")
            {
            }
            if (thisrecord == item2.ICKey)
            {
                ttot = (Double)ttot + (Double)item[14] - (Double)item[13];
                if (itkey == "202884" || itkey == "103886")
                {
                }
            }
            if (itkey == item2.offsetac)
            {
                toff = (Double)toff + (Double)item[14] - (Double)item[13];
            }
        }
    }
}
"@
I am calling this code with:
while ($null -ne ($line = $reader.ReadLine())) {
    Add-Type -TypeDefinition $code -Language CSharp
    iex "[HelloWorld.Program]::Main()"
}
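For context on the approach above: Add-Type compiles a type once per session (and errors if asked to define the same type again), so it belongs outside the read loop, and a compiled C# class can never see PowerShell variables directly; data has to go in as method arguments. A minimal sketch of that pattern, using a simplified key/total accumulator rather than the full bucket logic above:

```powershell
# Compile once, before the loop. The class only sees what is passed to it.
Add-Type -TypeDefinition @"
using System.Collections.Generic;
public static class TotalAccumulator
{
    // key -> running total, kept inside the compiled class
    public static Dictionary<string, double> Totals = new Dictionary<string, double>();

    public static void Add(string key, double credit, double debit)
    {
        double t;
        Totals.TryGetValue(key, out t);
        Totals[key] = t + credit - debit;
    }
}
"@

while ($null -ne ($line = $reader.ReadLine())) {
    $item = $line.Split(',')
    # PowerShell values go in as ordinary .NET method arguments
    [TotalAccumulator]::Add($item[19] + $item[2], [double]$item[14], [double]$item[13])
}
# Read results back the same way, e.g. [TotalAccumulator]::Totals
```

This is a sketch of the calling pattern, not a drop-in replacement for the accumulation loop; the empty-field handling and remaps above would still need porting.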
Thanks, again.
Comments (1)
The script run that used to take almost five minutes now finishes in 18 seconds!!
The 8-hour data file now takes less than 14 minutes to run (and 2 minutes of that is just counting the records in the file so I can build a progress bar).
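As an aside, that counting pass can be done without keeping anything in memory; one way to sketch it:

```powershell
# Count records by streaming the file once; nothing is retained in memory.
$recordCount = 0
$counter = [System.IO.File]::OpenText($path)
while ($null -ne $counter.ReadLine()) { $recordCount++ }
$counter.Close()
```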
Before any changes:
After moving as much as I could outside the inner loop:
After using the hash table:
Thank you to @mistersmith; their suggestion of a hash table made me rethink the loop, and I used a hash table to tell whether the inner loop was needed at all. That had the largest impact.
Also, thank you to @mclayton. The recommendation to move parts outside the inner loop by itself cut run time about in half. However, with the hash table, the loop is being executed significantly less, so the impact is somewhat reduced, but it was still an excellent suggestion and a reminder to always do things as few times as you can get away with.
The code I added was:
Read the list into a hash table separate from the table I was using. (I know I could have just used the hash table for everything, but the table is small and this required fewer changes to the code I had already written):
Then, I simply added an if statement before each of the inner loops:
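The actual snippets didn't survive in the post; a sketch of the described change, assuming the field names used in the loops above:

```powershell
# One-time setup: hash tables keyed on the two fields the inner loop matches against.
$ickeys  = @{}
$offsets = @{}
foreach ($item2 in $ICList) {
    $ickeys[$item2.ICKey]     = $true
    $offsets[$item2.OffsetAC] = $true
}

# Inside the read loop: skip the ~400-row scan unless this line can possibly match.
if ($ickeys.ContainsKey($itkey + $itco) -or $offsets.ContainsKey($itkey)) {
    foreach ($item2 in $ICList) {
        # ...original inner-loop body, unchanged...
    }
}
```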
It turns out that, given the mix of work in these files, very few items actually qualified for the inner loop, so skipping it had a huge impact.
Thank you to everyone who responded. It really was a combination of suggestions that helped me greatly improve this script and I learned a lot in the process.