PowerShell - Creating a CSV from files containing nonconforming data
I have 4K+ files that need to be converted into CSV files.
Each field needs to be extracted and then added to a CSV file in the order they are found.
I have a script that I'm hoping to re-use for this purpose. The script below reads the first line and examines it, then processes the remaining lines.
I altered the script to capture JUST the first string on the first line. That works: I am able to capture "ADJMT-HIST-WK".
I now need to process the rest of the file.
As you can see, each remaining line consists of a two-digit number, a column name, and then, on most lines, more characters at the end, like this:
05 DTE-CALL 9(6)
I need to capture just the column name on every line (for the line above, that is DTE-CALL). The column name appears only once per line, and it always comes right after the two-digit number.
Once I have the data, I have to output it to a CSV file containing all the fields I captured: the table name from line one, followed by all the column names from the rest of the file.
At minimum, a file has one table name and one column name; at the other extreme, it has a single table name with hundreds of column names.
My output file needs to look like this:
ADJMT-HIST-WK,SRVCG-OFC-CDE,ST-CDE-FMHA,CTY-DST-CDE,SRVCG-CDE-OFC-CTY,.,.,.,. and so on.
If I can wrap each field in quotes, that would save me a step later on.
Would anyone be able to simplify this procedure for me?
RAW Data
ADJMT-HIST-WK VER 1 D SUFFIX
05 SRVCG-OFC-CDE
10 ST-CDE-FMHA 9(2)
10 CTY-DST-CDE 9
10 SRVCG-CDE-OFC-CTY 9(2)
05 DTE-CALL 9(6)
05 CR-AMT-ADJMT S9(9)V99
05 DR-AMT-ADJMT S9(9)V99
05 PROCG-STAT-INDCTR X(1)
05 DTE-DPST
10 MO-GRGRN 9(2)
10 DAY-GRGRN 9(2)
10 YR 9(2)
05 REAS-CDE-CB X(3)
05 USER-ID
10 AUTHY-CDE-TRML-OPRTR X(1)
Script
$CBLFileList = Get-ChildItem -LiteralPath "C:\IDMS" -File -Recurse
$regex = '^\S+'
ForEach ($CBLFile in $CBLFileList) {
    $firstLine, $remainingLines = $CBLFile | Get-Content
    if ($firstLine -cmatch $regex) {
        $toRemove = $Matches[0].Trim()
        Write-Host "Found Match - $toRemove " -foregroundcolor Red
        # & { $firstLine -creplace $regex; $remainingLines -creplace $toRemove } |
        #     Set-Content -LiteralPath $CBLFile.FullName
    }
}
Comments (1)
You can use the exact same approach to extract the column names: use -{c,i,}match (that is, -cmatch, -imatch, or plain -match) to test whether a line has the relevant data, and $Matches to extract it. You can then use the -f string format operator to quote each name.
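
As a rough illustration of that approach, here is a minimal sketch. The function name ConvertTo-QuotedCsvRow and both regular expressions are assumptions made for this example, not something from the question. It assumes the table name is the first whitespace-delimited token on line one and that each column name directly follows a two-digit level number.

```powershell
# Hypothetical helper (name and regexes are assumptions for illustration):
# turns the lines of one input file into a single row of quoted CSV fields.
function ConvertTo-QuotedCsvRow {
    param(
        [string[]] $Lines
    )

    # Split into the first line and everything after it.
    $first, $rest = $Lines
    $fields = @()

    # Table name: the first run of non-whitespace characters on line one.
    if ($first -match '^\s*(\S+)') {
        $fields += $Matches[1]
    }

    # Column name: the token that follows the two-digit number.
    # Lines that do not fit the pattern (blank lines, for example) are skipped.
    foreach ($line in $rest) {
        if ($line -match '^\s*\d{2}\s+(\S+)') {
            $fields += $Matches[1]
        }
    }

    # Wrap each field in double quotes with -f, then join with commas.
    ($fields | ForEach-Object { '"{0}"' -f $_ }) -join ','
}

# A few lines taken from the raw data in the question.
$sample = @(
    'ADJMT-HIST-WK VER 1 D SUFFIX'
    '05 SRVCG-OFC-CDE'
    '10 ST-CDE-FMHA 9(2)'
    '05 DTE-CALL 9(6)'
)

ConvertTo-QuotedCsvRow -Lines $sample
# "ADJMT-HIST-WK","SRVCG-OFC-CDE","ST-CDE-FMHA","DTE-CALL"
```

To run this over the real files, the output of Get-Content for each file in the existing Get-ChildItem loop could be passed as -Lines, and the returned string written out with Set-Content. Where the output goes and what it is named is left open here, since the question does not say.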