PowerShell - Create a CSV from files containing unqualified data

Posted 2025-01-11 14:17:25

I have 4K+ files that need to be converted into CSV files.
Each field needs to be extracted and then added to a CSV file in the order they are found.

I have a script that I'm hoping I can re-use for this purpose. The script below will read the first line and examine it then process the remaining lines.

I altered the script to capture just the first string on the first line. That works: I am able to capture "ADJMT-HIST-WK".
I now need to process the rest of the file.

As you can see, each remaining line consists of a two-digit number, a column name, and then more characters at the end of the line, like this:

05 DTE-CALL               9(6)

I need to capture just the column name on every line (for the line above, that is DTE-CALL). The column name appears only once per line and always follows the two-digit number.
Once I have the data, I have to output it to a CSV file containing all the fields I captured: the table name from line one, followed by all the column names from the rest of the file.
Each file contains a single table name and anywhere from one to hundreds of column names.

My output file needs to look like this:

ADJMT-HIST-WK,SRVCG-OFC-CDE,ST-CDE-FMHA,CTY-DST-CDE,SRVCG-CDE-OFC-CTY,.,.,.,. and so on.

If I can wrap each field in quotes, that would save me a step later on.

Would anyone be able to simplify this procedure for me?

RAW Data

ADJMT-HIST-WK                            VER     1 D  SUFFIX

          05 SRVCG-OFC-CDE                                                     
          10 ST-CDE-FMHA                           9(2)                       
          10 CTY-DST-CDE                           9                          
          10 SRVCG-CDE-OFC-CTY                     9(2)                       
         05 DTE-CALL                               9(6)                       
         05 CR-AMT-ADJMT                           S9(9)V99                   
         05 DR-AMT-ADJMT                           S9(9)V99                   
         05 PROCG-STAT-INDCTR                      X(1)                       
         05 DTE-DPST                                                          
          10 MO-GRGRN                              9(2)                       
          10 DAY-GRGRN                             9(2)                       
          10 YR                                    9(2)                       
         05 REAS-CDE-CB                            X(3)                       
         05 USER-ID                                                           
          10 AUTHY-CDE-TRML-OPRTR                  X(1)

Script

$CBLFileList = Get-ChildItem -LiteralPath "C:\IDMS" -File -Recurse
$regex = '^\S+'
ForEach ($CBLFile in $CBLFileList) {
  $firstLine, $remainingLines = $CBLFile | Get-Content
  if ($firstLine -cmatch $regex) {
    $toRemove = $Matches[0].Trim()
    Write-Host "Found Match - $toRemove " -ForegroundColor Red
    # & { $firstLine -creplace $regex; $remainingLines -creplace $toRemove } |
    #   Set-Content -LiteralPath $CBLFile.FullName
  }
}


眉黛浅 2025-01-18 14:17:25

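You can use the exact same approach to extract the column names: use -match (or its explicitly case-sensitive and case-insensitive variants, -cmatch and -imatch) to test whether the string contains the relevant data, then read the captured text from $Matches.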

You can use the -f string format operator to quote each name.
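
As a quick illustration of both steps on a single line, here is a minimal sketch. The sample line is copied from the raw data in the question, and the variable names $line and $quoted are only for illustration.

```powershell
# Sample line copied from the question's raw data
$line = '         05 DTE-CALL                               9(6)'

# Optional indent, two digits, whitespace, then the name captured in group 1
if ($line -match '^\s*\d{2}\s+(\S+)') {
    # $Matches[0] holds the whole match, $Matches[1] the first capture group
    $quoted = '"{0}"' -f $Matches[1]
    $quoted   # "DTE-CALL"
}
```

Note that $Matches is only updated when a match succeeds, so it should be read inside the if block rather than after it.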

$CBLFileList = Get-ChildItem -LiteralPath "C:\IDMS" -File -Recurse 
$tableNameRegex = '^\S+'
$columnNameRegex = '^\s*\d{2}\s+(\S+)'

foreach ($CBLFile in $CBLFileList) {
  $firstLine, $remainingLines = $CBLFile | Get-Content
  if ($firstLine -match $tableNameRegex) {
    # extract table name, add quotes
    $tableName = '"{0}"' -f $Matches[0]

    # then do the same for all the column names
    $columnNames = foreach($line in $remainingLines){
      if($line -match $columnNameRegex){
        # again, add quotes before outputting
        '"{0}"' -f $Matches[1]
      }
    }

    # concatenate all names with commas, write to disk
    # (the relative path means the .csv is created in the current directory)
    @($tableName; $columnNames) -join ',' | Set-Content "$($CBLFile.BaseName).csv"
  }
}
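
Before running this over 4K+ files, it may be worth checking the two patterns against the sample from the question. The sketch below wraps the same logic in a function so it can be tried on an in-memory array of lines; the function name Get-CopybookCsvLine and the variable $sample are hypothetical and not part of the script above.

```powershell
function Get-CopybookCsvLine {
    param([string[]]$Lines)

    $tableNameRegex  = '^\S+'
    $columnNameRegex = '^\s*\d{2}\s+(\S+)'

    # First element is the header line, the rest are the field lines
    $first, $rest = $Lines

    if ($first -match $tableNameRegex) {
        # Start the list with the table name
        $names = @($Matches[0])

        foreach ($line in $rest) {
            if ($line -match $columnNameRegex) {
                $names += $Matches[1]
            }
        }

        # Quote every name, then join them into one CSV line
        ($names | ForEach-Object { '"{0}"' -f $_ }) -join ','
    }
}

# A few lines taken from the question's raw data
$sample = @(
    'ADJMT-HIST-WK                            VER     1 D  SUFFIX'
    ''
    '          05 SRVCG-OFC-CDE'
    '          10 ST-CDE-FMHA                           9(2)'
    '         05 DTE-CALL                               9(6)'
)

Get-CopybookCsvLine -Lines $sample
# "ADJMT-HIST-WK","SRVCG-OFC-CDE","ST-CDE-FMHA","DTE-CALL"
```

If the CSV should sit next to its source file instead of in the current directory, one option is to build the target path with Join-Path from $CBLFile.DirectoryName and $CBLFile.BaseName before calling Set-Content.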
