PowerShell: get the sum of substrings at specific positions

Posted on 2025-01-18 02:39:53

How can I use PowerShell to sum the numbers found in a substring of certain lines of a file and write the sum to a specific position on a different line, under the following conditions:

Get the sum of the numbers at positions 3 to 13 of each line that starts with the character D. Place the sum at positions 10 to 14 of the line that starts with S.

So for example, if I have this file:

F123trial   text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S        xxxxx

I want to get the sum of 38.95, 18.95 and 18.95, and then place it at the position marked xxxxx on the line that starts with S.


Comments (2)

信仰 2025-01-25 02:39:53

PowerShell's switch statement has powerful, but little-known features that allow you to iterate over the lines of a file (-file) and match lines by regular expressions (-regex).

Not only is switch -file convenient, it is also much faster than using cmdlets in a pipeline (see bottom section).

[double] $sum = 0

switch -regex -file file.txt {

  # Note: The string to the left of each script block below ({ ... }), 
  #       e.g., '^D', is the regex to match each line against.
  #       Inside the script blocks, $_ refers to the input line at hand.

  # Extract number, add to sum, output the line.
  '^D' { $sum += $_.Substring(2, 11); $_; continue }

  # Summary line: place sum at character position 10, with 0-padding
  # Note: `-replace ',', '.'` is only needed if your culture uses "," as the
  #       decimal mark.
  '^S' { $_.Substring(0, 9) + '{0:000000000000000.00}' -f $sum -replace ',', '.'; continue }
  
  # All other lines: pass them through.
  default { $_ }

}

Note:

  • continue in the script blocks short-circuits further matching for the line at hand; by contrast, if you used break, no further lines would be processed.
  • Based on a later comment, I'm assuming you want an 18-character 0-left-padded number on the S line at character position 10.
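The difference between continue and break can be seen in a minimal sketch like the one below. It assumes an in-memory array ($lines, a made-up sample) instead of a file, so that it runs on its own; switch iterates over array elements the same way it iterates over the lines of a file.

```powershell
# Sample input (an array instead of a file, to keep the sketch self-contained).
$lines = 'D1', 'S1', 'D2'

# continue: stop matching the current element and move on to the next one.
$withContinue = @(switch -regex ($lines) {
    '^D'    { "D-line: $_"; continue }
    default { "other: $_" }
})

# break: leave the switch statement entirely after the first match.
$withBreak = @(switch -regex ($lines) {
    '^D'    { "D-line: $_"; break }
    default { "other: $_" }
})

$withContinue.Count   # 3
$withBreak.Count      # 1
```

With break, only the first element is ever processed, which is why continue is the right choice when every line of the file must be passed through.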

With your sample file, the above yields:

F123trial   text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S        000000000000076.85
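To make the position arithmetic explicit: the question counts positions from 1 and inclusively, whereas Substring() takes a 0-based start index and a length. The sketch below illustrates the mapping and the width of the format string; Get-Field is a hypothetical helper introduced only for illustration and is not part of the answer above.

```powershell
# Hypothetical helper: extract a field given 1-based, inclusive positions.
function Get-Field([string] $Line, [int] $From, [int] $To) {
    $Line.Substring($From - 1, $To - $From + 1)
}

# Positions 3 to 13 correspond to Substring(2, 11).
$field = Get-Field 'DA00000038.95==xxx11' 3 13
$value = [double] $field   # casts parse with the invariant culture ("." as decimal mark)

# 15 integer digits + "." + 2 decimal digits = 18 characters.
# Passing the invariant culture avoids the need for -replace ',', '.'.
$formatted = (76.85).ToString('000000000000000.00', [cultureinfo]::InvariantCulture)

$field            # 00000038.95
$formatted        # 000000000000076.85
```

Using ToString() with an explicit culture is one possible alternative to the -f operator followed by -replace; which one reads better is a matter of taste.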

Optional reading: Comparing the performance of switch -file ... to Get-Content ... | ForEach-Object ...

Running the following test script:

& {
  # Create a sample file with 100K lines.
  1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
  (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds, 
  (Measure-Command { get-content $tmpFile | % { $_ }  }).TotalSeconds
  Remove-Item $tmpFile
}

yields the following timings on my machine, for instance (the absolute numbers aren't important, but their ratio should give you a sense):

0.0578924   # switch -file
6.0417638   # Get-Content | ForEach-Object

That is, the pipeline-based solution is about 100 (!) times slower than the switch -file solution.


Digging deeper:

Frode F. (https://stackoverflow.com/users/702944/frode-f) points out that Get-Content is slow with large files, though its convenience makes it a popular choice, and mentions using the .NET Framework directly as an alternative:

  • Using [System.IO.File]::ReadAllLines(); however, given that it reads the entire file into memory, that is only an option with smallish files.

  • Using [System.IO.StreamReader]'s ReadLine() method in a loop.
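As a rough sketch of the second option applied to the task at hand, the loop below sums the D lines with a StreamReader. The temporary sample file and the variable names are assumptions made so that the snippet runs on its own; in practice you would open your own file by its full path.

```powershell
# Create a temporary sample file (assumption: stands in for the real input file).
$path = [IO.Path]::GetTempFileName()
Set-Content -Path $path -Value @(
    'F123trial   text'
    'DA00000038.95==xxx11'
    'DA00000018.95==yyy11'
    'DA00000018.95==zzzyy'
    'S        xxxxx'
)

[double] $sum = 0
$reader = New-Object IO.StreamReader $path   # .NET needs a full path
try {
    # ReadLine() returns $null once the end of the file is reached.
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($line.StartsWith('D')) { $sum += [double] $line.Substring(2, 11) }
    }
} finally {
    # Always release the file handle, even if an error occurs.
    $reader.Dispose()
}
Remove-Item $path

$rounded = [math]::Round($sum, 2)
$rounded
```

The try/finally block is a design choice rather than a requirement: it ensures the file is closed even when a line is shorter than expected and Substring() throws.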

However, use of the pipeline in itself, irrespective of the specific cmdlets used, introduces overhead. When performance matters - but only then - you should avoid it.

Here's an updated test that includes commands that use the .NET Framework methods, with and without the pipeline (the intrinsic .ForEach() method requires PSv4+):

& {
  # Create a sample file with 100K lines.
  1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
  
  (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds
  (Measure-Command { foreach ($line in [IO.File]::ReadLines((Convert-Path $tmpFile))) { $line } }).TotalSeconds
  (Measure-Command { 
    $sr = [IO.StreamReader] (Convert-Path $tmpFile)
    while(-not $sr.EndOfStream) { $sr.ReadLine() }
    $sr.Close() 
  }).TotalSeconds
  (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)).ForEach({ $_ }) }).TotalSeconds
  (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)) | % { $_ } }).TotalSeconds
  (Measure-Command { Get-Content $tmpFile | % { $_ }  }).TotalSeconds
  
  Remove-Item $tmpFile
}

Sample results, from fastest to slowest:

0.0124441 # switch -file
0.0365348 # [System.IO.File]::ReadLines() in foreach loop
0.0481214 # [System.IO.StreamReader] in a loop
0.1614621 # [System.IO.File]::ReadAllLines() with .ForEach() method
0.2745749 # (pipeline) [System.IO.File]::ReadAllLines() with ForEach-Object
0.5925222 # (pipeline) Get-Content with ForEach-Object

switch -file is the fastest by a factor of around 3, followed by the no-pipeline .NET solutions; using .ForEach() adds another factor of 3.
Simply introducing the pipeline (ForEach-Object instead of .ForEach()) adds another factor of 2; finally, using the pipeline with Get-Content and ForEach-Object adds another factor of 2.

时光磨忆 2025-01-25 02:39:53

You could try:

  • -match to find the lines using regex-pattern
  • The .NET string-method Substring() to extract the values from the "D"-lines
  • Measure-Object -Sum to calculate the sum
  • Substring() and string concatenation to insert the value into the "S"-line.

Example:

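$text = Get-Content -Path file.txt

# For each "D"-line, extract the value at positions 3-13 and cast it to
# [double] (casts parse with the invariant culture), then sum the values.
$sum = $text -match '^D' |
    ForEach-Object { [double] $_.Substring(2, 11) } |
    Measure-Object -Sum | Select-Object -ExpandProperty Sum

# Format the sum as a string so that it can be padded and concatenated.
$total = $sum.ToString('0.00', [cultureinfo]::InvariantCulture)

$text | ForEach-Object {
    if ($_ -match '^S') {
        # "S"-line -> write the sum into positions 10-14, right-aligned.
        # Assumes the line is at least 14 characters long.
        $_.Substring(0, 9) + $total.PadLeft(5) + $_.Substring(14)
    } else {
        # Not an "S"-line -> output the original content.
        $_
    }
} | Set-Content -Path file.txt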
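
One detail worth spelling out is that -match behaves differently depending on its left-hand operand: with an array it acts as a filter and returns the matching elements, whereas with a single string it returns a Boolean. A small sketch, using a made-up in-memory sample ($sample) rather than a file:

```powershell
# Made-up sample lines standing in for the content of the file.
$sample = 'F123trial   text', 'DA00000038.95==xxx11', 'DA00000018.95==yyy11', 'S        xxxxx'

# Array on the left: -match returns the elements that match.
$dLines = @($sample -match '^D')

# Single string on the left: -match returns $true or $false.
$isDLine = 'DA00000038.95==xxx11' -match '^D'

# Measure-Object -Sum adds up the numbers it receives from the pipeline.
$sumOfSample = ($dLines |
    ForEach-Object { [double] $_.Substring(2, 11) } |
    Measure-Object -Sum).Sum

$dLines.Count                    # 2
[math]::Round($sumOfSample, 2)   # 57.9
```

The filtering behavior is what allows the answer to pipe only the "D"-lines into ForEach-Object without an explicit Where-Object.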