PowerShell: extracting text between two strings with -Tail and -Wait

Posted on 2025-01-14 04:31:48

I have a text file with a large number of log messages.
I want to extract the messages between two string patterns. I want the extracted message to appear as it is in the text file.

I tried the following approach. It works, but it doesn't support Get-Content's -Wait and -Tail options. Also, the extracted result is displayed on a single line rather than as it appears in the text file. Input is welcome :-)

Sample Code

function GetTextBetweenTwoStrings($startPattern, $endPattern, $filePath){

    # Get content from the input file
    $fileContent = Get-Content $filePath

    # Regular expression (Regex) of the given start and end patterns
    $pattern = "$startPattern(.*?)$endPattern"

    # Perform the Regex operation
    $result = [regex]::Match($fileContent,$pattern).Value

    # Finally return the result to the caller
    return $result
}

# Clear the screen
Clear-Host

$input = "THE-LOG-FILE.log"
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'

# Call the function
GetTextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $input

Below is an improved script based on Theo's answer.
The following points still need work:

  1. The beginning and end of the output are somehow trimmed, even though I adjusted the buffer size in the script.
  2. How can I wrap each matched result in START and END strings?
  3. I still could not figure out how to use the -Wait and -Tail options.

Updated Script

# Clear the screen
Clear-Host

# Adjust the buffer size of the window
$bw = 10000
$bh = 300000
if ($host.name -eq 'ConsoleHost') # or -notmatch 'ISE'
{
  [console]::bufferwidth = $bw
  [console]::bufferheight = $bh
}
else
{
    $pshost = get-host
    $pswindow = $pshost.ui.rawui
    $newsize = $pswindow.buffersize
    $newsize.height = $bh
    $newsize.width = $bw
    $pswindow.buffersize = $newsize
}


function Get-TextBetweenTwoStrings ([string]$startPattern, [string]$endPattern, [string]$filePath){
    # Get content from the input file
    $fileContent = Get-Content -Path $filePath -Raw
    # Regular expression (Regex) of the given start and end patterns
    $pattern = '(?is){0}(.*?){1}' -f [regex]::Escape($startPattern), [regex]::Escape($endPattern)
    # Perform the Regex operation and output
    [regex]::Match($fileContent,$pattern).Groups[1].Value
}

# Input file path
$inputFile = "THE-LOG-FILE.log"

# The patterns
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'


Get-TextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $inputFile


Comments (2)

喜爱皱眉﹌ 2025-01-21 04:31:48
  • You need to perform streaming processing of your Get-Content call, in a pipeline, such as with ForEach-Object, if you want to process lines as they're being read.

    • This is a must if you're using Get-Content -Wait, because such a call doesn't terminate by itself (it keeps waiting for new lines to be added to the file, indefinitely), but inside a pipeline its output can be processed as it is being received, even before the command terminates.
  • You're trying to match across multiple lines, which with Get-Content output would only work if you used the -Raw switch - by default, Get-Content reads its input file(s) line by line.

    • However, -Raw is incompatible with -Wait.
    • Therefore, you must stick with line-by-line processing, which requires that you match the start and end patterns separately, and keep track of when you're processing lines between those two patterns.
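The difference between the default line-by-line output and -Raw can be checked with a small experiment. This is only a sketch: the temporary file name and its contents are made up for the illustration and are not part of the original question.

```powershell
# Hypothetical sample file, created only for this illustration.
$demoFile = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-vs-lines-demo.log'
Set-Content -Path $demoFile -Value 'before', 'START-OF-PATTERN first', 'middle', 'last END-OF-PATTERN', 'after'

# Default: one string per line, so no single element contains both markers.
$lines = @(Get-Content -Path $demoFile)

# -Raw: the whole file as one string, line breaks included.
$raw = Get-Content -Path $demoFile -Raw

# A regex that has to span several lines to match.
$crossLine = '(?s)START-OF-PATTERN.*?END-OF-PATTERN'

# Against the single -Raw string, -match returns a Boolean.
$rawHit = $raw -match $crossLine

# Against an array, -match returns the matching elements instead.
$lineHits = @($lines -match $crossLine).Count

Remove-Item -Path $demoFile
```

Here $rawHit ends up true while $lineHits stays at zero, which is why a -Wait based solution has to track the start and end markers separately.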

Here's a proof of concept, but note the following:

  • -Tail 100 is hard-coded - adjust as needed or make it another parameter.

  • The use of -Wait means that the function will run indefinitely - waiting for new lines to be added to $filePath - so you'll need to use Ctrl-C to stop it.

    • While you can use a Get-TextBetweenTwoStrings call itself in a pipeline for object-by-object processing, assigning its result to a variable ($result = ...) won't work when terminating with Ctrl-C, because this method of termination also aborts the assignment operation.

    • To work around this limitation, the function below is defined as an advanced function, which automatically enables support for the common -OutVariable parameter, which is populated even in the event of termination with Ctrl-C; your sample call would then look as follows (as Theo notes, don't use the automatic $input variable as a custom variable):

      # Look for blocks of interest in the input file, indefinitely,
      # and output them as they're being found.
      # After termination with Ctrl-C, $result will also contain the blocks
      # found, if any.
      Get-TextBetweenTwoStrings -OutVariable result -startPattern $startPattern -endPattern $endPattern -filePath $inputFile
      
  • Per your feedback, you want the block of lines to encompass the full lines on which the start and end patterns match, so the regexes below are enclosed in .* on both sides.

  • The word pattern in your $startPattern and $endPattern parameters is a bit ambiguous in that it suggests that they themselves are regexes that can therefore be used as-is or embedded as-is in a larger regex on the RHS of the -match operator.
    However, in the solution below I am assuming that they are to be treated as literal strings, which is why they are escaped with [regex]::Escape(); simply omit these calls if these parameters are indeed regexes themselves; i.e.:

    $startRegex = '.*' + $startPattern + '.*'
    $endRegex = '.*' + $endPattern + '.*'
    
  • The solution assumes there is no overlap between blocks and that, in a given block, the start and end patterns are on separate lines.

  • Each block found is output as a single, multi-line string, using LF ("`n") as the newline character; if you want CRLF newline sequences instead, use "`r`n"; for the platform-native newline format (CRLF on Windows, LF on Unix-like platforms), use [Environment]::NewLine.

# Note the use of "-" after "Get", to adhere to PowerShell's
# "<Verb>-<Noun>" naming convention.
function Get-TextBetweenTwoStrings {

  # Make the function an advanced one, so that it supports the 
  # -OutVariable common parameter.
  [CmdletBinding()]
  param(
    $startPattern, 
    $endPattern, 
    $filePath
  )

  # Note: If $startPattern and $endPattern are themselves
  #       regexes, omit the [regex]::Escape() calls.
  $startRegex = '.*' + [regex]::Escape($startPattern) + '.*'
  $endRegex = '.*' + [regex]::Escape($endPattern) + '.*'

  $inBlock = $false
  $block = [System.Collections.Generic.List[string]]::new()

  Get-Content -Tail 100 -Wait $filePath | ForEach-Object {
    if ($inBlock) {
      if ($_ -match $endRegex) {
        $block.Add($Matches[0])
        # Output the block of lines as a single, multi-line string
        $block -join "`n"
        $inBlock = $false; $block.Clear()       
      }
      else {
        $block.Add($_)
      }
    }
    elseif ($_ -match $startRegex) {
      $inBlock = $true
      $block.Add($Matches[0])
    }
  }

}
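
Because the function above never returns while -Wait is active, its block-tracking logic is hard to try out in isolation. As a rough sketch, the same state machine can be moved into a pipeline function that receives its lines from the caller. The name Select-LineBlock and its parameters are hypothetical and not part of the answer above; it simply adds whole lines to the block, which is what the .* wrapped regexes achieve.

```powershell
# Hypothetical helper (not from the answer above): the same state machine,
# but reading lines from the pipeline instead of calling Get-Content itself.
function Select-LineBlock {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory)] [string] $StartPattern,
    [Parameter(Mandatory)] [string] $EndPattern,
    [Parameter(ValueFromPipeline)] [string] $Line
  )
  begin {
    # Treat both markers as literal strings, as the answer does.
    $startRegex = [regex]::Escape($StartPattern)
    $endRegex   = [regex]::Escape($EndPattern)
    $inBlock = $false
    $block = [System.Collections.Generic.List[string]]::new()
  }
  process {
    if ($inBlock) {
      $block.Add($Line)
      if ($Line -match $endRegex) {
        # Emit the finished block as one multi-line string.
        $block -join "`n"
        $inBlock = $false
        $block.Clear()
      }
    }
    elseif ($Line -match $startRegex) {
      $inBlock = $true
      $block.Add($Line)
    }
  }
}

# Finite sample input, so the call returns and the result can be inspected.
$sampleLines = 'noise', 'x START-OF-PATTERN a', 'b', 'c END-OF-PATTERN y', 'noise',
               'START-OF-PATTERN d', 'END-OF-PATTERN'
$blocks = @($sampleLines | Select-LineBlock -StartPattern 'START-OF-PATTERN' -EndPattern 'END-OF-PATTERN')
```

With this split, the caller decides how the lines are produced, so a live call would pipe Get-Content -Tail 100 -Wait $inputFile into Select-LineBlock, and the -Tail value is no longer hard-coded in the function.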
意犹 2025-01-21 04:31:48

First of all, you should not use $input as a self-defined variable name, because it is an automatic variable.

Then, you are reading the file as a string array, whereas you would rather read it as a single, multiline string. To do that, append the -Raw switch to the Get-Content call.

The regex you are creating does not allow for regex special characters in the start and end patterns you give, so I would suggest using [regex]::Escape() on these patterns when creating the regex string.
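
To see why escaping matters, here is a small sketch with a made-up marker that contains regex metacharacters. The marker and sample text are assumptions for illustration only.

```powershell
# Hypothetical marker containing regex metacharacters.
$marker = '[START] (v1.0)'
$text   = 'log: [START] (v1.0) payload'

# Unescaped, "[START]" is read as a character class and "(v1.0)" as a
# capture group, so the pattern does not match the literal marker.
$unescapedHit = $text -match $marker

# Escaped, every metacharacter is matched literally.
$escaped    = [regex]::Escape($marker)
$escapedHit = $text -match $escaped
```

Only the escaped pattern finds the marker in this sample, which is the situation the recommendation above guards against.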

While your regex does use a group capturing sequence inside the brackets, you are not using that when it comes to getting the value you seek.

Finally, I would recommend using the PowerShell naming convention (Verb-Noun) for the function name.

Try

function Get-TextBetweenTwoStrings ([string]$startPattern, [string]$endPattern, [string]$filePath){
    # Get content from the input file
    $fileContent = Get-Content -Path $filePath -Raw
    # Regular expression (Regex) of the given start and end patterns
    $pattern = '(?is){0}(.*?){1}' -f [regex]::Escape($startPattern), [regex]::Escape($endPattern)
    # Perform the Regex operation and output
    [regex]::Match($fileContent,$pattern).Groups[1].Value
}

$inputFile    = "D:\Test\THE-LOG-FILE.log"
$startPattern = 'START-OF-PATTERN'
$endPattern   = 'END-OF-PATTERN'

Get-TextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $inputFile

Would result in something like:

blahblah
more lines here

The (?is) prefix makes the regex case-insensitive and lets the dot match line breaks as well.
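
The function above returns only the first match. If the file may contain several blocks, one possible variation (an assumption about what is wanted, not part of the original function) is to use [regex]::Matches with the same pattern and loop over the results. The sample text is made up for the illustration.

```powershell
# Made-up sample text with two blocks; the first start marker is lowercase
# to show the effect of the "i" option.
$text = "noise`nstart-of-pattern`nblahblah`nmore lines here`nEND-OF-PATTERN`nnoise`n" +
        "START-OF-PATTERN second END-OF-PATTERN"

$pattern = '(?is){0}(.*?){1}' -f [regex]::Escape('START-OF-PATTERN'), [regex]::Escape('END-OF-PATTERN')

# [regex]::Matches returns every non-overlapping match, not just the first.
$found = foreach ($m in [regex]::Matches($text, $pattern)) {
  # Groups[1] is the text between the markers; Trim() drops the whitespace
  # and line breaks directly after the start marker and before the end marker.
  $m.Groups[1].Value.Trim()
}
```

Each element of $found is then the text of one block, and the caller can add any START and END wrapper strings around it as needed.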


Nice to see you're using my version of the Get-TextBetweenTwoStrings function. However, I believe you are mistaking the output in the console for what you would see in a dedicated text editor. In the console, lines that are too long are truncated, whereas in a text editor like Notepad you can choose to wrap long lines or use a horizontal scrollbar.

If you simply append

| Set-Content -Path 'X:\wherever\theoutput.txt'

to the Get-TextBetweenTwoStrings .. call, you will find the lines are NOT truncated when you open it in Word or notepad for instance.

In fact, you can have that line followed by

notepad 'X:\wherever\theoutput.txt'

to have notepad open that file straight away.
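
A quick way to confirm that the file holds the full text, regardless of how the console displays it, is to write a very long line and read it back. This is only a sketch; the temporary file name and the 5000-character line are made up for the check.

```powershell
# Hypothetical output path in the temp folder, used only for this check.
$outFile  = Join-Path ([System.IO.Path]::GetTempPath()) 'theoutput-demo.txt'
$longLine = 'x' * 5000          # far wider than a typical console window

$longLine | Set-Content -Path $outFile

# Read the line back and compare lengths.
$roundTrip  = @(Get-Content -Path $outFile)[0]
$sameLength = $roundTrip.Length -eq $longLine.Length

Remove-Item -Path $outFile
```

If $sameLength is true, nothing was cut off on the way to the file, so any shortening seen on screen comes from the console display.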
