PowerShell: Extract text between two strings with -Tail and -Wait
I have a text file with a large number of log messages.
I want to extract the messages between two string patterns, and I want each extracted message to appear exactly as it does in the text file.
I tried the following approach. It works, but it doesn't support Get-Content's -Wait and -Tail options. Also, the extracted result is displayed on a single line rather than as it appears in the text file. Input is welcome :-)
Sample Code
function GetTextBetweenTwoStrings($startPattern, $endPattern, $filePath){
    # Get content from the input file
    $fileContent = Get-Content $filePath
    # Regular expression (Regex) of the given start and end patterns
    $pattern = "$startPattern(.*?)$endPattern"
    # Perform the Regex operation
    $result = [regex]::Match($fileContent,$pattern).Value
    # Finally return the result to the caller
    return $result
}
# Clear the screen
Clear-Host
$input = "THE-LOG-FILE.log"
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'
# Call the function
GetTextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $input
Improved script based on Theo's answer.
The following points need to be improved:
- The beginning and end of the output are somehow trimmed, even though I adjusted the buffer size in the script.
- How can each matched result be wrapped in the START and END strings?
- I still could not figure out how to use the -Wait and -Tail options.
Updated Script
# Clear the screen
Clear-Host
# Adjust the buffer size of the window
$bw = 10000
$bh = 300000
if ($host.name -eq 'ConsoleHost') # or -notmatch 'ISE'
{
    [console]::bufferwidth = $bw
    [console]::bufferheight = $bh
}
else
{
    $pshost = get-host
    $pswindow = $pshost.ui.rawui
    $newsize = $pswindow.buffersize
    $newsize.height = $bh
    $newsize.width = $bw
    $pswindow.buffersize = $newsize
}
function Get-TextBetweenTwoStrings ([string]$startPattern, [string]$endPattern, [string]$filePath){
    # Get content from the input file
    $fileContent = Get-Content -Path $filePath -Raw
    # Regular expression (Regex) of the given start and end patterns
    $pattern = '(?is){0}(.*?){1}' -f [regex]::Escape($startPattern), [regex]::Escape($endPattern)
    # Perform the Regex operation and output
    [regex]::Match($fileContent,$pattern).Groups[1].Value
}
# Input file path
$inputFile = "THE-LOG-FILE.log"
# The patterns
$startPattern = 'START-OF-PATTERN'
$endPattern = 'END-OF-PATTERN'
Get-TextBetweenTwoStrings -startPattern $startPattern -endPattern $endPattern -filePath $inputFile
Comments (2)
You need to perform streaming processing of your Get-Content call, in a pipeline, such as with ForEach-Object, if you want to process lines as they're being read. This is a must with Get-Content -Wait, because such a call doesn't terminate by itself (it keeps waiting for new lines to be added to the file, indefinitely), but inside a pipeline its output can be processed as it is being received, even before the command terminates.
You're trying to match across multiple lines, which with Get-Content output would only work if you used the -Raw switch - by default, Get-Content reads its input file(s) line by line. However, -Raw is incompatible with -Wait, so the matching has to be done line by line.
Here's a proof of concept, but note the following:
- -Tail 100 is hard-coded - adjust as needed or make it another parameter.
- The use of -Wait means that the function will run indefinitely - waiting for new lines to be added to $filePath - so you'll need to use Ctrl-C to stop it.
- While you can use a Get-TextBetweenTwoStrings call itself in a pipeline for object-by-object processing, assigning its result to a variable ($result = ...) won't work when terminating with Ctrl-C, because this method of termination also aborts the assignment operation.
- To work around this limitation, the function below is defined as an advanced function, which automatically enables support for the common -OutVariable parameter, which is populated even in the event of termination with Ctrl-C; your sample call would then pass -OutVariable instead of assigning to a variable (as Theo notes, don't use the automatic $input variable as a custom variable).
- Per your feedback you want the block of lines to encompass the full lines on which the start and end patterns match, so the regexes below are enclosed in .*
- The word pattern in your $startPattern and $endPattern parameters is a bit ambiguous in that it suggests that they themselves are regexes that can therefore be used as-is or embedded as-is in a larger regex on the RHS of the -match operator. However, in the solution below I am assuming that they are to be treated as literal strings, which is why they are escaped with [regex]::Escape(); simply omit these calls if these parameters are indeed regexes themselves.
- The solution assumes there is no overlap between blocks and that, in a given block, the start and end patterns are on separate lines.
- Each block found is output as a single, multi-line string, using LF ("`n") as the newline character; if you want CRLF newline sequences instead, use "`r`n"; for the platform-native newline format (CRLF on Windows, LF on Unix-like platforms), use [Environment]::NewLine.
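The function this answer refers to is not preserved in this copy. What follows is only a minimal sketch of the same idea, not the answerer's code. It assumes the start and end strings are literals that sit on separate lines and that blocks do not overlap. Select-TextBlock and its parameter names are hypothetical. Unlike the function described above, it takes lines from the pipeline rather than calling Get-Content itself, so it can be tried on a fixed array before being pointed at a growing file.

```powershell
# Hypothetical helper: collects pipeline lines into start..end blocks.
function Select-TextBlock {
    [CmdletBinding()]   # advanced function, so common parameters such as -OutVariable are available
    param(
        [Parameter(Mandatory)] [string] $StartPattern,
        [Parameter(Mandatory)] [string] $EndPattern,
        # AllowEmptyString lets blank log lines bind to a mandatory string parameter.
        [Parameter(Mandatory, ValueFromPipeline)] [AllowEmptyString()] [string] $Line
    )
    begin {
        # Literal patterns, enclosed in .* so that a match spans the whole line.
        $startRegex = '.*' + [regex]::Escape($StartPattern) + '.*'
        $endRegex   = '.*' + [regex]::Escape($EndPattern) + '.*'
        $block   = [System.Collections.Generic.List[string]]::new()
        $inBlock = $false
    }
    process {
        if (-not $inBlock) {
            # Outside a block: only a start line is of interest.
            if ($Line -match $startRegex) { $inBlock = $true; $block.Add($Line) }
        }
        else {
            $block.Add($Line)
            if ($Line -match $endRegex) {
                # Emit the finished block as one multi-line string (LF-separated).
                $block -join "`n"
                $block.Clear()
                $inBlock = $false
            }
        }
    }
}

# Try it on a fixed array of lines first.
$sample = 'noise', 'a START-OF-PATTERN', 'payload', 'END-OF-PATTERN z', 'noise'
$sample | Select-TextBlock -StartPattern 'START-OF-PATTERN' -EndPattern 'END-OF-PATTERN'

# Against a growing log file (runs until Ctrl-C, so it is left commented out):
# Get-Content -Path 'THE-LOG-FILE.log' -Tail 100 -Wait |
#     Select-TextBlock -StartPattern 'START-OF-PATTERN' -EndPattern 'END-OF-PATTERN' -OutVariable blocks
```

Because each block is emitted as soon as its end line arrives, downstream commands see results while Get-Content -Wait is still running.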
First of all, you should not use $input as a self-defined variable name, because it is an automatic variable.
Then, you are reading the file as a string array, whereas you would rather read it as a single, multiline string. For that, append the switch -Raw to the Get-Content call.
The regex you are creating does not allow for regex special characters in the start and end patterns you give, so I would suggest using [regex]::Escape() on these patterns when creating the regex string.
While your regex does use a capturing group inside the parentheses, you are not using it when it comes to getting the value you seek.
Finally, I would recommend using the PowerShell naming convention (Verb-Noun) for the function name.
Try the version of the function that appears in the updated script in the question.
The (?is) makes the regex case-insensitive and has the dot match line breaks as well.
Nice to see you're using my version of the Get-TextBetweenTwoStrings function; however, I believe you are mistaking the output in the console for the output as it would look in a dedicated text editor. In the console, lines that are too long are truncated, whereas in a text editor like Notepad you can choose to wrap long lines or have a horizontal scrollbar.
If you simply redirect the output of the Get-TextBetweenTwoStrings .. call to a file, you will find the lines are NOT truncated when you open that file in Word or Notepad, for instance.
In fact, you can follow that line with a call that has Notepad open the file straight away.
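The points above (read with -Raw, escape the patterns, use the (?is) options) can be tried end to end with a short sketch. It is an illustration rather than the answerer's code. The temporary file and its sample contents are made up for the example. It uses [regex]::Matches() rather than Match() so that every block is returned, and it outputs the full match so that each result stays wrapped in its start and end strings.

```powershell
# Sample log written to a temporary file so the example is self-contained.
$logPath = [System.IO.Path]::GetTempFileName()
Set-Content -Path $logPath -Value @(
    'noise'
    'START-OF-PATTERN first'
    'block'
    'END-OF-PATTERN'
    'more noise'
    'START-OF-PATTERN second END-OF-PATTERN'
)

# -Raw returns the whole file as ONE string, line breaks included.
$text = Get-Content -Path $logPath -Raw

$startPattern = 'START-OF-PATTERN'
$endPattern   = 'END-OF-PATTERN'

# (?is): i = case-insensitive, s = the dot also matches line breaks.
# [regex]::Escape() makes the two strings match literally.
$pattern = '(?is){0}(.*?){1}' -f [regex]::Escape($startPattern), [regex]::Escape($endPattern)

$blocks = foreach ($m in [regex]::Matches($text, $pattern)) {
    # .Value is the full match, including the start and end strings;
    # use $m.Groups[1].Value instead for only the text in between.
    $m.Value
}
$blocks

# Clean up the temporary file.
Remove-Item -Path $logPath
```

Each element of $blocks keeps the line breaks it had in the file, so a multi-line block prints as multiple lines.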