How can I extract specific elements from a huge TXT file and sort them into a CSV file?

Posted on 2025-01-22 02:32:32


I'm trying to create a PowerShell script that extracts every line containing "ERROR", together with the database path of the item, from a huge log TXT file and sorts the results into a CSV file.
Example of an error:

2022-04-17 00:00:00.9999|ERROR|texte:texte|texte \\DATABASE\Path\Path\Path\Path\Item[Item Name] (ID:########-####-####-###-############ Rank:#). description of the error. 

I would then like to retrieve the date, the full path of the element in error (\\DATABASE\Path\Path\Path\Path\Item[Item Name]) and the description of the error, and remove the duplicates.
Also, I don't know whether it is possible to put the date, the path and the message directly into three separate columns in the CSV file.
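Splitting into three columns is possible: Export-Csv writes one column per property of the objects it receives. A minimal sketch with a single hand-built row (the values are copied from the sample log, and the column name Message is only an example). ConvertTo-Csv is used here so the CSV text can be inspected; Export-Csv -Path output.csv would write the same text to a file:

```powershell
# One object = one CSV row; each property becomes one column, in declaration order.
# 'Message' is an illustrative column name, the values come from the sample log.
$row = [pscustomobject]@{
    Date    = '2022-04-17 00:00:00.9999'
    Path    = '\\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name]'
    Message = "Failed to resolve required input 'input name'"
}

# ConvertTo-Csv returns the lines Export-Csv would write: a header line, then one line per object.
$csvLines = $row | ConvertTo-Csv -NoTypeInformation
$csvLines
```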

Example of the logs (screenshot):

2022-04-17 00:00:00.9999|ERROR|ANDataCache:Configuration|################# Error when adding input attributes to data cache (Failed:8/Total:12) [99.9999999999999 ms].
2022-04-17 00:00:00.9999|ERROR|ANCalculationEngine:Configuration|Failed to initialize \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name]  (ID:########-####-####-###-############ Rank:#). Failed to resolve required input 'input A name'
Failed to resolve required input 'input B name'
No output is defined.
2022-04-17 00:00:00.9999|WARN|ANTimeClassManagerHelper:Configuration|Ignoring partial cache signup errors for \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name]  (ID:########-####-####-###-############ Rank:#). Failed to signup some input(s) for receiving updates. 
 Net Volume in Tank: Point not found 'Point Name'.
2022-04-17 00:00:00.9999|ERROR|ANCalculationEngine:Configuration|Failed to initialize \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] (ID:########-####-####-###-############ Rank:#). Failed to resolve required input 'input name'
There is no time rule configured for this analysis.
No output is defined.
2022-04-17 00:00:00.9999|WARN|###########:#########|############[#####] Ignoring attempt to remove non-existent calculation '\\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] (ID:########-####-####-###-############ Rank:#)'
2022-04-17 00:00:00.9999|ERROR|ANDataCache:Configuration|DataCache:################ Error when adding input attributes to data cache (Failed:8/Total:12) [99.9999999999999 ms]. 

Example of the expected result (based on the example above):

(I only want to retrieve ERROR entries that have a path ("\\DATABASE\Path\Path\Path\Path\Item[Item Name]"), not the WARN entries or the ERROR entries without a path.)
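Because the file is described as huge, one option is to stream it line by line with switch -Regex -File instead of reading it all at once. The function below is a hypothetical sketch (Convert-ErrorLog is not from the question or the answer). It assumes that every entry starts with a "yyyy-MM-dd HH:mm:ss.ffff|" timestamp, that lines without a timestamp continue the previous entry, that the path always starts with \\DATABASE\ and ends at the first ], and that two rows are duplicates when Path and Description are equal (the first occurrence is kept):

```powershell
# Hypothetical helper: streams the log and emits one object per unique ERROR entry
# that contains a \\DATABASE path. WARN entries and ERROR entries without a path are skipped.
function Convert-ErrorLog {
    param([Parameter(Mandatory)][string]$LogPath)

    $LogPath = (Resolve-Path -LiteralPath $LogPath).ProviderPath

    # Assumption: a new entry starts with "yyyy-MM-dd HH:mm:ss.ffff|";
    # any other line continues the previous entry's message.
    $entryStart = '^\d{4}-\d{2}-\d{2} [\d:.]+\|'

    # date | ERROR | category | text containing a \\DATABASE\...] path,
    # an optional "(ID:... Rank:...)." block, then the first line of the message.
    $errorEntry = '^(?<date>[^|]+)\|ERROR\|[^|]*\|.*?(?<path>\\\\DATABASE\\[^\]]+\])\s*(?:\([^)]*\)\.?)?\s*(?<desc>.*)$'

    $seen    = [System.Collections.Generic.HashSet[string]]::new()
    $current = $null   # ERROR entry being assembled; $null while skipping other entries

    # Emit the finished entry unless the same Path + Description was already emitted.
    $flush = {
        if ($null -ne $current -and $seen.Add("$($current.Path)|$($current.Description)")) {
            $current
        }
    }

    switch -Regex -File $LogPath {
        $entryStart {
            & $flush                      # the previous entry is complete
            $current = $null
            if ($_ -match $errorEntry) {  # WARN lines and ERROR lines without a path fail here
                $current = [pscustomobject]@{
                    Date        = $Matches['date']
                    Path        = $Matches['path']
                    Description = $Matches['desc'].Trim()
                }
            }
        }
        default {
            # Continuation line: append it to the ERROR entry being built, if any.
            if ($null -ne $current) {
                $current.Description = ($current.Description + ' ' + $_.Trim()).Trim()
            }
        }
    }
    & $flush                              # the last entry in the file
}
```

Typical use would be Convert-ErrorLog -LogPath .\logs.txt | Export-Csv output.csv -NoTypeInformation, which gives the Date, Path and Description columns. Only the set of dedup keys is kept in memory, not the whole file, which is the point of this design for very large logs.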

I started writing this:

$File = "logs.txt"
$Pattern = '(\[ERROR\[^\\]+(?<DatabasePath>[^\\]]+\])(?<ErrorText>[^\r\n]+=)'
$Content = Get-Content $File
[regex]::Matches($Content, $Pattern).Value | Set-Content "output.csv" 

Or, to retrieve just the path:

$File = "logs.txt"
$Pattern = '(?<=\\DATABASE\\).+?(?=])'
$Content = Get-Content $File
[regex]::Matches($Content, $Pattern).Value | Set-Content "output.csv"

But in the second case "DATABASE" does not appear in the output file.
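This happens because lookaround assertions are zero-width: (?<=...) and (?=...) check the surrounding text without making it part of the match's Value, so both "\DATABASE\" and the closing "]" are dropped. A minimal sketch on one sample line from the question (the variable names are only illustrative) that compares the original pattern with one that consumes the prefix and the bracket instead:

```powershell
# Sample line copied from the question's log excerpt.
$line = "2022-04-17 00:00:00.9999|ERROR|ANCalculationEngine:Configuration|Failed to initialize \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] (ID:########-####-####-###-############ Rank:#). Failed to resolve required input 'input name'"

# Original pattern: the lookbehind and lookahead are checked but not captured.
$withLookaround = [regex]::Match($line, '(?<=\\DATABASE\\).+?(?=])').Value

# Matching the prefix and the bracket directly keeps them in the result:
# \\\\ = two literal backslashes, [^\]]+ = everything up to the next ']'.
$fullPath = [regex]::Match($line, '\\\\DATABASE\\[^\]]+\]').Value

$withLookaround   # Path1\Path2\Path3\Path4\Item[1. Item Name
$fullPath         # \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name]
```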

Thank you in advance for your answers.


Comments (1)

小忆控 2025-01-29 02:32:32


The regex could probably be improved, but for the time being this should help you get what you're looking for. I encourage you to use this regex101 link to test the current regex (and maybe improve it) if something isn't working.

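$File = "logs.txt"   # log file name taken from the question
# (?m) makes ^ and $ match at every line, so the description can run over continuation lines
$re = [regex]"(?m)(?<date>^[\d-]+\s[\d:.]+)\|ERROR\|.*?(?<path>\\[\\\w\s\[.\]]+).*?\.(?<description>[\w\s'\r?\n.]+$)"
$output = & {
    # -Raw reads the whole file as one string so a single match can cover several lines
    $content = Get-Content $File -Raw
    foreach($match in $re.Matches($content)) {
        $date, $path, $description = $match.Groups['date','path','description']
        [pscustomobject]@{
            Date = $date.Value -as [datetime]
            Path = $path.Value.Trim()
            # collapse multi-line messages into a single line
            Description = ($description.Value -replace '\r?\n', ' ').Trim()
        }
    }
}
$output | Export-Csv "output.csv" -NoTypeInformation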

Using the sample data provided in the question, I got the output below, which can be exported as a proper CSV:

PS /> $output | Format-Table

Date                  Path                                                  Description
----                  ----                                                  -----------
4/17/2022 12:00:00 AM \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] Failed to resolve required input 'input A name'. F… 
4/17/2022 12:00:00 AM \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] Failed to resolve required input 'input name'. The…

PS /> $output[0].Description

Failed to resolve required input 'input A name' Failed to resolve required input 'input B name' No output is defined.
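The code above does not yet remove the duplicates the question asks about. One way, assuming a duplicate means the same Path and Description, is Sort-Object with -Unique on those two properties. The rows below are made-up stand-ins for the answer's objects:

```powershell
# Made-up rows standing in for the objects built by the answer's & { ... } block.
$rows = @(
    [pscustomobject]@{ Date = '2022-04-17 00:00:00.9999'; Path = '\\DATABASE\P\Item[A]'; Description = 'No output is defined.' }
    [pscustomobject]@{ Date = '2022-04-18 00:00:00.0000'; Path = '\\DATABASE\P\Item[A]'; Description = 'No output is defined.' }
    [pscustomobject]@{ Date = '2022-04-17 00:00:00.9999'; Path = '\\DATABASE\P\Item[B]'; Description = 'No output is defined.' }
)

# Rows whose Path and Description compare equal collapse into one; Date is ignored.
$unique = $rows | Sort-Object -Property Path, Description -Unique
$unique
```

In the answer's pipeline this step would go just before Export-Csv. Note that -Unique compares case-insensitively and that the result comes back sorted by Path rather than in file order.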

You can remove -as [datetime] if you want to keep the date format as you currently have it in your file.
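If you keep [datetime] values instead (handy for sorting), the sample output above shows them in the current culture's default format, which drops the fractional seconds. A sketch that parses the log's timestamp layout explicitly and formats it back for the CSV, assuming the layout is always yyyy-MM-dd HH:mm:ss.ffff as in the sample:

```powershell
$inv = [System.Globalization.CultureInfo]::InvariantCulture

# Timestamp as it appears in the sample log (four fractional-second digits).
$raw = '2022-04-17 00:00:00.9999'

# Parse with the exact layout used by the log; a different layout throws instead of guessing.
$parsed = [datetime]::ParseExact($raw, 'yyyy-MM-dd HH:mm:ss.ffff', $inv)

# Format explicitly so the CSV keeps the original layout, fractional seconds included.
$text = $parsed.ToString('yyyy-MM-dd HH:mm:ss.ffff', $inv)
$text   # 2022-04-17 00:00:00.9999
```

Inside the answer's [pscustomobject], the corresponding change would be Date = [datetime]::ParseExact($date.Value, 'yyyy-MM-dd HH:mm:ss.ffff', $inv), with $inv defined as in this sketch.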
