Switch and regex - replacing a substring based on the next line
On a daily basis I have to process text files (MT940) from about ten different bank accounts in our finance system. To improve the processing of the bank statements, I want to enrich these files, basically by replacing the bank transaction code (NMSC in this example) with something else, or by making the transaction description (:86: lines) more readable.
The file may look like:
:20:
:25:MHCBNL2AXXX/0265515777
:28C:27/
:60F:C200207EUR100196,42
:61:2002070207D1326,80NMSCTOPF1450330305SDD TOPF1450330305
:86:FR601810010131670100ARVAL FRANCE BBB200210505254134245678815001141818
:61:2108240824D1976,72NMSCTOPF2140474819//GABK002SCT TOPF2123454819--AA--
:86:ABCWNLXXOBGNL953123450004107181ARIANA GRANDED127139012108000003Gehalt 8/2021--AA--
:61:2108240824D3581,11NMSCTOPF2140474818//GABK006SCT TOPF2140474818--BB--
:86:ABCANLWWXXXNL402011456789701498LADY GAGAD127139012108000002Gehalt 8/2021--BB--
:61:2108240824D3742,44NMSCTOPF2140474817//GABK004SCT TOPF2140474817--CC--
:86:CXWANLWWAT201456787210005293SHEERAN EDD127139012108000001Gehalt 8/2021--CC--
:61:2105250525D3742,44NMSCTOPF2025434704SCT TOPF2025434704
:86:CXWANLWWAT201456787210005293SHEERAN EDGCMS000039851534Salary
:61:2105250525D3581,11NMSCTOPF2025434705SCT TOPF2025434705
:86:ABCANLWWXXXNL402011456789701498LADY GAGAGCMS000039851545Salary
:61:2105250525D1976,72NMSCTOPF2025434706SCT TOPF2025434706
:86:ABCWNLXXOBGNL953123450004107181ARIANA GRANDEGCMS000039851576Salary
:62F:C200207EUR39752,98
I want to do the following:
- In lines that start with :61: and have SDD in them, I want to replace NMSC with RDDT.
- If a line starts with :86: and contains the word Salary or Gehalt, I want to replace NMSC with SALA in the previous line (the one that starts with :61:).
I do this with the following:
switch -Regex ($MT940)
{
    '^:61:[0-9]{1,6}.+D\d+\,?\d*NMSCTOPF\d+SDD TOPF\d*' {
        $_ -replace 'NMSC', 'RDDT'
    }
    '^:61:[0-9]{1,6}[0-9]{4}D[0-9]+\,[0-9]?[0-9]?NMSCTOPF\d+.+SCT.+' {
        $saved = $_; continue
    }
    '^:86:.+salar.+|^:86:.+Gehalt.+' {
        $saved -replace 'NMSC', 'SALA'; $_
    }
    default {
        $_  # unrelated line, pass through
    }
}
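For reference, a minimal sketch of how switch -Regex walks an array: each element is tested against every clause in order, continue skips the remaining clauses for the current element, and default runs only when no clause matched. The sample strings and labels are invented for illustration and have nothing to do with the MT940 data.

```powershell
# Hypothetical sample input, chosen only to show the control flow.
$lines = 'apple pie', 'banana', 'cherry'

$result = switch -Regex ($lines) {
    '^a'    { "A:$_" }               # no continue: later clauses are still tested
    'pie$'  { "PIE:$_"; continue }   # continue: move on to the next element
    '^b'    { "B:$_" }
    default { "OTHER:$_" }           # runs only if no clause matched this element
}

$result
```

Here 'apple pie' produces two output lines because it matches the first two clauses, while 'cherry' is the only element that reaches default.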
This gives the following result:
>>:20:
>>:25:XXXXXXXXXXXXXXXXXXXXXXX
>>:28C:27
>>:60F:C200207EUR100000,00
>>:61:2012311231D0000,1RDDTTOPF1234567890SDD TOPF1234567890
>>:86:FR1234567890ARVAL FRANCE
>>:61:2108240824D0000,01SALATOPF2140474819//GABK002SCT TOPF2123454819--AA--
>>:86:ARIANA GRANDED12713901210Gehalt 8/2021--AA--
>>:61:2108240824D0000,01SALATOPF2140474818//GABK006SCT TOPF2140474818--BB--
>>:86:LADY GAGA127139012108000002Gehalt 8/2021--BB--
>>:61:2108240824D0000,01SALATOPF2140474817//GABK004SCT TOPF2140474817--CC--
>>:86:SHEERAN EDD127139012108000001Gehalt 8/2021--CC--
>>:61:2108240824D0000,01NMSCTOPF2025434704SCT TOPF2025434704 AA
**>>:61:2108240824D0000,01SALATOPF2140474817//GABK004SCT TOPF2140474817--CC--**
>>:86:SHEERAN EDGCMS000039851534Salary AA
>>:61:2108240824D0000,01NMSCTOPF2025434705SCT TOPF2025434705 BB
**>>:61:2108240824D0000,01SALATOPF2140474817//GABK004SCT TOPF2140474817--CC--**
>>:86:ABCANLWWXXXNL402011456789701498LADY GAGAGCMS000039851545Salary BB
>>:61:2108240824D0000,01MSCTOPF2025434706SCT TOPF2025434706 CC
**>>:61:2108240824D0000,01SALATOPF2140474817//GABK004SCT TOPF2140474817--CC--**
>>:86:ABCWNLXXOBGNL953123450004107181ARIANA GRANDEGCMS000039851576Salary CC
>>:61:2012311231D0000,1RDDTTOPF0987654321SDD TOPF0987654321
>>:86:FR1234567890ARVAL FRANCE
>>:62F:C200207EUR39752,98
Question: What I do not understand is why the lines marked with ** are inserted.
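One possible reading of the output, offered as a sketch rather than a confirmed diagnosis: every marked line is identical to the earlier --CC-- :61: line after the substitution, which is what would appear if $saved still held that line. This happens when a later :61: line does not match the second clause, falls through to default (so it is printed unchanged, still with NMSC), and the following :86: line then reuses the old $saved. Why the real lines fail to match cannot be seen from the anonymized data. The input and the shortened patterns below are invented and reproduce only the mechanism.

```powershell
# Invented input: the second :61: line carries a hypothetical code XXXX,
# so the "save" clause does NOT match it.
$lines = @(
    ':61:A NMSC SCT',
    ':86:Gehalt 8/2021',
    ':61:B XXXX SCT',
    ':86:Salary'
)

$result = switch -Regex ($lines) {
    '^:61:.+NMSC.+SCT' {
        $saved = $_; continue              # remember the line, emit nothing yet
    }
    '^:86:.*(salar|Gehalt)' {
        $saved -replace 'NMSC', 'SALA'     # emits whatever $saved holds, even if stale
        $_
    }
    default { $_ }                         # pass through
}

$result
```

The fourth output line is ':61:A SALA SCT' again: $saved was never updated for the second transaction, so the first one is emitted a second time.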
Comments (1)
Apparently the problem is solved by adding a $ to the end, indicating the end of the line. I do not understand why, and it causes other lines (the first two :61: lines) to disappear.
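An alternative sketch that does not depend on a value saved by an earlier clause: hold each :61: line until the next line has been read, then decide what to emit and clear the buffer. The function name Convert-Mt940Line, the simplified patterns, and the choice of sample lines are assumptions for illustration, not the asker's code, and the sketch has not been run against real bank files.

```powershell
function Convert-Mt940Line {           # hypothetical helper name
    param([string[]]$Lines)

    $pending = $null                   # a :61: line waiting for its :86: line
    foreach ($line in $Lines) {
        if ($line -match '^:61:') {
            if ($null -ne $pending) { $pending }    # earlier :61: had no :86:, emit as-is
            if ($line -match 'SDD') {
                $line -replace 'NMSC', 'RDDT'       # rule 1: direct debit
                $pending = $null
            } else {
                $pending = $line                    # decide once the next line is known
            }
            continue
        }
        if ($null -ne $pending) {
            if ($line -match '^:86:.*(Salary|Gehalt)') {
                $pending -replace 'NMSC', 'SALA'    # rule 2: salary payment
            } else {
                $pending                            # not a salary line, emit unchanged
            }
            $pending = $null                        # never reuse a buffered line
        }
        $line
    }
    if ($null -ne $pending) { $pending }            # flush at end of input
}

# Usage with a few lines taken from the sample file in the question.
$sample = @(
    ':61:2108240824D3742,44NMSCTOPF2140474817//GABK004SCT TOPF2140474817--CC--',
    ':86:CXWANLWWAT201456787210005293SHEERAN EDD127139012108000001Gehalt 8/2021--CC--',
    ':61:2105250525D3742,44NMSCTOPF2025434704SCT TOPF2025434704',
    ':86:CXWANLWWAT201456787210005293SHEERAN EDGCMS000039851534Salary',
    ':61:2002070207D1326,80NMSCTOPF1450330305SDD TOPF1450330305',
    ':86:FR601810010131670100ARVAL FRANCE BBB200210505254134245678815001141818'
)
$out = Convert-Mt940Line -Lines $sample
$out
```

Because the buffer is cleared as soon as the following line is processed, the number of output lines always equals the number of input lines, and a :61: line can never be emitted twice.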