Regex with positive lookbehind that will match multiple lines

Posted on 2025-02-13 23:48:43

I'm struggling with a regex that should match multiple lines after a specific string.

Let's say we have sample data like this:

Data1
some changing text
12406943 Old Company New Company reason Something 1/2/2005 10,00
14757152 Old Company 2 New Company 2 Reason 2 Something2 10/7/2007 8,00

Data2
some changing text
12406943 New Company invoice1 31.01.2005 500,00
14757152 New Company 2 invoice2 28.05.2007 1000,00

Earlier I was getting data from Data1 with this regex:

(?<caseNumber>\d+) +?(?<temp>.*) +?(?>Something|Something2).*(?<originalDate>\d{1,2}/\d{1,2}/\d{4}) +?(?<interestRate>\d{1,3}\.\d{2}).*

and from Data2 with this one:

(?<caseNumber>\d+) +?(?<companyName>.*) +?(?<invoiceNumber>\S*) +?\d{2}\.\d{2}\.\d{4}

Unfortunately, something changed: the date format in Data1 is now the same as in Data2, so the regex for Data2 also picks up rows from Data1.

Data1
some changing text
12406943 Old Company New Company reason Something 02.01.2005 10,00
14757152 Old Company 2 New Company 2 Reason 2 Something2 07.10.2007 8,00

Data2
some changing text
12406943 New Company invoice1 31.01.2005 500,00
14757152 New Company 2 invoice2 28.05.2007 1000,00

I wanted to use a positive lookbehind to check that the Data2 header text comes before the Data2 rows, but it only returns the first row:

(?<=Data2\Rsome changing text\R)(?<caseNumber>\d+) +?(?<companyName>.*) +?(?<invoiceNumber>\S*) +?\d{2}\.\d{2}\.\d{4}

In the Java code, I find a row with the Matcher's find() method and call it in a while loop to go through the data row by row. In the situation above it returns only one row, which is not what I want.

Do you have any idea how to define that regex, or how to enable some kind of multiline search for data rows that follow a given text?
Maybe it's a novice mistake, but I can't see it right now :)

If I use a quantifier and treat the row pattern as a group, it captures only the last occurrence:

(?<=Data2\Rsome changing text\R)((?<caseNumber>\d+) +?(?<companyName>.*) +?(?<invoiceNumber>\S*) +?\d{2}\.\d{2}\.\d{4}.*\R)+
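The "last occurrence only" behaviour can be reproduced in isolation. In java.util.regex, a capturing group inside a repeated group keeps only the text from its final iteration. The sketch below uses a deliberately simplified pattern and made-up input (both are assumptions for illustration, not the pattern from the question); the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Simplified stand-in for the row pattern: digits, a space, the rest of the line.
    // The outer group repeats once per row, so the named group is overwritten each time.
    private static final Pattern ROWS =
        Pattern.compile("(?:(?<caseNumber>\\d+) .*\\R)+");

    // Returns what the named group holds after the whole block has matched.
    static String lastCaptured(String input) {
        Matcher m = ROWS.matcher(input);
        return m.find() ? m.group("caseNumber") : null;
    }

    public static void main(String[] args) {
        String rows = "12406943 invoice1\n14757152 invoice2\n";
        // Only the case number of the final row survives.
        System.out.println(lastCaptured(rows));
    }
}
```

This is why one match per row, collected in a find() loop, is usually easier to work with than a single match that spans all rows.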

握住你手 2025-02-20 23:48:43

In Java, you can make use of the \G anchor instead of using a lookbehind assertion.

Explanation

(?:^Data2\Rsome changing text|\G(?!^))\R(?<caseNumber>\d+)\h+(?<companyName>\S.*?)\h+(?<invoiceNumber>\S+)\h+\d{2}\.\d{2}\.\d{4}\b.*

(?: Non capture group
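- ^Data2\Rsome changing text At the start of a line, match Data2, a newline sequence and the text literally (in Java, ^ only matches at line starts with Pattern.MULTILINE or (?m))
- | Or
- \G(?!^) Assert the current position at the end of the previous match, but not at the start of the input, which is where \G matches before any match has been made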
)\R Close the non-capturing group and match a Unicode newline sequence
(?<caseNumber>\d+)\h+ Capture 1+ digits, then match 1+ horizontal whitespace chars
(?<companyName>\S.*?)\h+ Capture a single non-whitespace char followed by any chars (as few as possible), then match 1+ horizontal whitespace chars
(?<invoiceNumber>\S+)\h+ Capture 1+ non-whitespace chars, then match 1+ horizontal whitespace chars
\d{2}\.\d{2}\.\d{4}\b Match a date-like pattern in the data
.* Match the rest of the line
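
To tie this back to the find() loop from the question, here is a minimal sketch of how the pattern might be used in Java. It assumes the pattern is compiled with Pattern.MULTILINE, so that ^ can match at the start of the Data2 line in the middle of the input. The class name, the extract method and the "|"-joined output format are hypothetical and only there for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Data2Rows {
    // Pattern from the answer. MULTILINE is an assumption added here so that
    // ^ matches at the start of the "Data2" line, not only at the start of the input.
    private static final Pattern ROW = Pattern.compile(
        "(?:^Data2\\Rsome changing text|\\G(?!^))\\R"
        + "(?<caseNumber>\\d+)\\h+(?<companyName>\\S.*?)\\h+(?<invoiceNumber>\\S+)\\h+"
        + "\\d{2}\\.\\d{2}\\.\\d{4}\\b.*",
        Pattern.MULTILINE);

    // Collects "caseNumber|companyName|invoiceNumber" for every row that
    // directly follows the Data2 header, one find() call per row.
    static List<String> extract(String text) {
        List<String> rows = new ArrayList<>();
        Matcher m = ROW.matcher(text);
        while (m.find()) {
            rows.add(m.group("caseNumber") + "|"
                + m.group("companyName") + "|"
                + m.group("invoiceNumber"));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Second sample from the question, where both blocks use the same date format.
        String text = String.join("\n",
            "Data1",
            "some changing text",
            "12406943 Old Company New Company reason Something 02.01.2005 10,00",
            "14757152 Old Company 2 New Company 2 Reason 2 Something2 07.10.2007 8,00",
            "",
            "Data2",
            "some changing text",
            "12406943 New Company invoice1 31.01.2005 500,00",
            "14757152 New Company 2 invoice2 28.05.2007 1000,00");
        extract(text).forEach(System.out::println);
    }
}
```

With that sample, the loop should yield the two Data2 rows and skip the Data1 block: the first match has to start at the Data2 header, and each later match can only start where the previous one ended.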