Regex must not match a group that is followed by \.\d

Posted on 2024-12-28 06:28:08

I am trying to find addresses in various texts. My regex works quite well, except that it also matches a word followed by a date (foobar 22.01.2012 => address: foobar 22).
So I would like to improve the regex so that a street number MUST NOT be followed by "(.|:)\d".

This is what I have:

(?<str>\b([a-zA-Z]+-*[a-zA-Z]+(-|\s)*([a-zA-Z]|-)+)\b\.?\s{1})(?<no>\d+(\s?[a-zA-Z])?\b)

A representative text:

Consultation hours
Monday, the 06.02. until Friday, the 10.02.2012 and
Monday, the 13.02. until Tuesday, the 14.02.2012,
each 14.00-15.30 o'clock, second floor,
Am Fasanengarten 12 foobar
Schlossstr. 34

What should be found?
Am Fasanengarten 12
Schlossstr. 34

What is found?
the 06
the 10
the 13
the 14
each 14
Am Fasanengarten 12
foobar // why is this a match? Without number?
Schlossstr. 34

I tried various positive and negative lookbehinds and lookaheads, but with no luck.

执手闯天涯 2025-01-04 06:28:08

Try this:

(?<str>\b(?:[a-zA-Z]+-*[a-zA-Z]+(?:[ \t-])*(?:[a-zA-Z]|-)+)\b\.?\s)(?<no>\d+(?:\s?[a-zA-Z])?\b)(?![.:]\d)

See it here on Regexr

The negative lookahead (?![.:]\d) at the end ensures that the street number is not immediately followed by a "." or a ":" and then a digit (\d).
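
As a rough check of that lookahead, here is a minimal Python sketch that runs the pattern above against the representative text from the question. Two assumptions to be aware of: Python's `re` module writes named groups as `(?P<name>...)` instead of `(?<name>...)`, so the group syntax is adapted, and the names `FIXED`, `TEXT` and `found` are only illustrative.

```python
import re

# The pattern from this answer, with the named groups rewritten in Python's
# (?P<name>...) spelling (an adaptation; the rest is unchanged).
FIXED = re.compile(
    r"(?P<str>\b(?:[a-zA-Z]+-*[a-zA-Z]+(?:[ \t-])*(?:[a-zA-Z]|-)+)\b\.?\s)"
    r"(?P<no>\d+(?:\s?[a-zA-Z])?\b)"
    r"(?![.:]\d)"  # reject a number directly followed by "." or ":" plus a digit
)

# Representative text from the question.
TEXT = (
    "Consultation hours\n"
    "Monday, the 06.02. until Friday, the 10.02.2012 and\n"
    "Monday, the 13.02. until Tuesday, the 14.02.2012,\n"
    "each 14.00-15.30 o'clock, second floor,\n"
    "Am Fasanengarten 12 foobar\n"
    "Schlossstr. 34\n"
)

# Collect the full text of every match.
found = [m.group(0) for m in FIXED.finditer(TEXT)]
for address in found:
    print(address)
```

With this sample the date fragments such as "the 06" and "each 14" are rejected, because each of those numbers is followed by "." and a digit, and only the two address lines are printed.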

foobar // why is this a match? Without number?
Schlossstr. 34

This is a match because you allow \s between the words of the street name:

(?<str>\b([a-zA-Z]+-*[a-zA-Z]+(-|\s)*([a-zA-Z]|-)+)\b\.?\s{1})(?<no>\d+(\s?[a-zA-Z])?\b)
                                 ^^ here

In my solution I replaced it with [ \t-], which allows only a space, a tab, or a hyphen.

\s means "whitespace", and that also includes line break characters. That is why foobar is matched: if you had looked at the captured groups, you would have seen that the regex matched the address "foobar Schlossstr. 34" across the line break.
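
To see the line break issue in isolation, the following Python sketch runs the question's original pattern on the last two lines of the sample text. As before, this assumes Python's `(?P<name>...)` spelling for the named groups, and `ORIGINAL`, `sample` and the other variable names are made up for illustration.

```python
import re

# The original pattern from the question, with named groups in Python syntax.
ORIGINAL = re.compile(
    r"(?P<str>\b([a-zA-Z]+-*[a-zA-Z]+(-|\s)*([a-zA-Z]|-)+)\b\.?\s{1})"
    r"(?P<no>\d+(\s?[a-zA-Z])?\b)"
)

# Last two lines of the representative text, separated by a line break.
sample = "Am Fasanengarten 12 foobar\nSchlossstr. 34"

matches = list(ORIGINAL.finditer(sample))
for m in matches:
    # repr() makes the embedded "\n" visible in the captured street group.
    print(repr(m.group("str")), repr(m.group("no")))

# \s accepts a line break, while the class [ \t-] does not.
newline_is_whitespace = re.fullmatch(r"\s", "\n") is not None
newline_in_class = re.fullmatch(r"[ \t-]", "\n") is not None
print(newline_is_whitespace, newline_in_class)
```

The second match shows the street group starting at "foobar" and running over the line break into "Schlossstr. ", which is the behaviour that restricting the separator to [ \t-] avoids.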
