Regex problem: unable to match a variable-length pattern

Posted 2024-09-28 09:39:40

I have a problem with regex, using preg_match_all(), to match something of a variable length.

What I am trying to match is the traffic condition after the word 'Congestion'. This is the regex pattern I came up with:

Congestion\s*:\s*(?P<congestion>.*)

However, it extracts everything from the first instance all the way to the end of the subject, since .* matches everything. That's not what I want: I would like it to match the 3 instances separately.

Since the text after Congestion can be of variable length, I can't predict how many words and spaces it contains, so I can't come up with a stricter match such as \w*\s*\w*.

Any clues on how I can proceed from here?

Highway : Highway 26
Datetime : 18-Oct-2010 05:18 PM
Congestion : Traffic is slow from Smith St to Alice Springs St

Highway : Princes Highway
Datetime : 18-Oct-2010 05:18 PM
Congestion : Traffic is slow at the Flinders St / Elizabeth St intersection

Highway : Eastern Freeway
Datetime : 18-Oct-2010 05:19 PM
Congestion : Traffic is slow from Prince St to Queen St

EDIT FOR CLARITY

These nicely formatted texts are actually received via a very poorly formatted HTML email. It contains random line breaks here and there, e.g. "Congestion : Traffic\n is slow from Prince\nSt to Queen St".

So while processing the emails, I stripped off all the HTML code and the random line breaks, and passed the result through json_encode(), producing one very long single-line string with no line breaks...
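To reproduce the problem, here is a minimal PHP sketch. The exact output of the HTML-stripping and json_encode() step is not shown in the question, so `$subject` below is a hypothetical stand-in: the three sample records joined by single spaces on one line.

```php
<?php
// Hypothetical flattened subject: the three sample records on a single line.
$subject = 'Highway : Highway 26 Datetime : 18-Oct-2010 05:18 PM '
    . 'Congestion : Traffic is slow from Smith St to Alice Springs St '
    . 'Highway : Princes Highway Datetime : 18-Oct-2010 05:18 PM '
    . 'Congestion : Traffic is slow at the Flinders St / Elizabeth St intersection '
    . 'Highway : Eastern Freeway Datetime : 18-Oct-2010 05:19 PM '
    . 'Congestion : Traffic is slow from Prince St to Queen St';

// The pattern from the question: the greedy .* runs to the end of the subject.
$count = preg_match_all('/Congestion\s*:\s*(?P<congestion>.*)/', $subject, $matches);

echo $count, "\n";                    // 1
echo $matches['congestion'][0], "\n"; // everything after the first "Congestion :"
```

preg_match_all() reports a single match because the greedy .* consumes the rest of the string, leaving nothing for the second and third "Congestion" to match.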

余罪 2024-10-05 09:39:40

By default, PCRE treats the whole subject as a single line: ^ and $ match only at the start and end of the string. You can use the "m" (PCRE_MULTILINE) flag to change that behaviour, so that ^ and $ also match at the start and end of every line. Then you can tell PHP to match only to the end of each line, and use preg_match_all() to collect every record:

preg_match_all('/^Congestion\s*:\s*(?P<congestion>.*)$/m', $subject, $matches);

There are two things to notice: first, the pattern now includes line-begin (^) and line-end ($) anchors. Second, the pattern carries the m modifier. This only helps while $subject still contains its line breaks.
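A minimal PHP sketch of this approach, using the sample records from the question with their line breaks intact. The last few lines are an added check, not part of the original answer: on a copy where the line breaks are replaced by spaces (a hypothetical `$flat`), the same pattern finds nothing, because ^ can then only match at the very start of the string.

```php
<?php
// Sample records from the question, line breaks intact.
$subject = "Highway : Highway 26\n"
    . "Datetime : 18-Oct-2010 05:18 PM\n"
    . "Congestion : Traffic is slow from Smith St to Alice Springs St\n"
    . "\n"
    . "Highway : Princes Highway\n"
    . "Datetime : 18-Oct-2010 05:18 PM\n"
    . "Congestion : Traffic is slow at the Flinders St / Elizabeth St intersection\n"
    . "\n"
    . "Highway : Eastern Freeway\n"
    . "Datetime : 18-Oct-2010 05:19 PM\n"
    . "Congestion : Traffic is slow from Prince St to Queen St\n";

// With /m, ^ and $ match at every line boundary; . never matches "\n"
// (no /s modifier), so each capture stops at the end of its own line.
$count = preg_match_all('/^Congestion\s*:\s*(?P<congestion>.*)$/m', $subject, $matches);
print_r($matches['congestion']); // three entries, one per record

// Added check: flatten the same text onto one line and retry.
$flat = str_replace("\n", ' ', $subject);
$flatCount = preg_match_all('/^Congestion\s*:\s*(?P<congestion>.*)$/m', $flat, $flatMatches);
echo $flatCount, "\n"; // 0
```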

意中人 2024-10-05 09:39:40

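You could try a minimal (lazy) match:

Congestion\s*:\s*(?P<congestion>.*?)

This will return zero characters in the named group "congestion", unless you can match something that comes immediately after the congestion text.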
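The following PHP sketch illustrates that point on two of the sample lines: with nothing after the lazy .*?, the engine stops as early as it can, so both captures are empty strings.

```php
<?php
$subject = "Congestion : Traffic is slow from Smith St to Alice Springs St\n"
    . "Congestion : Traffic is slow from Prince St to Queen St";

// Lazy .*? with nothing after it: matching zero characters already succeeds.
$count = preg_match_all('/Congestion\s*:\s*(?P<congestion>.*?)/', $subject, $matches);

var_dump($count);                 // int(2)
var_dump($matches['congestion']); // two empty strings
```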

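So if "Highway" always starts a traffic record, you can fix this:

Congestion\s*:\s*(?P<congestion>.*?)Highway\s*:

If that works (I haven't checked), all the records except the last one will match! This is easily fixed by appending the text "Highway :" to the end of the input string.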
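A PHP sketch of this fix, assuming a hypothetical flattened `$subject` made of the three sample records joined by spaces. The pattern as written leaves a trailing space in each capture, so the results are passed through trim(). For comparison, the second call swaps in a technique that is not in the answer: a lookahead, (?=Highway\s*:|$), which stops before the next "Highway :" or at the end of the string without consuming anything, so nothing needs to be appended.

```php
<?php
// Hypothetical flattened subject: the three sample records on one line.
$subject = 'Highway : Highway 26 Datetime : 18-Oct-2010 05:18 PM '
    . 'Congestion : Traffic is slow from Smith St to Alice Springs St '
    . 'Highway : Princes Highway Datetime : 18-Oct-2010 05:18 PM '
    . 'Congestion : Traffic is slow at the Flinders St / Elizabeth St intersection '
    . 'Highway : Eastern Freeway Datetime : 18-Oct-2010 05:19 PM '
    . 'Congestion : Traffic is slow from Prince St to Queen St';

// The answer's pattern, with "Highway :" appended so the last record is terminated too.
preg_match_all('/Congestion\s*:\s*(?P<congestion>.*?)Highway\s*:/', $subject . ' Highway :', $m1);
$viaAppend = array_map('trim', $m1['congestion']);

// Alternative (not from the answer): lookahead that also accepts the end of the string.
preg_match_all('/Congestion\s*:\s*(?P<congestion>.*?)\s*(?=Highway\s*:|$)/', $subject, $m2);
$viaLookahead = $m2['congestion'];

print_r($viaAppend);
print_r($viaLookahead);
```

Both calls should yield the same three conditions; the lookahead version avoids modifying the input, at the cost of a slightly more complex pattern.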

喵星人汪星人 2024-10-05 09:39:40

Congestion\s*:\s*Traffic is\s*(?P<c1>[^\n]*)\s*from\s*(?P<c2>[^\n]*)\s*to\s*(?P<c3>[^\n]*)$
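A PHP sketch that runs this pattern against the sample records from the question. Two assumptions are added here: the /m modifier (without it, $ matches only at the end of the subject or before a trailing newline, so only the last record could match) and the line breaks still being present. Under those assumptions it captures the two "from ... to ..." records but not the "at the Flinders St / Elizabeth St intersection" record, and the greedy [^\n]* groups keep trailing spaces, hence the trim() calls.

```php
<?php
// Sample records from the question, line breaks intact.
$subject = "Highway : Highway 26\n"
    . "Datetime : 18-Oct-2010 05:18 PM\n"
    . "Congestion : Traffic is slow from Smith St to Alice Springs St\n"
    . "\n"
    . "Highway : Princes Highway\n"
    . "Datetime : 18-Oct-2010 05:18 PM\n"
    . "Congestion : Traffic is slow at the Flinders St / Elizabeth St intersection\n"
    . "\n"
    . "Highway : Eastern Freeway\n"
    . "Datetime : 18-Oct-2010 05:19 PM\n"
    . "Congestion : Traffic is slow from Prince St to Queen St\n";

// The answer's pattern; the /m modifier is an added assumption.
$pattern = '/Congestion\s*:\s*Traffic is\s*(?P<c1>[^\n]*)\s*from\s*(?P<c2>[^\n]*)\s*to\s*(?P<c3>[^\n]*)$/m';
$count = preg_match_all($pattern, $subject, $m, PREG_SET_ORDER);

foreach ($m as $record) {
    // Greedy [^\n]* leaves trailing spaces in c1 and c2, so trim every group.
    printf("%s | %s | %s\n", trim($record['c1']), trim($record['c2']), trim($record['c3']));
}
```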

About the author: 单身情人
