Parsing an expression with preg_match

Posted 2024-10-10 09:53:44


I'm trying to parse the following by using preg_match:

2020|9 digits number|date hour|word|word

As an example:

2020|123456789|01/04/2011 09:09:37|Basketball|sms

I'm doing:

$regex  = '2020|/[0-9]+\|[a-zA-Z]+\|[0-9]{2}\/[0-9]{2}\/[0-9]{4}.*/';
return !(preg_match($regex,$value));

But I'm getting the error Delimiter must not be alphanumeric or backslash, and I'm not getting even close to it.

Can you please give me a hand?


别想她 2024-10-17 09:53:44

If | is your separator, and the data is always structured the way you describe, why not use explode() instead?

$array = explode ("|", $value);
echo $array[0]; // Will output "2020"
echo $array[1]; // Will output "123456789"

For this to work reliably, none of the columns may contain "|" as a content character. But you'd have that restriction with a regex, too.

If you're parsing whole files built like this, take a look at fgetcsv().
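As a rough sketch of the explode() approach, the following splits one record and rejects lines that do not have exactly five fields. The function name parseRecord and the field keys are made up for illustration; they are not part of the question.

```php
<?php
// Hypothetical helper: split one pipe-delimited record into named fields.
// Returns null when the line does not contain exactly five fields.
function parseRecord(string $value): ?array
{
    $parts = explode('|', $value);
    if (count($parts) !== 5) {
        return null;
    }
    // The key names are assumptions chosen for readability.
    return array_combine(['code', 'number', 'datetime', 'word1', 'word2'], $parts);
}

$record = parseRecord('2020|123456789|01/04/2011 09:09:37|Basketball|sms');
echo $record['number'], "\n";
```

This only checks the number of fields, not their contents. For whole files, fgetcsv() takes the separator as its third argument, so "|" can be passed there.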


箹锭⒈辈孓 2024-10-17 09:53:44

Your regex has a few problems:

Escape the first |.
Move the first / to the beginning of the regex. The / is a delimiter that marks the beginning and end of a regex.
Remove the [a-zA-Z]+ as that matches a word where your definition doesn't have one.

This should work:

$regex  = '/2020\|[0-9]+\|[0-9]{2}\/[0-9]{2}\/[0-9]{4}.*/';
return !(preg_match($regex,$value));

You could also use # as your delimiter to avoid the need to escape the literal /s.

$regex  = '#2020\|[0-9]+\|[0-9]{2}/[0-9]{2}/[0-9]{4}.*#';

That regex is also not as strict as your definition of what the string should look like. I suggest making the following improvements:

Match exactly 9 digits, not 1+, by using [0-9]{9}.
Match the timestamp with [0-9]{2}:[0-9]{2}:[0-9]{2}.
Match the last two words with \w+\|\w+.
Add ^ and $ anchors to force a match of the full string.

Putting that all together gives us:

$regex  = '#^2020\|[0-9]{9}\|[0-9]{2}/[0-9]{2}/[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}\|\w+\|\w+$#';

See it on rubular.
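A minimal sketch of how the strict pattern might be wrapped in a validation function. The name isValidRecord is hypothetical, and the sketch assumes the caller wants true when the string matches.

```php
<?php
// Hypothetical wrapper around the strict pattern given above.
function isValidRecord(string $value): bool
{
    $regex = '#^2020\|[0-9]{9}\|[0-9]{2}/[0-9]{2}/[0-9]{4}\s[0-9]{2}:[0-9]{2}:[0-9]{2}\|\w+\|\w+$#';
    // preg_match() returns 1 on a match, 0 on no match, and false on error.
    return preg_match($regex, $value) === 1;
}

var_dump(isValidRecord('2020|123456789|01/04/2011 09:09:37|Basketball|sms'));
var_dump(isValidRecord('2020|12345|01/04/2011 09:09:37|Basketball|sms'));
```

Note that the snippets in the question and in this answer return !(preg_match(...)), which is true when the string does not match. The sketch returns true on a match instead, so pick whichever polarity the calling code expects.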


浪菊怪哟 2024-10-17 09:53:44

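Perl-compatible regular expressions must begin and end with a delimiter (% below). Your RE begins with "2", which PCRE interprets as the delimiter, hence the "Delimiter must not be alphanumeric or backslash" error.

The expression I'd use to check "2020|9 digits number|date hour|word|word" is %^2020\|\d{9}\|\d{2}[-/]\d{2}[-/]\d{4} \d{2}:\d{2}:\d{2}\|\w+\|\w+$%. Except for the date, the REs that match the fields are quite simple: predefined classes (\d for digits, equivalent to [0-9]; \w for word characters, equivalent to [A-Za-z0-9_]) and repetition ({n} for exactly n, + for 1 or more).

The date is matched by \d{2}[-/]\d{2}[-/]\d{4} \d{2}:\d{2}:\d{2}. It uses the same elements as the other subpatterns, there are just more of them. If you want to match more date formats, you'll either need to write a more complex RE or extract the date and use (e.g.) strtotime to parse it.

If you wish to parse the whole string rather than simply check it, follow Pekka's advice.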
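The "extract the date and use strtotime" route could look roughly like this. The function name extractTimestamp is hypothetical, and the capture group around the date subpattern is an addition to the expression above.

```php
<?php
// Hypothetical helper: validate the record, capture the date field, and
// hand it to strtotime(). Returns a Unix timestamp, or null on failure.
function extractTimestamp(string $value): ?int
{
    // Same pattern as above, with parentheses added around the date part.
    $regex = '%^2020\|\d{9}\|(\d{2}[-/]\d{2}[-/]\d{4} \d{2}:\d{2}:\d{2})\|\w+\|\w+$%';
    if (preg_match($regex, $value, $matches) !== 1) {
        return null;
    }
    // strtotime() reads slash-separated dates as month/day/year.
    $timestamp = strtotime($matches[1]);
    return $timestamp === false ? null : $timestamp;
}

$ts = extractTimestamp('2020|123456789|01/04/2011 09:09:37|Basketball|sms');
echo date('Y-m-d H:i:s', $ts), "\n";
```

One caveat: strtotime() treats 01/04/2011 as January 4 but 01-04-2011 as 1 April, so if the data is day-first with slashes, an explicit parser such as DateTime::createFromFormat() is probably the safer choice.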