Parsing an expression with preg_match
I'm trying to parse the following by using preg_match:
2020|9 digits number|date hour|word|word
As an example:
2020|123456789|01/04/2011 09:09:37|Basketball|sms
I'm doing:
$regex = '2020|/[0-9]+\|[a-zA-Z]+\|[0-9]{2}\/[0-9]{2}\/[0-9]{4}.*/';
return !(preg_match($regex,$value));
But I'm getting the error "Delimiter must not be alphanumeric or backslash", and I'm not even getting close to a working pattern.
Can you please give me a hand?
Comments (3)
If `|` is your separator, and the data is always structured the way you describe, why not use `explode()` instead?
For this to work reliably, none of the columns may contain "|" as a content character. But you'd have that restriction with a regex, too.
If you're parsing whole files built like this, take a look at `fgetcsv()`.
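A minimal sketch of the `explode()` approach, assuming every record has exactly five `|`-separated fields. The helper name `parseRecord` and the field names are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical helper: split one record on "|" and label the five fields.
function parseRecord(string $line): ?array
{
    $fields = explode('|', $line);
    if (count($fields) !== 5) {
        return null; // unexpected number of columns
    }
    // Field names are illustrative; they do not come from the question.
    return array_combine(
        ['prefix', 'number', 'datetime', 'word1', 'word2'],
        $fields
    );
}

$record = parseRecord('2020|123456789|01/04/2011 09:09:37|Basketball|sms');
```

This only splits the line; it does not validate the individual fields. For a whole file, `fgetcsv($handle, 0, '|')` reads one record per call using the same separator.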
Your regex has a few problems:
- The first `|` is not escaped.
- The `/` needs to move to the beginning of the regex. The `/` is a delimiter that marks the beginning and end of a regex.
- `[a-zA-Z]+` should be removed, as that matches a word where your definition doesn't have one.
This should work:
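As a sketch, applying the fixes described above to the pattern from the question gives something like the following. It is reconstructed from the description rather than quoted from the answer, and the `$value` sample is the example line from the question.

```php
<?php
// Delimiter moved to the start, first "|" escaped, [a-zA-Z]+ removed.
// Reconstructed from the fixes described above, not the answerer's verbatim code.
$regex = '/2020\|[0-9]+\|[0-9]{2}\/[0-9]{2}\/[0-9]{4}.*/';
$value = '2020|123456789|01/04/2011 09:09:37|Basketball|sms';

// preg_match() returns 1 on a match, 0 on no match, and false on error.
$matched = preg_match($regex, $value);
```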
You could also use `#` as your delimiter to avoid the need to escape the literal `/`s.
The pattern is also not as strict as your definition of what the string should look like. I suggest making the following improvements:
- Match exactly 9 digits, rather than one or more, with `[0-9]{9}`.
- Match the time with `[0-9]{2}:[0-9]{2}:[0-9]{2}`.
- Match the two trailing words with `\w+\|\w+`.
- Add `^` and `$` anchors to force a match of the full string.
Putting that all together gives us:
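One way the combined pattern could look, assuming `#` as the delimiter and a single space between date and time. This is a sketch assembled from the suggestions above, not the answerer's verbatim code.

```php
<?php
// Stricter pattern: 9 digits, dd/dd/dddd date, hh:mm:ss time, two words, anchored.
$regex = '#^2020\|[0-9]{9}\|[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\|\w+\|\w+$#';

$good     = preg_match($regex, '2020|123456789|01/04/2011 09:09:37|Basketball|sms');
$tooShort = preg_match($regex, '2020|12345678|01/04/2011 09:09:37|Basketball|sms');
$trailing = preg_match($regex, '2020|123456789|01/04/2011 09:09:37|Basketball|sms|extra');
```

Only the first call matches: the second has 8 digits instead of 9, and the third has an extra field that the `$` anchor rejects.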
See it on rubular.
Perl-compatible regular expressions must start and end with a delimiter (below, `%`). Your RE begins with "2", which PCRE interprets as a delimiter, hence the "Delimiter must not be alphanumeric or backslash" error.
The expression I'd start with to check "2020|9 digits number|date hour|word|word" is `%^2020\|\d{9}\|\d{2}[-/]\d{2}[-/]\d{4} \d{2}:\d{2}:\d{2}\|\w+\|\w+$%`. Other than the date, the REs matching the fields are very simple: a predefined class (`\d` for digits, equivalent to `[0-9]`; `\w` for word characters, equivalent to `[A-Za-z0-9_]`) and a repetition (`{n}` means exactly n, `+` means 1 or more).
The date is matched by `\d{2}[-/]\d{2}[-/]\d{4} \d{2}:\d{2}:\d{2}`. This uses the same elements as the other subpatterns, just more of them. If you want to match more date formats, you'll either need to write a more complex RE, or extract the date and use (e.g.) `strtotime` to parse it.
If you wish to parse the whole string, rather than simply check it, follow Pekka's advice.
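A sketch of the "extract the date and parse it" route, assuming the pattern above with one capture group added around the date and time. The fixed UTC time zone is an assumption made only so the result is deterministic.

```php
<?php
date_default_timezone_set('UTC'); // assumption: pin the zone for a repeatable result

// Same pattern as above, with parentheses added to capture the date and time.
$pattern = '%^2020\|\d{9}\|(\d{2}[-/]\d{2}[-/]\d{4} \d{2}:\d{2}:\d{2})\|\w+\|\w+$%';
$value   = '2020|123456789|01/04/2011 09:09:37|Basketball|sms';

$timestamp = false;
if (preg_match($pattern, $value, $m) === 1) {
    // strtotime() reads slash-separated dates as month/day/year,
    // so "01/04/2011" is taken as January 4, 2011.
    $timestamp = strtotime($m[1]);
}
```

If the data is actually day/month/year, `strtotime` will misread it; `DateTime::createFromFormat('d/m/Y H:i:s', $m[1])` states the format explicitly.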