
PHP ereg vs. preg

Posted on 2024-08-03 17:38:54

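I notice that the PHP regular expression library offers a choice between ereg and preg. What is the difference? Is one faster than the other, and if so, why isn't the slower one deprecated?

Are there situations where one is a better choice than the other?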


极致的悲 2024-08-10 17:38:59

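Well, ereg and its derivative functions (eregi, ereg_replace, etc.) were deprecated in PHP 5.3 and removed in PHP 7.0 (the planned PHP 6 was never released), so you're probably best off going with the preg family.

preg is for Perl-style regular expressions, while ereg is for standard POSIX regular expressions.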


夜空下最亮的亮点 2024-08-10 17:38:58

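preg is the Perl-compatible regular expression library.
ereg is the POSIX-compatible regular expression library.

Their syntax differs slightly, and preg is somewhat faster in some cases. ereg was deprecated in PHP 5.3 and removed in PHP 7.0, so I wouldn't recommend using it.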
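As a rough illustration of the syntax difference, here is a minimal sketch of how typical ereg-style calls map onto the preg family. The sample string and variable names are made up for this example, and the old calls appear only in comments because they no longer exist in PHP 7.0 and later.

```php
<?php
// Hypothetical sample input; any string containing a date would do.
$line = 'Logged at 2009/03/14 15:09:26 by cron';

// Old POSIX style (removed in PHP 7.0, shown for comparison only):
//   ereg('([0-9]{4})/([0-9]{2})/([0-9]{2})', $line, $regs);
//   eregi('^logged', $line);
//   ereg_replace('[^a-zA-Z0-9-]+', '', $line);

// PCRE patterns need delimiters; '#' avoids escaping the '/' inside the date.
$found = preg_match('#([0-9]{4})/([0-9]{2})/([0-9]{2})#', $line, $m);

// eregi() becomes preg_match() with the 'i' modifier.
$caseInsensitive = preg_match('/^logged/i', $line);

// ereg_replace() becomes preg_replace().
$slug = preg_replace('/[^a-zA-Z0-9-]+/', '', $line);

printf("%d %s-%s-%s\n", $found, $m[1], $m[2], $m[3]);
printf("%d %s\n", $caseInsensitive, $slug);
```

The main mechanical changes are the added delimiters and the modifier letter after the closing delimiter; the character classes themselves usually carry over unchanged.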


夜声 2024-08-10 17:38:58


There is much discussion about which is faster and better.

If you plan to upgrade someday to a PHP version without ereg (the extension was removed in PHP 7.0; the planned PHP 6 was never released), your decision is made. Otherwise:

The general consensus is that PCRE is the better all-around solution, but if you have a specific page with a lot of traffic and can stay on a PHP version that still ships ereg, it may be worth some testing.
For example, from the PHP manual comments:

Deprecating POSIX regex in PHP for Perl searching is like substituting wooden boards and brick for a house with pre-fabricated rooms and walls. Sure, you may be able to mix and match some of the parts but it's a lot easier to modify with all the pieces laid out in front of you.

PCRE faster than POSIX RE? Not always.

In a recent search-engine project here at Cynergi, I had a simple loop with a few cute ereg_replace() functions that took 3min to process data. I changed that 10-line loop into 100 lines of hand-written replacement code and the loop now took 10s to process the same data! This opened my eyes to how slow regular expressions can be IN SOME CASES.

Lately I decided to look into Perl-compatible regular expressions (PCRE). Most pages claim PCRE is faster than POSIX, but a few claim otherwise. I decided to run benchmarks of my own. My first few tests confirmed PCRE to be faster, but the results were slightly different from what others were getting, so I decided to benchmark every case of RE usage I had in an 8000-line secure (and fast) Webmail project here at Cynergi to check it out.

The results? Inconclusive! Sometimes PCRE is faster (sometimes by a factor greater than 100x!), but other times POSIX RE is faster (by a factor of 2x). I still have to find a rule for when one or the other is faster. It's not only about search data size, amount of data matched, or "RE compilation time", which would show when you repeat the function often: one would always be faster than the other. But I didn't find a pattern here. Truth be told, I also didn't take the time to look into the source code and analyse the problem.

I can give you some examples, though. The POSIX RE ([0-9]{4})/([0-9]{2})/([0-9]{2})[^0-9]+([0-9]{2}):([0-9]{2}):([0-9]{2}) is 30% faster in POSIX than when converted to PCRE (even if you use \d and \D and non-greedy matching). On the other hand, a similarly complex PCRE pattern /[0-9]{1,2}[ \t]+[a-zA-Z]{3}[ \t]+[0-9]{4}[ \t]+[0-9]{1,2}:[0-9]{1,2}(:[0-9]{1,2})?[ \t]+[+-][0-9]{4}/ is 2.5x faster in PCRE than in POSIX RE. Simple replacement patterns like ereg_replace( "[^a-zA-Z0-9-]+", "", $m ); are 2x faster in POSIX RE than in PCRE.

And then we get confused again, because a POSIX RE pattern like (^|\n|\r)begin-base64[ \t]+[0-7]{3,4}[ \t]+...... is 2x faster as POSIX RE, but the case-insensitive PCRE /^Received[ \t]*:[ \t]by[ \t]+([^ \t]+)[ \t]/i is 30x faster than its POSIX RE version! When it comes to case-insensitive matching, PCRE has so far seemed to be the best option.

But I found some really strange behaviour from ereg/eregi. On a very simple POSIX RE (^|\r|\n)mime-version[ \t]: I found eregi() taking 3.60s (just a number in a test benchmark), while the corresponding PCRE took 0.16s! But if I used ereg() (case-sensitive) the POSIX RE time went down to 0.08s! So I investigated further. I tried to make the POSIX RE case-insensitive itself. I got as far as this: (^|\r|\n)[mM][iI][mM][eE]-vers[iI][oO][nN][ \t]*: This version also took 0.08s. But if I try to apply the same rule to any of the 'v', 'e', 'r' or 's' letters that are not changed, the time is back at the 3.60s mark, and not gradually, but immediately so! The test data didn't contain any "vers", any other "mime" words or any "ion" that might be confusing the POSIX parser, so I'm at a loss.

Bottom line: always benchmark your PCRE / POSIX RE to find the fastest!

Tests were performed with PHP 5.1.2 under Windows, from the command line.
Pedro Freire, cynergi.com
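
The "always benchmark" advice can be followed with a very small timing harness. The sketch below is only an illustration: time_regex() is a hypothetical helper, the test data is invented, and the two patterns are adapted from the mime-version example in the quoted comment. Since ereg is gone from PHP 7.0 onward, it compares two PCRE spellings of the same case-insensitive match rather than PCRE against POSIX.

```php
<?php
// Hypothetical helper: run $fn $iterations times and return elapsed seconds.
function time_regex(callable $fn, int $iterations = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Invented test data: a few header-like lines, repeated.
$data = str_repeat("Subject: hello\r\nMIME-Version: 1.0\r\nX-Note: none\r\n", 50);

// Two spellings of the same case-insensitive match.
$candidates = [
    'i modifier'     => '/(^|\r|\n)mime-version[ \t]*:/i',
    'manual classes' => '/(^|\r|\n)[mM][iI][mM][eE]-[vV][eE][rR][sS][iI][oO][nN][ \t]*:/',
];

foreach ($candidates as $label => $pattern) {
    $seconds = time_regex(function () use ($pattern, $data) {
        preg_match($pattern, $data);
    });
    printf("%-15s %.4fs\n", $label, $seconds);
}
```

Timings will vary with the PHP version, the PCRE build and the input, so results are only meaningful when measured against your own patterns and data.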