glibc regex performance
Does anyone have experience measuring the performance of glibc regexp functions?
Are there any generic tests I need to run to make such measurements (in addition to testing the exact patterns I intend to search for)?
Thanks.
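A minimal sketch of a generic timing harness, assuming the POSIX `<regex.h>` interface that glibc implements and `clock_gettime(CLOCK_MONOTONIC)` for timing. The helper names `time_regcomp` and `time_regexec` are hypothetical. It times compilation separately from matching, because a program that compiles a pattern once and reuses it pays a very different cost than one that recompiles on every call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC readings. */
static double elapsed_ns(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1e9 + (double)(b.tv_nsec - a.tv_nsec);
}

/* Hypothetical helper: average ns per regcomp()+regfree() pair.
 * Returns -1.0 if the pattern does not compile. */
static double time_regcomp(const char *pattern, int cflags, long iterations)
{
    regex_t re;
    struct timespec t0, t1;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; i++) {
        if (regcomp(&re, pattern, cflags) != 0)
            return -1.0;            /* nothing to free on failure */
        regfree(&re);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(t0, t1) / (double)iterations;
}

/* Hypothetical helper: compile `pattern` once, then call regexec() on `text`
 * `iterations` times. Returns average ns per regexec() call, or -1.0 if the
 * pattern does not compile. *matched is set to 1 if the text matched, else 0. */
static double time_regexec(const char *pattern, int cflags, const char *text,
                           long iterations, int *matched)
{
    regex_t re;
    struct timespec t0, t1;
    int rc = REG_NOMATCH;

    assert(iterations > 0);
    /* REG_NOSUB: we only need match/no-match, not submatch offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; i++)
        rc = regexec(&re, text, 0, NULL, 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    regfree(&re);
    *matched = (rc == 0);
    return elapsed_ns(t0, t1) / (double)iterations;
}
```

Beyond your exact patterns, it is probably worth running each one against both matching and non-matching input and against short and long texts, since an unanchored search that fails generally has to try every starting position. Build with optimizations enabled and pick an iteration count large enough that the total time is well above the clock resolution.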
Answers (3)
Take a look at:
http://www.boost.org/doc/libs/1_41_0/libs/regex/doc/gcc-performance.html
Are you currently using hand-written char-by-char comparison, standard string-matching functions, or smart text-matching algorithms?
In the first case especially, switching to regexps may even be faster, depending on the kind of regexp and the library you use (glibc isn't the only option; there are plenty of libraries around: PCRE, the ones listed here, and many more).
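To make the three approaches in the question concrete, here is a sketch that counts non-overlapping occurrences of a literal string in three ways: a naive char-by-char loop, `strstr()`, and glibc's `regcomp()`/`regexec()`. The function names are hypothetical, and the regex version assumes the pattern cannot match the empty string. Because all three return the same count for the same input, they can be timed against each other on your own data.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* 1. Hand-written char-by-char comparison (naive search). */
static size_t count_naive(const char *hay, const char *needle)
{
    size_t n = strlen(needle), count = 0;
    assert(n > 0);
    for (const char *p = hay; *p != '\0'; ) {
        size_t k = 0;
        /* Stops at the first mismatch, including the terminating '\0'. */
        while (k < n && p[k] == needle[k])
            k++;
        if (k == n) { count++; p += n; }   /* skip past the match */
        else p++;
    }
    return count;
}

/* 2. Standard library string matching. */
static size_t count_strstr(const char *hay, const char *needle)
{
    size_t n = strlen(needle), count = 0;
    assert(n > 0);
    for (const char *p = hay; (p = strstr(p, needle)) != NULL; p += n)
        count++;
    return count;
}

/* 3. POSIX extended regex via regcomp()/regexec().
 * Returns (size_t)-1 if the pattern does not compile. */
static size_t count_regex(const char *hay, const char *pattern)
{
    regex_t re;
    regmatch_t m;
    size_t count = 0;
    int eflags = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return (size_t)-1;
    for (const char *p = hay; regexec(&re, p, 1, &m, eflags) == 0; ) {
        if (m.rm_eo == m.rm_so)
            break;              /* empty match: stop instead of looping forever */
        count++;
        p += m.rm_eo;           /* resume scanning just past this match */
        eflags = REG_NOTBOL;    /* p is no longer at the start of the string */
    }
    regfree(&re);
    return count;
}
```

The `REG_NOTBOL` flag tells `regexec()` that the resumed position is not the beginning of the line, so a `^` anchor in the pattern will not match in the middle of the string.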
Regular expression performance depends heavily on which regular expression you're using and which data you're applying it to. There's little point in benchmarking a bunch of regular expressions in isolation. You have to compare your actual code using a regex against your actual plain C alternative, on your actual data.
As a rule of thumb, I'd say that if you already have properly functioning procedural code to do the text matching you need, just leave it in place. If you don't have that code yet, I recommend starting with regexes, as you'll save yourself a lot of development time (assuming you're familiar with regexes). You can probably write procedural code that is faster than the equivalent regex, but the difference isn't going to be dramatic, and the effort of writing and maintaining the procedural code will be significantly higher than that of using a regex.
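Following the advice to compare the regex against the plain C alternative on your actual data, here is a sketch of a head-to-head harness. As a hypothetical example task, it checks whether each line is an unsigned decimal integer, once with a hand-written loop and once with the extended regex `^[0-9]+$`. It first verifies that both agree on every line, since timing two implementations that give different answers says nothing. Loading the real lines is left out, and `compare_on_lines` and `is_uint_proc` are illustrative names, not an existing API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Procedural check: nonempty and made only of the ASCII digits 0-9. */
static int is_uint_proc(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (*s < '0' || *s > '9')
            return 0;
    return 1;
}

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Runs both implementations over the same lines, stores the total time of
 * each in *proc_ns and *regex_ns, and returns the number of lines on which
 * they disagree (-1 if the regex fails to compile). */
static long compare_on_lines(const char *const *lines, size_t n,
                             double *proc_ns, double *regex_ns)
{
    regex_t re;
    long disagreements = 0;
    volatile size_t sink = 0;   /* keeps the timed loops from being discarded */

    assert(lines != NULL || n == 0);
    if (regcomp(&re, "^[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* Correctness first: both must classify every line the same way. */
    for (size_t i = 0; i < n; i++) {
        int by_regex = regexec(&re, lines[i], 0, NULL, 0) == 0;
        if (is_uint_proc(lines[i]) != by_regex)
            disagreements++;
    }

    double t0 = now_ns();
    for (size_t i = 0; i < n; i++)
        sink += (size_t)is_uint_proc(lines[i]);
    double t1 = now_ns();
    for (size_t i = 0; i < n; i++)
        sink += (size_t)(regexec(&re, lines[i], 0, NULL, 0) == 0);
    double t2 = now_ns();

    (void)sink;
    regfree(&re);
    *proc_ns = t1 - t0;
    *regex_ns = t2 - t1;
    return disagreements;
}
```

Call it with an array of lines read from your real input; a nonzero return value means the two implementations are not equivalent and their timings should not be compared.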