Regular expressions: boost::xpressive vs. boost::regex
I wanted to do some regular expressions in C++, so I looked on the interwebz (yes, I am a beginner/intermediate with C++) and found this SO answer.
I really don't know how to choose between boost::regex and boost::xpressive. What are the pros and cons?
I also read that boost::xpressive, as opposed to boost::regex, is a header-only library. Is it hard to statically compile boost::regex on Linux and Windows (I almost always write cross-platform applications)?
I'm also interested in comparisons of compile time. My current implementation uses boost::xpressive and I'm not too happy with the compile times (but I have nothing to compare against for boost::regex).
Of course, I'm open to other suggestions for regex implementations too. The requirements are that it is free (as in beer) and compatible with http://nclabs.org/license.php.
Answers (5)
One fairly important difference is that Boost.Regex can be linked against ICU for Unicode support (character classes, etc.); see the Boost.Regex ICU support documentation.
As far as I can tell, Boost.Xpressive doesn't have this kind of support built in.
Well, if you need to create a regular expression at runtime (e.g. letting the user type in a regular expression to search for), you can't use xpressive's static regexes, as they are compile-time only; you would have to use its dynamic regexes (sregex::compile) instead. On the other hand, since a static regex is a compile-time construct, it should benefit more from your optimizer than regex does.
I do enough stuff with Boost.MPL, StateChart, and Spirit that 220KB of compiler warnings and errors doesn't really bother me much. If that sounds like hell to you, stick with Boost.Regex.
If you do use xpressive, I highly recommend turning on -Wfatal-errors, as this will stop compilation (and further errors) after the first 'error:' line.
For compilation time, it's no contest: Boost.Regex will be faster*. The fact that xpressive uses MPL will cause compile times to increase dramatically.
*This assumes you only build the dll/so once
When using the Boost libraries, I tend to lean toward header-only libraries because of cross-platform compatibility issues. The downside is that when your compiler reports an error related to your use of the library, the output from a header-only library tends toward the arcane.
Assuming you're using a reasonably recent compiler, there's a pretty decent chance that it includes a regex package already. Try just doing #include <regex> and see if the compiler finds it.
The only trick is that it could be in either (or both) of two different namespaces. Regexes were included in TR1 of the C++ standard, and are also in (the final drafts of) C++11. The TR1 version is in the namespace std::tr1, whereas the standard version is in std, just like the rest of the library.
FWIW, this is essentially the same as Boost regex, not Boost Xpressive.
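As a quick way to check whether the compiler ships a usable regex package, something like the following sketch can be compiled. It assumes a C++11 standard library; first_number is a hypothetical helper name, and with a TR1-only library the same calls would live in std::tr1 instead.

```cpp
#include <regex>
#include <string>

// Returns the first run of decimal digits found in `text`,
// or an empty string when there is none.
inline std::string first_number(const std::string& text) {
    static const std::regex digits("[0-9]+");  // pattern parsed at run time
    std::smatch m;
    if (std::regex_search(text, m, digits)) {
        return m.str();
    }
    return std::string();
}
```

If this compiles and first_number("abc 123 def") yields "123", the standard library regex support is present and no Boost build is needed.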
I'll try to supplement the other answers by going deeper into the topic of compile-time regular expressions (CTR) vs. run-time (dynamic) regular expressions (RTR) in a more theoretical way (IMHO this topic is indirectly implied by the OP's question). Run-time regexes are better known and more popular (most languages' core library implementations are run-time), I suppose for historical reasons. Unlike CTR, they work when the regular expression is only determined at run time. Both work on the basis of finite state machines.
RTR are "compiled" and interpreted by a kind of universal finite state machine: an interpreter whose scheme is supplied at run time. The regex string you pass is "compiled" into some internal data structure, which is then interpreted at run time.
CTR, by contrast, are "compiled" at compile time and are specific to one particular regex, so you can't use them when the regex is supplied at run time (applications like text editors or file/internet search engines).
But they are a priori more efficient (in theory, at least), since a finite state machine customized at compile time will be more efficient than an interpreter driven by a table-based description of that machine (similar cases are reflection field access vs. compile-time access, or a specialized function optimized for some fixed parameter). Another advantage is compile-time syntax checking. CTR can be implemented through metaprogramming and/or code generation.
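The contrast can be sketched in plain C++. This is an illustration of the idea only, not how any particular library is implemented: match_runtime builds its matcher from a string at run time, while match_decimal_fixed is a hand-written state machine for the single fixed pattern [0-9]+\.[0-9]+, roughly the kind of code a CTR library or code generator might produce. Both function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Run-time flavour: the pattern arrives as a string and is turned into an
// internal data structure when the std::regex object is constructed.
inline bool match_runtime(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    return std::regex_match(text, re);
}

// Compile-time flavour, written by hand for one fixed pattern:
// [0-9]+\.[0-9]+  (digits, a dot, digits).
// Every state transition is an ordinary branch that the optimizer can see.
inline bool match_decimal_fixed(const std::string& text) {
    enum State { Start, IntPart, Dot, FracPart };
    State state = Start;
    for (char c : text) {
        const bool digit = (c >= '0' && c <= '9');
        switch (state) {
        case Start:
            if (!digit) return false;
            state = IntPart;
            break;
        case IntPart:
            if (c == '.') state = Dot;
            else if (!digit) return false;
            break;
        case Dot:
            if (!digit) return false;
            state = FracPart;
            break;
        case FracPart:
            if (!digit) return false;
            break;
        }
    }
    return state == FracPart;  // accept only if at least one fraction digit was seen
}
```

The fixed version cannot accept a different pattern without recompiling, which is exactly the trade-off described above.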
As for specific implementations, there are many RTR but far fewer CTR. For C++ there are the Boost and C++11 standard library implementations mentioned above. You may need CTR to optimize regex performance, generated code size, or memory usage, which is mostly relevant for embedded systems or specific high-performance applications.
SO question about CTR
CTR implementations are harder to find. Examples include the Re2C code generator project, a Java CTR implementation, and the C# implementation, which features run-time compilation of a Regex into IL code rather than an internal data structure [there is an SO question about it].
P.S. Sorry, I couldn't post some of the relevant links due to my reputation.