When should I prefer boost::regex (or boost::xpressive) over boost::algorithm?

Posted 2024-11-09 12:05:08

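I thought the Boost regex engines would be faster than boost::algorithm, but this simple test shows the algorithm beating the regex engines by a wide margin. This is the entire test program. Did I miss something?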

#include "boost/algorithm/string.hpp"
#include "boost/regex.hpp"
#include "boost/xpressive/xpressive.hpp"
#include "boost/progress.hpp"  // brings in the legacy boost::timer
#include <iostream>
#include <string>

int main()
{
    boost::timer tm;  // elapsed() returns seconds, so elapsed()/60 below prints minutes
    const int ITERATIONS = 10000000;
    {
        std::string input("This is his face");
        tm.restart();
        for( int i = 0; i < ITERATIONS; ++i)
        {
            // replace_all modifies input in place
            boost::algorithm::replace_all(input,"his","her");
        }
        std::cout << "boost::algorithm: " << tm.elapsed()/60 << std::endl;
    }

    {
        std::string input("This is his face");
        boost::regex expr("his");
        std::string format("her");
        tm.restart();
        for( int i = 0; i < ITERATIONS; ++i)
        {
            // regex_replace returns a new string; input is left unchanged
            boost::regex_replace( input, expr, format );
        }
        std::cout << "boost::regex: " << tm.elapsed()/60 << std::endl;
    }

    {
        std::string input("This is his face");
        // as_xpr("his") builds a static xpressive regex matching the literal "his"
        boost::xpressive::sregex expr = boost::xpressive::as_xpr("his");
        std::string format("her");
        tm.restart();
        for( int i = 0; i < ITERATIONS; ++i)
        {
            // also returns a new string; input is left unchanged
            boost::xpressive::regex_replace(input, expr, format);
        }
        std::cout << "boost::xpressive: " << tm.elapsed()/60 << std::endl;
    }

    return 0;
}


小傻瓜 2024-11-16 12:05:08

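Regexes can handle all kinds of patterns (for example, something like "My.*Test" can be matched in a text like "I wonder how many classes called MySumTest have been written?"). That makes them more powerful than the algorithms for finding a fixed pattern in text, but also slower.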


似最初 2024-11-16 12:05:08

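I don't find this all that surprising; simple things usually are faster. In higher-level languages such as JavaScript, it's usually a win to delegate string processing to a regular expression, because even a simple loop carries a lot of overhead in an interpreted language. The same reasoning doesn't apply to compiled languages like C++.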

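Anyway, I would say you should use the Boost string algorithms rather than regex wherever it is reasonable to do so. boost::regex introduces a runtime dependency (it uses an external .so file), while the algorithms are basically inline code generators. Use regexes only where you actually need them, say, for finding a floating-point number: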

[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?

Would you want to try that without a regex?
