在 C 语言中,用另一个字符串替换字符串中所有出现的子字符串的最快方法是什么?

发布于 2024-12-08 23:17:09 字数 1279 浏览 0 评论 0原文

我正在寻找最有效(就“最快”而言)的方法来用另一个字符串替换字符串中所有出现的子字符串。到目前为止我所想到的是:

std::string StringReplaceAll(const std::string &cstSearch, const std::string &cstReplace, const std::string &cstSubject)
{
    if(cstSearch.length() > cstSubject.length() || cstSearch == cstReplace || cstSubject.empty() || cstSearch.empty() || cstSubject.find(cstSearch) == std::string::npos)
    {
        return cstSubject;
    }

    std::ostringstream                                  ossReturn;
    std::string::const_iterator                         ci(cstSubject.cbegin());
    const std::string::const_iterator::difference_type  ciFindSize(std::distance(cstSearch.cbegin(), cstSearch.cend()));

    for(std::string::const_iterator ciNow; (ciNow = std::search(ci, cstSubject.cend(), cstSearch.cbegin(), cstSearch.cend())) != cstSubject.cend(); ci = ciNow)
    {
        std::copy(ci, ciNow, std::ostreambuf_iterator<char> (ossReturn));
        std::copy(cstReplace.cbegin(), cstReplace.cend(), std::ostreambuf_iterator<char> (ossReturn));
        std::advance(ciNow, ciFindSize);
    }

    std::copy(ci, cstSubject.cend(), std::ostreambuf_iterator<char> (ossReturn));

    return ossReturn.str();
}

......而这个对于我的需求来说太慢了:-(

期待向你们学习!

I'm looking for the most efficient (in terms of "fastest") way to replace all occurrences of a substring within a string with another string. All I've came up with so far is:

std::string StringReplaceAll(const std::string &cstSearch, const std::string &cstReplace, const std::string &cstSubject)
{
    if(cstSearch.length() > cstSubject.length() || cstSearch == cstReplace || cstSubject.empty() || cstSearch.empty() || cstSubject.find(cstSearch) == std::string::npos)
    {
        return cstSubject;
    }

    std::ostringstream                                  ossReturn;
    std::string::const_iterator                         ci(cstSubject.cbegin());
    const std::string::const_iterator::difference_type  ciFindSize(std::distance(cstSearch.cbegin(), cstSearch.cend()));

    for(std::string::const_iterator ciNow; (ciNow = std::search(ci, cstSubject.cend(), cstSearch.cbegin(), cstSearch.cend())) != cstSubject.cend(); ci = ciNow)
    {
        std::copy(ci, ciNow, std::ostreambuf_iterator<char> (ossReturn));
        std::copy(cstReplace.cbegin(), cstReplace.cend(), std::ostreambuf_iterator<char> (ossReturn));
        std::advance(ciNow, ciFindSize);
    }

    std::copy(ci, cstSubject.cend(), std::ostreambuf_iterator<char> (ossReturn));

    return ossReturn.str();
}

... and this one is way(!!!) too slow for my needs :-(

Looking forward to learn from you guys!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

々眼睛长脚气 2024-12-15 23:17:09

首先,我使用 std::string 而不是 std::ostringstream 来构建
提高结果; std::ostringstream 用于格式化,没有
格式化要在这里完成。除此之外,你基本上已经有了
正确的算法;使用 std::search 查找下一个
应进行更换。我会使用 while 循环来让事情变得更糟
更具可读性,这给出了:(

std::string
replaceAll( std::string const& original,
            std::string const& before,
            std::string const& after )
{
    std::string retval;
    std::string::const_iterator end     = original.end();
    std::string::const_iterator current = original.begin();
    std::string::const_iterator next    =
            std::search( current, end, before.begin(), before.end() );
    while ( next != end ) {
        retval.append( current, next );
        retval.append( after );
        current = next + before.size();
        next = std::search( current, end, before.begin(), before.end() );
    }
    retval.append( current, next );
    return retval;
}

请注意,使用 std::string::append 会比使用更快
std::copy;字符串知道它必须添加多少,并且可以调整大小
相应的字符串。)

然后,捕获特殊情况就很简单了
没有可替换的内容,并立即返回初始字符串;那里
使用 std::string::reserve 可能会有一些改进
出色地。 (如果 beforeafter 的长度相同,
retval.reserve(original.size()) 是一个明显的胜利。即使他们不这样做,
这可能是一场胜利。至于先统计替换次数,然后
准确计算最终尺寸,我不知道。你必须
请结合您的实际用例进行测量以找出答案。)

First, I'd use std::string, rather than std::ostringstream, to build
up the results; std::ostringstream is for formatting, and there's no
formatting to be done here. Other than that, you've got basically the
correct algorithm; using std::search to find where the next
replacement should be done. I'd use a while loop to make things a bit
more readable, which gives:

std::string
replaceAll( std::string const& original,
            std::string const& before,
            std::string const& after )
{
    std::string retval;
    std::string::const_iterator end     = original.end();
    std::string::const_iterator current = original.begin();
    std::string::const_iterator next    =
            std::search( current, end, before.begin(), before.end() );
    while ( next != end ) {
        retval.append( current, next );
        retval.append( after );
        current = next + before.size();
        next = std::search( current, end, before.begin(), before.end() );
    }
    retval.append( current, next );
    return retval;
}

(Note that using std::string::append will be faster than using
std::copy; the string knows how many it must add, and can resize the
string accordingly.)

Afterwards, it would be trivial to catch the special case where there is
nothing to replace, and return the initial string immediately; there
might be some improvements to be had using std::string::reserve as
well. (If before and after have the same length,
retval.reserve( original.size() ) is a clear win. Even if they don't,
it could be a win. As for first counting the number of substitutions, then
exactly calculating the final size, I don't know. You'll have to
measure with your actual use cases to find out.)

牵强ㄟ 2024-12-15 23:17:09

我在 http://hardforum.com/showthread.php?t=979477< 询问了同样的事情< /a> 几年前。

我不太记得了,但是以下代码位于注释 #31 中 我认为它比我的其他尝试更快(但不比 MikeBlas 的 meted_string 示例快):

#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <sstream>

using namespace std;

inline string replaceAll(const string& s, const string& f, const string& r) {
    if (s.empty() || f.empty() || f == r || f.size() > s.size() || s.find(f) == string::npos) {
        return s;
    }
    ostringstream build_it;
    typedef string::const_iterator iter;
    iter i(s.begin());
    const iter::difference_type f_size(distance(f.begin(), f.end()));
    for (iter pos; (pos = search(i , s.end(), f.begin(), f.end())) != s.end(); ) {
        copy(i, pos,  ostreambuf_iterator<char>(build_it));
        copy(r.begin(), r.end(), ostreambuf_iterator<char>(build_it));
        advance(pos, f_size);
        i = pos;
    }
    copy(i, s.end(), ostreambuf_iterator<char>(build_it));
    return build_it.str();
}

int main() {
    const string source(20971520, 'a');
    const string test(replaceAll(source, "a", "4"));
}

请参阅该线程以获取更多示例和大量讨论。

如果我没记错的话,让事情变得比boost的replace_all更快真的很容易。

这是一个更清晰的 c++0x 版本:

#include <string>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <ostream>
using namespace std;

string replace_all_copy(const string& s, const string& f, const string& r) {
    if (s.empty() || f.empty() || f == r || f.size() > s.size()) {
        return s;
    }
    ostringstream buffer;
    auto start = s.cbegin();
    while (true) {
        const auto end = search(start , s.cend(), f.cbegin(), f.cend());
        copy(start, end,  ostreambuf_iterator<char>(buffer));
        if (end == s.cend()) {
            break;
        }
        copy(r.cbegin(), r.cend(), ostreambuf_iterator<char>(buffer));
        start = end + f.size();
    }
    return buffer.str();
}

int main() {
    const string s(20971520, 'a');
    const string result = replace_all_copy(s, "a", "4");
}

// g++ -Wall -Wextra replace_all_copy.cc -o replace_all_copy -O3 -s -std=c++0x

I asked about this same thing at http://hardforum.com/showthread.php?t=979477 years ago.

I don't remember it all that well, but the following code was in comment #31 and I think it was faster than my other attempts (but not faster than MikeBlas' metered_string example):

#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <sstream>

using namespace std;

inline string replaceAll(const string& s, const string& f, const string& r) {
    if (s.empty() || f.empty() || f == r || f.size() > s.size() || s.find(f) == string::npos) {
        return s;
    }
    ostringstream build_it;
    typedef string::const_iterator iter;
    iter i(s.begin());
    const iter::difference_type f_size(distance(f.begin(), f.end()));
    for (iter pos; (pos = search(i , s.end(), f.begin(), f.end())) != s.end(); ) {
        copy(i, pos,  ostreambuf_iterator<char>(build_it));
        copy(r.begin(), r.end(), ostreambuf_iterator<char>(build_it));
        advance(pos, f_size);
        i = pos;
    }
    copy(i, s.end(), ostreambuf_iterator<char>(build_it));
    return build_it.str();
}

int main() {
    const string source(20971520, 'a');
    const string test(replaceAll(source, "a", "4"));
}

See the thread for more examples and lots of discussion.

If I remember correctly, it was really easy to make things faster than boost's replace_all.

Here's a clearer c++0x version:

#include <string>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <ostream>
using namespace std;

string replace_all_copy(const string& s, const string& f, const string& r) {
    if (s.empty() || f.empty() || f == r || f.size() > s.size()) {
        return s;
    }
    ostringstream buffer;
    auto start = s.cbegin();
    while (true) {
        const auto end = search(start , s.cend(), f.cbegin(), f.cend());
        copy(start, end,  ostreambuf_iterator<char>(buffer));
        if (end == s.cend()) {
            break;
        }
        copy(r.cbegin(), r.cend(), ostreambuf_iterator<char>(buffer));
        start = end + f.size();
    }
    return buffer.str();
}

int main() {
    const string s(20971520, 'a');
    const string result = replace_all_copy(s, "a", "4");
}

// g++ -Wall -Wextra replace_all_copy.cc -o replace_all_copy -O3 -s -std=c++0x
栖竹 2024-12-15 23:17:09

我认为 std::search 使用了一个简单的算法,至少 参考 说明了复杂性一个朴素的字符串匹配算法。如果您将其替换为 Boyer-Moore 的实现,您应该能够看到性能的显着提高。

除此之外,您严重依赖于良好的编译器优化。通过返回字符串而不是传递 string* 结果,会导致不必要的结果复制。然而,编译器可能会帮助你。但为了确保您可以尝试将指向结果的指针作为参数传递并附加到该字符串。然而,将 std::search 替换为非平凡算法(如上所述的 boyer-moore 或 knuth-morris-pratt)的实现应该比微调返回机制产生的影响大几个数量级。

I think std::search uses a trivial algorithm, at least the reference states the complexity of a naiive string matching algorithm. If you replace it by an implementation of Boyer-Moore, you should be able to see significant performances increases.

Apart from that you are heavily dependent on good compiler optimizations. By returning the string instead of passing a string* result, you cause unnecessary copying of the result. However, compilers may help you there. But just to be sure you can alos try passing a pointer to the result as argument and appending to that string. However, swapping std::search for an implementation of a non trivial algorithm (boyer-moore as mentioned above or knuth-morris-pratt) should have an impact that is larger by several orders of magnitude compared to fine-tuning the return mechanism.

时光清浅 2024-12-15 23:17:09

试试这个。

template<class T> inline void Replace(T& str, const T& str1, const T& str2)
{
    const T::size_type str2Size(str2.size());
    const T::size_type str1Size(str1.size());

    T::size_type n = 0;
    while (T::npos != (n = str.find(str1, n))) {
        str.replace(n, str1Size, str2);
        n += str2Size;
    }
}

std::string val(L"abcabcabc");
Replace(val, L"abc", L"d");

Try this one.

template<class T> inline void Replace(T& str, const T& str1, const T& str2)
{
    const T::size_type str2Size(str2.size());
    const T::size_type str1Size(str1.size());

    T::size_type n = 0;
    while (T::npos != (n = str.find(str1, n))) {
        str.replace(n, str1Size, str2);
        n += str2Size;
    }
}

std::string val(L"abcabcabc");
Replace(val, L"abc", L"d");
我ぃ本無心為│何有愛 2024-12-15 23:17:09

我发现这个更快:

typedef std::string String;

String replaceStringAll(String str, const String& old, const String& new_s) {
    if(!old.empty()){
        size_t pos = str.find(old);
        while ((pos = str.find(old, pos)) != String::npos) {
             str=str.replace(pos, old.length(), new_s);
             pos += new_s.length();
        }
    }
    return str;
}

James kanzes ReplaceAll 函数的比较:

replaceAll      : 28552
replaceStringAll: 33518
replaceAll      : 64541
replaceStringAll: 14158
replaceAll      : 20164
replaceStringAll: 13651
replaceAll      : 11099
replaceStringAll: 5650
replaceAll      : 23775
replaceStringAll: 10821
replaceAll      : 10261
replaceStringAll: 5125
replaceAll      : 10283
replaceStringAll: 5374
replaceAll      : 9993
replaceStringAll: 5664
replaceAll      : 10035
replaceStringAll: 5246
replaceAll      : 8570
replaceStringAll: 4381

使用 std 以纳秒为单位计算时间::chrono::high_resolution_clock

I found this one to be faster:

typedef std::string String;

String replaceStringAll(String str, const String& old, const String& new_s) {
    if(!old.empty()){
        size_t pos = str.find(old);
        while ((pos = str.find(old, pos)) != String::npos) {
             str=str.replace(pos, old.length(), new_s);
             pos += new_s.length();
        }
    }
    return str;
}

A comparison with James kanzes replaceAll function:

replaceAll      : 28552
replaceStringAll: 33518
replaceAll      : 64541
replaceStringAll: 14158
replaceAll      : 20164
replaceStringAll: 13651
replaceAll      : 11099
replaceStringAll: 5650
replaceAll      : 23775
replaceStringAll: 10821
replaceAll      : 10261
replaceStringAll: 5125
replaceAll      : 10283
replaceStringAll: 5374
replaceAll      : 9993
replaceStringAll: 5664
replaceAll      : 10035
replaceStringAll: 5246
replaceAll      : 8570
replaceStringAll: 4381

Times are calculated in nanoseconds with std::chrono::high_resolution_clock

百变从容 2024-12-15 23:17:09

我会提出一个优化步骤:进行初步检查以检查是否有任何东西需要更换。如果没有什么可替换的,就返回,避免分配内存和复制。

I would propose an optimization step: make a preliminary pass to check if there is anything to replace at all. If there is nothing to replace, just return and avoid allocating memory and copying.

傻比既视感 2024-12-15 23:17:09

我测试了 3 种替换子字符串的方法

  1. boost_replace_all
  2. regex_replace_all
  3. 基本手工字符串替换算法
void regex_replace_test(std::string &str) {

    std::string res = std::regex_replace(str, std::regex(SUBSTRING), REPLACEABLE);

}

void boost_replace_all_test(std::string &str) {
    boost::replace_all(str, SUBSTRING, REPLACEABLE);
}

void handmade_replace_test(std::string &str) {
    size_t pos = 0;
    while ((pos = str.find(SUBSTRING, pos)) != std::string::npos) {
        str.replace(pos, SUBSTRING.length(), REPLACEABLE);
        pos += REPLACEABLE.length();
    }
}

输入数据信息

test to replace SUBSTRING  in BASE string.

BASE string length = 48004

SUBSTRING = fdkfgkd

SUBSTRING length = 7

SUBSTRING's quantity in BASE string= 4364

REPLACEABLE = ddddddd

REPLACEABLE length = 7

我得到以下结果:

regex_replace_test = 14 ms , 14921715 ns

boost_replace_all_test = 4 ms , 4167104 ns

handmade_replace_test = 0 ms , 372201 ns

最快的方法是基本手工子字符串替换方法。

i tested 3 ways to replace substring

  1. boost_replace_all
  2. regex_replace_all
  3. basic handmade string replace algorithm
void regex_replace_test(std::string &str) {

    std::string res = std::regex_replace(str, std::regex(SUBSTRING), REPLACEABLE);

}

void boost_replace_all_test(std::string &str) {
    boost::replace_all(str, SUBSTRING, REPLACEABLE);
}

void handmade_replace_test(std::string &str) {
    size_t pos = 0;
    while ((pos = str.find(SUBSTRING, pos)) != std::string::npos) {
        str.replace(pos, SUBSTRING.length(), REPLACEABLE);
        pos += REPLACEABLE.length();
    }
}

input data info

test to replace SUBSTRING  in BASE string.

BASE string length = 48004

SUBSTRING = fdkfgkd

SUBSTRING length = 7

SUBSTRING's quantity in BASE string= 4364

REPLACEABLE = ddddddd

REPLACEABLE length = 7

i got the following results:

regex_replace_test = 14 ms , 14921715 ns

boost_replace_all_test = 4 ms , 4167104 ns

handmade_replace_test = 0 ms , 372201 ns

fastest way is basic handmade substring replace method.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文