在 C 语言中,用另一个字符串替换字符串中所有出现的子字符串的最快方法是什么?
我正在寻找最有效(就“最快”而言)的方法来用另一个字符串替换字符串中所有出现的子字符串。到目前为止我所想到的是:
std::string StringReplaceAll(const std::string &cstSearch, const std::string &cstReplace, const std::string &cstSubject)
{
if(cstSearch.length() > cstSubject.length() || cstSearch == cstReplace || cstSubject.empty() || cstSearch.empty() || cstSubject.find(cstSearch) == std::string::npos)
{
return cstSubject;
}
std::ostringstream ossReturn;
std::string::const_iterator ci(cstSubject.cbegin());
const std::string::const_iterator::difference_type ciFindSize(std::distance(cstSearch.cbegin(), cstSearch.cend()));
for(std::string::const_iterator ciNow; (ciNow = std::search(ci, cstSubject.cend(), cstSearch.cbegin(), cstSearch.cend())) != cstSubject.cend(); ci = ciNow)
{
std::copy(ci, ciNow, std::ostreambuf_iterator<char> (ossReturn));
std::copy(cstReplace.cbegin(), cstReplace.cend(), std::ostreambuf_iterator<char> (ossReturn));
std::advance(ciNow, ciFindSize);
}
std::copy(ci, cstSubject.cend(), std::ostreambuf_iterator<char> (ossReturn));
return ossReturn.str();
}
......而这个对于我的需求来说太慢了:-(
期待向你们学习!
I'm looking for the most efficient (in terms of "fastest") way to replace all occurrences of a substring within a string with another string. All I've came up with so far is:
std::string StringReplaceAll(const std::string &cstSearch, const std::string &cstReplace, const std::string &cstSubject)
{
if(cstSearch.length() > cstSubject.length() || cstSearch == cstReplace || cstSubject.empty() || cstSearch.empty() || cstSubject.find(cstSearch) == std::string::npos)
{
return cstSubject;
}
std::ostringstream ossReturn;
std::string::const_iterator ci(cstSubject.cbegin());
const std::string::const_iterator::difference_type ciFindSize(std::distance(cstSearch.cbegin(), cstSearch.cend()));
for(std::string::const_iterator ciNow; (ciNow = std::search(ci, cstSubject.cend(), cstSearch.cbegin(), cstSearch.cend())) != cstSubject.cend(); ci = ciNow)
{
std::copy(ci, ciNow, std::ostreambuf_iterator<char> (ossReturn));
std::copy(cstReplace.cbegin(), cstReplace.cend(), std::ostreambuf_iterator<char> (ossReturn));
std::advance(ciNow, ciFindSize);
}
std::copy(ci, cstSubject.cend(), std::ostreambuf_iterator<char> (ossReturn));
return ossReturn.str();
}
... and this one is way(!!!) too slow for my needs :-(
Looking forward to learn from you guys!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
首先,我使用
std::string
而不是std::ostringstream
来构建提高结果;
std::ostringstream
用于格式化,没有格式化要在这里完成。除此之外,你基本上已经有了
正确的算法;使用
std::search
查找下一个应进行更换。我会使用
while
循环来让事情变得更糟更具可读性,这给出了:(
请注意,使用
std::string::append
会比使用更快std::copy
;字符串知道它必须添加多少,并且可以调整大小相应的字符串。)
然后,捕获特殊情况就很简单了
没有可替换的内容,并立即返回初始字符串;那里
使用
std::string::reserve
可能会有一些改进出色地。 (如果
before
和after
的长度相同,retval.reserve(original.size()) 是一个明显的胜利。即使他们不这样做,
这可能是一场胜利。至于先统计替换次数,然后
准确计算最终尺寸,我不知道。你必须
请结合您的实际用例进行测量以找出答案。)
First, I'd use
std::string
, rather thanstd::ostringstream
, to buildup the results;
std::ostringstream
is for formatting, and there's noformatting to be done here. Other than that, you've got basically the
correct algorithm; using
std::search
to find where the nextreplacement should be done. I'd use a
while
loop to make things a bitmore readable, which gives:
(Note that using
std::string::append
will be faster than usingstd::copy
; the string knows how many it must add, and can resize thestring accordingly.)
Afterwards, it would be trivial to catch the special case where there is
nothing to replace, and return the initial string immediately; there
might be some improvements to be had using
std::string::reserve
aswell. (If
before
andafter
have the same length,retval.reserve( original.size() )
is a clear win. Even if they don't,it could be a win. As for first counting the number of substitutions, then
exactly calculating the final size, I don't know. You'll have to
measure with your actual use cases to find out.)
我在 http://hardforum.com/showthread.php?t=979477< 询问了同样的事情< /a> 几年前。
我不太记得了,但是以下代码位于注释 #31 中 我认为它比我的其他尝试更快(但不比 MikeBlas 的 meted_string 示例快):
请参阅该线程以获取更多示例和大量讨论。
如果我没记错的话,让事情变得比boost的replace_all更快真的很容易。
这是一个更清晰的 c++0x 版本:
I asked about this same thing at http://hardforum.com/showthread.php?t=979477 years ago.
I don't remember it all that well, but the following code was in comment #31 and I think it was faster than my other attempts (but not faster than MikeBlas' metered_string example):
See the thread for more examples and lots of discussion.
If I remember correctly, it was really easy to make things faster than boost's replace_all.
Here's a clearer c++0x version:
我认为 std::search 使用了一个简单的算法,至少 参考 说明了复杂性一个朴素的字符串匹配算法。如果您将其替换为 Boyer-Moore 的实现,您应该能够看到性能的显着提高。
除此之外,您严重依赖于良好的编译器优化。通过返回字符串而不是传递 string* 结果,会导致不必要的结果复制。然而,编译器可能会帮助你。但为了确保您可以尝试将指向结果的指针作为参数传递并附加到该字符串。然而,将 std::search 替换为非平凡算法(如上所述的 boyer-moore 或 knuth-morris-pratt)的实现应该比微调返回机制产生的影响大几个数量级。
I think std::search uses a trivial algorithm, at least the reference states the complexity of a naiive string matching algorithm. If you replace it by an implementation of Boyer-Moore, you should be able to see significant performances increases.
Apart from that you are heavily dependent on good compiler optimizations. By returning the string instead of passing a string* result, you cause unnecessary copying of the result. However, compilers may help you there. But just to be sure you can alos try passing a pointer to the result as argument and appending to that string. However, swapping std::search for an implementation of a non trivial algorithm (boyer-moore as mentioned above or knuth-morris-pratt) should have an impact that is larger by several orders of magnitude compared to fine-tuning the return mechanism.
试试这个。
Try this one.
我发现这个更快:
与 James kanzes ReplaceAll 函数的比较:
使用
std 以纳秒为单位计算时间::chrono::high_resolution_clock
I found this one to be faster:
A comparison with James kanzes replaceAll function:
Times are calculated in nanoseconds with
std::chrono::high_resolution_clock
我会提出一个优化步骤:进行初步检查以检查是否有任何东西需要更换。如果没有什么可替换的,就返回,避免分配内存和复制。
I would propose an optimization step: make a preliminary pass to check if there is anything to replace at all. If there is nothing to replace, just return and avoid allocating memory and copying.
我测试了 3 种替换子字符串的方法
输入数据信息
我得到以下结果:
最快的方法是基本手工子字符串替换方法。
i tested 3 ways to replace substring
input data info
i got the following results:
fastest way is basic handmade substring replace method.