C++字符串面试问题
我最近参加了一次 C++ 技术面试,面试中给了我一些简单的字符串操作代码,该代码的目的是获取一个字符串并返回一个由第一个和最后一个 n 个字符组成的字符串,然后继续纠正任何bug 并让函数尽可能高效,我想出了下面的解决方案,但是面试官声称有一种更快更优化的方法:
原始代码:
std::string first_last_n(int n, std::string s)
{
std::string first_n = s.substr(0,n);
std::string last_n = s.substr(s.size()-n-1,n);
return first_n + last_n;
}
我的代码:
bool first_last_n(const std::size_t& n, const std::string& s, std::string& r)
{
if (s.size() < n)
return false;
r.reserve(2 * n);
r.resize(0);
r.append(s.data(),s.data() + n);
r.append(s.data() + (s.size() - n), s.data() + s.size());
return true;
}
我的更改摘要:
更改了接口以返回字符串作为参考(假设 RVO 和右值尚不可用)
删除了通过构造的临时字符串substr
将输入字符串作为常量引用传递,以便绕过输入的临时实例化
- < p>修复了last_n字符串中的off-by-1错误
将每个字符被触碰的次数减少到一次或两次(在发生重叠场景)
在字符串 s 的大小小于 n 的情况下进行检查,返回false 表示失败。
假设只允许使用本机 C++,是否有其他方法可以更有效或更优化地执行上述操作?
注1:原始输入字符串实例不可修改。
注2:所有解决方案必须通过以下测试用例,否则无效。
void test()
{
{
std::string s = "0123456789";
std::string r = first_last_n(10,s);
assert(r == "01234567890123456789");
}
{
std::string s = "0123456789ABC0123456789";
std::string r = first_last_n(10,s);
assert(r == "01234567890123456789");
}
{
std::string s = "1234321";
std::string r = first_last_n(5,s);
assert(r == "1234334321");
}
}
I was recently in a C++ technical interview, where I was given a bit of simple string manipulation code, which is intended to take a string and return a string that is comprised of the first and last n-characters, and then proceed to correct any bugs and to also make the function as efficient as possible, I came up with the solution below, however the interviewer claimed there was an even faster more optimal way:
Original code:
std::string first_last_n(int n, std::string s)
{
std::string first_n = s.substr(0,n);
std::string last_n = s.substr(s.size()-n-1,n);
return first_n + last_n;
}
My code:
bool first_last_n(const std::size_t& n, const std::string& s, std::string& r)
{
if (s.size() < n)
return false;
r.reserve(2 * n);
r.resize(0);
r.append(s.data(),s.data() + n);
r.append(s.data() + (s.size() - n), s.data() + s.size());
return true;
}
Summary of my changes:
Changed the interface to take a return string as reference (assuming RVO and rvalues are not available yet)
Removed temporary strings being constructed via substr
Passed input string as a const reference inorder to get around temporary instantiation of input
Fixed off-by-1 error in last_n string
Reduced the number of times each character is touched down to once, or twice (in the event of an overlapping scenario)
Placed a check in the event the string s's size is less than n, returning false for failure.
Assuming only native C++ is allowed, is there some other way of doing the above more efficiently or optimally?
Note 1: The original input string instance is not to be modified.
Note 2: All solutions must pass the following test case, otherwise they are not valid.
void test()
{
{
std::string s = "0123456789";
std::string r = first_last_n(10,s);
assert(r == "01234567890123456789");
}
{
std::string s = "0123456789ABC0123456789";
std::string r = first_last_n(10,s);
assert(r == "01234567890123456789");
}
{
std::string s = "1234321";
std::string r = first_last_n(5,s);
assert(r == "1234334321");
}
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
此实现应该很快:
它通过了所有三个单元测试。
当使用 GNU libstdc++ 时,声明 & 的行初始化 ret 的速度非常快,因为 libstdc++ 使用全局“空字符串”变量。因此,它只是一个指针副本。对
s
的begin
和end
调用也很快,因为它们将解析为begin
和 < 的 const 版本code>end、begin() const
和end() const
,因此s
的内部表示不会“泄露” 。对于 libstdc++,std::string::const_iterator
是const char*
,它是一个指针类型和随机访问迭代器。因此,当std::string::append(const char*, const char*)
调用std::distance
获取输入范围的长度,是一个指针差分操作。此外,std::string::append(const char*, const char*)
会产生类似于memmove
的结果。最后,reserve 操作确保有足够的内存可用于返回值。编辑:
出于好奇,这里是 MinGW g++ 4.5.0 的汇编输出中
ret
的初始化:它只是将指针复制到全局“空表示”。
编辑2:
好的。我现在已经使用 g++ 4.5.0 和 Visual C++ 16.00.30319.01 测试了四种变体:
变体 1(“c_str 变体”):
变体 2(“数据字符串”变体):
变体 3:
变体 4(我的原始代码):
g++ 4.5.0 的结果是:
VC++ 16.00.30319.01 的结果是:
毫不奇怪,最快的变体取决于编译器。然而,不知道将使用哪个编译器,我认为我的变体是最好的,因为它是熟悉的 C++ 风格,使用 g++ 时它是最快的,并且使用 VC++ 时它并不比变体 1 或 2 慢很多。
VC++ 结果中有趣的一件事是使用
c_str
而不是data
更快。也许这就是为什么你的面试官说有比你的实施更快的方法。EDIT3:
实际上,我只是想到了另一种变体:
变体 5:
它就像变体 4 一样,只是保存了
s
的开始和结束迭代器。当测试变体 5 时,在使用 VC++ 时,它实际上击败了变体 2(数据字符串变体):
This implementation should be fast:
It passes all three unit tests.
When using GNU libstdc++, the line that declares & initializes
ret
is extremely fast because libstdc++ uses a global "empty string" variable. Thus, it's simply a pointer copy. Calls tobegin
andend
ons
are also fast because they will resolve to the const versions ofbegin
andend
,begin() const
andend() const
, so the internal representation ofs
is not "leaked". With libstdc++,std::string::const_iterator
isconst char*
, which is a pointer type and random access iterator. Thus, whenstd::string::append<const char*>(const char*, const char*)
callsstd::distance
to obtain the length of the input range, it is a pointer difference operation. Also,std::string::append<const char*>(const char*, const char*)
results in something like amemmove
. Finally, thereserve
operation ensures that enough memory is available for the return value.EDIT:
For the curious, here is the initialization of
ret
in the assembly output of MinGW g++ 4.5.0:It's simply copying the pointer to the global "empty representation".
EDIT2:
Okay. I have now tested four variants with g++ 4.5.0 and Visual C++ 16.00.30319.01:
Variant 1 (the "c_str variant"):
Variant 2 (the "data string" variant):
Variant 3:
Variant 4 (my original code):
The results for g++ 4.5.0 are:
The results for VC++ 16.00.30319.01 are:
Unsurprisingly, the variant that is fastest depends on the compiler. However, not knowing which compiler will be used I think that my variant is best because it is a familiar style of C++, it is the fastest when using g++, and it is not that much slower than variants 1 or 2 when using VC++.
One thing interesting from the VC++ results is that using
c_str
rather thandata
is faster. Perhaps that is why your interviewer said that there is a faster way than your implementation.EDIT3:
Actually, I just thought about another variant:
Variant 5:
It's just like variant 4 except that the begin and end iterators for
s
are saved.When variant 5 is tested, it actually beats out variant 2 (the data string variant) when using VC++:
如果不需要保留原字符串的内容,则可以将最后n个字符复制到原字符串的
[n+1, 2n]
位置,并在处截断2n
。您必须首先小心地扩展字符串,并且如果字符串短于2n
,则在写入之前要注意不要覆盖任何字符。这将使构造字符串的操作数量减半,并且无需创建新字符串。所以理论上它的速度要快 2 到 4 倍。当然,你刚刚破坏了原始字符串,你必须询问面试官是否可以接受。
If you don't need to maintain the contents of the original string, then you can copy the last n characters into positions
[n+1, 2n]
of the original string and truncate it at2n
. You will have to be careful to first expand the string and also be careful not to overwrite any characters before writing to them if the string is shorter than2n
.This will halve the number of operations to construct the string, as well as remove the need to create a new string. So its theoretically between 2 and 4 times faster. But of course you have just destroyed the original string, which you'd have to ask the interviewer if it is acceptable.
删除中间的 N-2n 个字符怎么样,其中 N 是源字符串的长度?
How about removing the middle N-2n characters, where N is the length of the source string?
Times:
注意:Timing 专门测试“长”字符串。即那些不使用短字符串优化的情况。 (我的琴弦长度为 100)。
Times:
Note: Timing specifically tests "long" strings. I.e. those where short string optimization is not used. (My strings were 100 length).
我唯一的想法是,如果仅使用 C 空终止字符串调用此函数,则可能需要为参数 's' 额外构造 std::string 。
可能“更”有效的方法是允许传入 std::string 或 const char *s。
My only thought is that if this function is only called with C null-terminated strings, you might be requiring the extra construction of a std::string for parameter 's'.
Possibly the 'more' efficient method would be to allow either a std::string or const char *s passed in.
Memcpy 是骗子吗?
编辑
所以我把另外一个去掉了。这不是作弊。
Memcpy is a cheat?
EDIT
So I removed the other one. This is not a cheat.
如果您必须通过测试,那么您将不得不编写低效的代码,因为您必须返回字符串的副本。这意味着您必须使用动态分配,并且由于副本的原因可能需要多次使用。
因此,更改测试并更改签名。
然后像这样调用它:
这允许您使用动态编程,并且消除了在函数中动态分配的需要。
我喜欢我的算法简约,并且我故意取消了对
n
小于或等于字符串大小的检查。我用该函数的先决条件替换了检查。前提条件比检查更快:它们的开销为零。If you must pass the tests then you're going to have to write inefficient code, because you must return a copy of a string. This implies you must use dynamic allocation, possibly multiple times because of the copy.
So change the tests and change the signature.
Then call it like so:
This allows you to use dynamic programming, and it eliminates the need of dynamic allocation in the function.
I like my algorithms minimalistic, and I purposely eliminated the check for
n
being smaller or equal to the size of the string. I replace the check with a pre-condition for the function. Preconditions are faster than checks: they have zero overhead.