C++字符串面试问题

发布于 2024-10-08 17:18:03 字数 1544 浏览 2 评论 0原文

我最近参加了一次 C++ 技术面试，面试中给了我一些简单的字符串操作代码，该代码的目的是获取一个字符串并返回一个由第一个和最后一个 n 个字符组成的字符串，然后继续纠正任何bug 并让函数尽可能高效，我想出了下面的解决方案，但是面试官声称有一种更快更优化的方法：

原始代码：

std::string first_last_n(int n, std::string s)
{
   std::string first_n = s.substr(0,n);
   std::string last_n = s.substr(s.size()-n-1,n);
   return first_n + last_n;
}

我的代码：

bool first_last_n(const std::size_t& n, const std::string& s, std::string& r)
{
   if (s.size() < n)
      return false;
   r.reserve(2 * n);
   r.resize(0);
   r.append(s.data(),s.data() + n);
   r.append(s.data() + (s.size() - n), s.data() + s.size());
   return true;
}

我的更改摘要：

更改了接口以返回字符串作为参考（假设 RVO 和右值尚不可用）
删除了通过构造的临时字符串substr
将输入字符串作为常量引用传递，以便绕过输入的临时实例化
< p>修复了last_n字符串中的off-by-1错误
将每个字符被触碰的次数减少到一次或两次（在发生重叠场景）
在字符串 s 的大小小于 n 的情况下进行检查，返回false 表示失败。

假设只允许使用本机 C++，是否有其他方法可以更有效或更优化地执行上述操作？

注1：原始输入字符串实例不可修改。

注2：所有解决方案必须通过以下测试用例，否则无效。

void test()
{
   {
      std::string s = "0123456789";
      std::string r = first_last_n(10,s);
      assert(r == "01234567890123456789");
   }

   {
      std::string s = "0123456789ABC0123456789";
      std::string r = first_last_n(10,s);
      assert(r == "01234567890123456789");
   }

   {
      std::string s = "1234321";
      std::string r = first_last_n(5,s);
      assert(r == "1234334321");
   }

}

原文

I was recently in a C++ technical interview, where I was given a bit of simple string manipulation code, which is intended to take a string and return a string that is comprised of the first and last n-characters, and then proceed to correct any bugs and to also make the function as efficient as possible, I came up with the solution below, however the interviewer claimed there was an even faster more optimal way:

Original code:

std::string first_last_n(int n, std::string s)
{
   std::string first_n = s.substr(0,n);
   std::string last_n = s.substr(s.size()-n-1,n);
   return first_n + last_n;
}

My code:

bool first_last_n(const std::size_t& n, const std::string& s, std::string& r)
{
   if (s.size() < n)
      return false;
   r.reserve(2 * n);
   r.resize(0);
   r.append(s.data(),s.data() + n);
   r.append(s.data() + (s.size() - n), s.data() + s.size());
   return true;
}

Summary of my changes:

Changed the interface to take a return string as reference (assuming RVO and rvalues are not available yet)
Removed temporary strings being constructed via substr
Passed input string as a const reference inorder to get around temporary instantiation of input
Fixed off-by-1 error in last_n string
Reduced the number of times each character is touched down to once, or twice (in the event of an overlapping scenario)
Placed a check in the event the string s's size is less than n, returning false for failure.

Assuming only native C++ is allowed, is there some other way of doing the above more efficiently or optimally?

Note 1: The original input string instance is not to be modified.

Note 2: All solutions must pass the following test case, otherwise they are not valid.

void test()
{
   {
      std::string s = "0123456789";
      std::string r = first_last_n(10,s);
      assert(r == "01234567890123456789");
   }

   {
      std::string s = "0123456789ABC0123456789";
      std::string r = first_last_n(10,s);
      assert(r == "01234567890123456789");
   }

   {
      std::string s = "1234321";
      std::string r = first_last_n(5,s);
      assert(r == "1234334321");
   }

}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

陈甜 2024-10-15 17:18:03

此实现应该很快：

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    n = std::min(n, s.size());
    std::string ret;
    ret.reserve(2*n);
    ret.append(s.begin(), s.begin() + n);
    ret.append(s.end() - n, s.end());
    return ret;
}

它通过了所有三个单元测试。

当使用 GNU libstdc++ 时，声明 & 的行初始化 ret 的速度非常快，因为 libstdc++ 使用全局“空字符串”变量。因此，它只是一个指针副本。对 s 的 begin 和 end 调用也很快，因为它们将解析为 begin 和 < 的 const 版本code>end、begin() const 和 end() const，因此 s 的内部表示不会“泄露” 。对于 libstdc++，std::string::const_iterator 是 const char*，它是一个指针类型和随机访问迭代器。因此，当 std::string::append(const char*, const char*) 调用 std::distance 获取输入范围的长度，是一个指针差分操作。此外，std::string::append(const char*, const char*) 会产生类似于 memmove 的结果。最后，reserve 操作确保有足够的内存可用于返回值。

编辑：
出于好奇，这里是 MinGW g++ 4.5.0 的汇编输出中 ret 的初始化：

    movl    $__ZNSs4_Rep20_S_empty_rep_storageE+12, (%ebx)

它只是将指针复制到全局“空表示”。

编辑2：
好的。我现在已经使用 g++ 4.5.0 和 Visual C++ 16.00.30319.01 测试了四种变体：

变体 1（“c_str 变体”）：

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret;
   ret.reserve(2*n);
   const char *s_cStr = s.c_str(), *s_cStr_end = s_cStr + s_size;
   ret.append(s_cStr, s_cStr + n);
   ret.append(s_cStr_end - n, s_cStr_end);
   return ret;
}

变体 2（“数据字符串”变体）：

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret;
   ret.reserve(2*n);
   const char *s_data = s.data(), *s_data_end = s_data + s_size;
   ret.append(s_data, s_data + n);
   ret.append(s_data_end - n, s_data_end);
   return ret;
}

变体 3：

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret(s);
   std::string::size_type d = s_size - n;
   return ret.replace(n, d, s, d, n);
}

变体 4（我的原始代码）：

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   n = std::min(n, s.size());
   std::string ret;
   ret.reserve(2*n);
   ret.append(s.begin(), s.begin() + n);
   ret.append(s.end() - n, s.end());
   return ret;
}

g++ 4.5.0 的结果是：

变体 4 最快
变体 3 第二（比变体 4 慢 5%）
变体 1 第三（比变体 3 慢 2%）
变体 2 第四（比变体 1 慢 0.2%）

VC++ 16.00.30319.01 的结果是：

变体 1 最快
变体 2 第二（比变体 1 慢 3%）
变体 4 第三（比变体 2 慢 4%）
变体 3 第四（比变体 4 慢 17%）

毫不奇怪，最快的变体取决于编译器。然而，不知道将使用哪个编译器，我认为我的变体是最好的，因为它是熟悉的 C++ 风格，使用 g++ 时它是最快的，并且使用 VC++ 时它并不比变体 1 或 2 慢很多。

VC++ 结果中有趣的一件事是使用 c_str 而不是 data 更快。也许这就是为什么你的面试官说有比你的实施更快的方法。

EDIT3：

实际上，我只是想到了另一种变体：

变体 5：

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   n = std::min(n, s.size());
   std::string ret;
   ret.reserve(2*n);
   std::string::const_iterator s_begin = s.begin(), s_end = s.end();
   ret.append(s_begin, s_begin + n);
   ret.append(s_end - n, s_end);
   return ret;
}

它就像变体 4 一样，只是保存了 s 的开始和结束迭代器。

当测试变体 5 时，在使用 VC++ 时，它实际上击败了变体 2（数据字符串变体）：

变体 1 最快，
变体 5 次之（比变体 1 慢 1.6%），
变体 2 第三（比变体 5 慢 1.4%） )
变体 4 排名第三（比变体 2 慢 4%）
变体 3 排名第四（比变体 4 慢 17%）

This implementation should be fast:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    n = std::min(n, s.size());
    std::string ret;
    ret.reserve(2*n);
    ret.append(s.begin(), s.begin() + n);
    ret.append(s.end() - n, s.end());
    return ret;
}

It passes all three unit tests.

When using GNU libstdc++, the line that declares & initializes ret is extremely fast because libstdc++ uses a global "empty string" variable. Thus, it's simply a pointer copy. Calls to begin and end on s are also fast because they will resolve to the const versions of begin and end, begin() const and end() const, so the internal representation of s is not "leaked". With libstdc++, std::string::const_iterator is const char*, which is a pointer type and random access iterator. Thus, when std::string::append<const char*>(const char*, const char*) calls std::distance to obtain the length of the input range, it is a pointer difference operation. Also, std::string::append<const char*>(const char*, const char*) results in something like a memmove. Finally, the reserve operation ensures that enough memory is available for the return value.

EDIT:
For the curious, here is the initialization of ret in the assembly output of MinGW g++ 4.5.0:

    movl    $__ZNSs4_Rep20_S_empty_rep_storageE+12, (%ebx)

It's simply copying the pointer to the global "empty representation".

EDIT2:
Okay. I have now tested four variants with g++ 4.5.0 and Visual C++ 16.00.30319.01:

Variant 1 (the "c_str variant"):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret;
   ret.reserve(2*n);
   const char *s_cStr = s.c_str(), *s_cStr_end = s_cStr + s_size;
   ret.append(s_cStr, s_cStr + n);
   ret.append(s_cStr_end - n, s_cStr_end);
   return ret;
}

Variant 2 (the "data string" variant):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret;
   ret.reserve(2*n);
   const char *s_data = s.data(), *s_data_end = s_data + s_size;
   ret.append(s_data, s_data + n);
   ret.append(s_data_end - n, s_data_end);
   return ret;
}

Variant 3:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret(s);
   std::string::size_type d = s_size - n;
   return ret.replace(n, d, s, d, n);
}

Variant 4 (my original code):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   n = std::min(n, s.size());
   std::string ret;
   ret.reserve(2*n);
   ret.append(s.begin(), s.begin() + n);
   ret.append(s.end() - n, s.end());
   return ret;
}

The results for g++ 4.5.0 are:

Variant 4 is the fastest
Variant 3 is second (5% slower than variant 4)
Variant 1 is third (2% slower than variant 3)
Variant 2 is fourth (0.2% slower than variant 1)

The results for VC++ 16.00.30319.01 are:

Variant 1 is the fastest
Variant 2 is second (3% slower than variant 1)
Variant 4 is third (4% slower than variant 2)
Variant 3 is fourth (17% slower than variant 4)

Unsurprisingly, the variant that is fastest depends on the compiler. However, not knowing which compiler will be used I think that my variant is best because it is a familiar style of C++, it is the fastest when using g++, and it is not that much slower than variants 1 or 2 when using VC++.

One thing interesting from the VC++ results is that using c_str rather than data is faster. Perhaps that is why your interviewer said that there is a faster way than your implementation.

EDIT3:

Actually, I just thought about another variant:

Variant 5:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   n = std::min(n, s.size());
   std::string ret;
   ret.reserve(2*n);
   std::string::const_iterator s_begin = s.begin(), s_end = s.end();
   ret.append(s_begin, s_begin + n);
   ret.append(s_end - n, s_end);
   return ret;
}

It's just like variant 4 except that the begin and end iterators for s are saved.

When variant 5 is tested, it actually beats out variant 2 (the data string variant) when using VC++:

Variant 1 is the fastest
Variant 5 is second (1.6% slower than variant 1)
Variant 2 is third (1.4% slower than variant 5)
Variant 4 is third (4% slower than variant 2)
Variant 3 is fourth (17% slower than variant 4)

回复收藏 0 原文

初吻给了烟 2024-10-15 17:18:03

如果不需要保留原字符串的内容，则可以将最后n个字符复制到原字符串的[n+1, 2n]位置，并在处截断2n。您必须首先小心地扩展字符串，并且如果字符串短于 2n，则在写入之前要注意不要覆盖任何字符。

这将使构造字符串的操作数量减半，并且无需创建新字符串。所以理论上它的速度要快 2 到 4 倍。当然，你刚刚破坏了原始字符串，你必须询问面试官是否可以接受。

回复收藏 0 原文

无名指的心愿 2024-10-15 17:18:03

删除中间的 N-2n 个字符怎么样，其中 N 是源字符串的长度？

回复收藏 0 原文

寄居者 2024-10-15 17:18:03

// compiled with cl /Ox first_last_n.cpp /W4 /EHsc

inline void
first_last_n2(string::size_type n, const std::string &s, string &out)  // method 2
{
  // check against degenerate input
  assert(n > 0);
  assert(n <= s.size());

  out.reserve(2*n);
  out.assign(s, 0, n);
  out.append(s, s.size()-n, n);
}

Times：

method 1:  // original method
2.281
method 2:  // my method
0.687
method 3:  // your code.
0.782

注意：Timing 专门测试“长”字符串。即那些不使用短字符串优化的情况。（我的琴弦长度为 100）。

// compiled with cl /Ox first_last_n.cpp /W4 /EHsc

inline void
first_last_n2(string::size_type n, const std::string &s, string &out)  // method 2
{
  // check against degenerate input
  assert(n > 0);
  assert(n <= s.size());

  out.reserve(2*n);
  out.assign(s, 0, n);
  out.append(s, s.size()-n, n);
}

Times:

method 1:  // original method
2.281
method 2:  // my method
0.687
method 3:  // your code.
0.782

Note: Timing specifically tests "long" strings. I.e. those where short string optimization is not used. (My strings were 100 length).

回复收藏 0 原文

孤蝉 2024-10-15 17:18:03

我唯一的想法是，如果仅使用 C 空终止字符串调用此函数，则可能需要为参数 's' 额外构造 std::string 。

可能“更”有效的方法是允许传入 std::string 或 const char *s。

回复收藏 0 原文

月牙弯弯 2024-10-15 17:18:03

Memcpy 是骗子吗？

#include <cstring>
#include <iostream>
#include <string>

std::string first_last_n(int n, const std::string& s)
{
  if (s.size() < n)
      return "";

    char str[n*2];
    memcpy(str, s.data(), n);
    memcpy(str+n, s.data() + s.size()-n, n);

    return (const char *)str;
}

int main()
{
    std::cout << first_last_n(2, "123454321") << std::endl;
}

编辑
所以我把另外一个去掉了。这不是作弊。

Memcpy is a cheat?

#include <cstring>
#include <iostream>
#include <string>

std::string first_last_n(int n, const std::string& s)
{
  if (s.size() < n)
      return "";

    char str[n*2];
    memcpy(str, s.data(), n);
    memcpy(str+n, s.data() + s.size()-n, n);

    return (const char *)str;
}

int main()
{
    std::cout << first_last_n(2, "123454321") << std::endl;
}

EDIT
So I removed the other one. This is not a cheat.

回复收藏 0 原文

落在眉间の轻吻 2024-10-15 17:18:03

如果您必须通过测试，那么您将不得不编写低效的代码，因为您必须返回字符串的副本。这意味着您必须使用动态分配，并且由于副本的原因可能需要多次使用。

因此，更改测试并更改签名。

template<class Out>
Out first_last_n(const std::string::size_type& n, const std::string& s, Out r)
{
    r = copy_n(s.begin(), n, r);
    std::string::const_iterator pos(s.end());
    std::advance(pos, -n);
    return copy_n(pos, n, r);
}

然后像这样调用它：

std::string s("Hello world!");
char r[5];
r[4] = 0;
first_last_n(2, s, r);

这允许您使用动态编程，并且消除了在函数中动态分配的需要。

我喜欢我的算法简约，并且我故意取消了对 n 小于或等于字符串大小的检查。我用该函数的先决条件替换了检查。前提条件比检查更快：它们的开销为零。

If you must pass the tests then you're going to have to write inefficient code, because you must return a copy of a string. This implies you must use dynamic allocation, possibly multiple times because of the copy.

So change the tests and change the signature.

template<class Out>
Out first_last_n(const std::string::size_type& n, const std::string& s, Out r)
{
    r = copy_n(s.begin(), n, r);
    std::string::const_iterator pos(s.end());
    std::advance(pos, -n);
    return copy_n(pos, n, r);
}

Then call it like so:

std::string s("Hello world!");
char r[5];
r[4] = 0;
first_last_n(2, s, r);

This allows you to use dynamic programming, and it eliminates the need of dynamic allocation in the function.

I like my algorithms minimalistic, and I purposely eliminated the check for n being smaller or equal to the size of the string. I replace the check with a pre-condition for the function. Preconditions are faster than checks: they have zero overhead.

回复收藏 0 原文

~没有更多了~