C++字符串面试问题

发布于 2024-10-08 17:18:03 字数 1544 浏览 2 评论 0原文

我最近参加了一次 C++ 技术面试,面试中给了我一些简单的字符串操作代码,该代码的目的是获取一个字符串并返回一个由第一个和最后一个 n 个字符组成的字符串,然后继续纠正任何bug 并让函数尽可能高效,我想出了下面的解决方案,但是面试官声称有一种更快更优化的方法:

原始代码:

std::string first_last_n(int n, std::string s)
{
   std::string first_n = s.substr(0,n);
   std::string last_n = s.substr(s.size()-n-1,n);
   return first_n + last_n;
}

我的代码:

bool first_last_n(const std::size_t& n, const std::string& s, std::string& r)
{
   if (s.size() < n)
      return false;
   r.reserve(2 * n);
   r.resize(0);
   r.append(s.data(),s.data() + n);
   r.append(s.data() + (s.size() - n), s.data() + s.size());
   return true;
}

我的更改摘要:

  • 更改了接口以返回字符串作为参考(假设 RVO 和右值尚不可用)

  • 删除了通过构造的临时字符串substr

  • 将输入字符串作为常量引用传递,以便绕过输入的临时实例化

  • < p>修复了last_n字符串中的off-by-1错误

  • 将每个字符被触碰的次数减少到一次或两次(在发生重叠场景)

  • 在字符串 s 的大小小于 n 的情况下进行检查,返回false 表示失败。

假设只允许使用本机 C++,是否有其他方法可以更有效或更优化地执行上述操作?

注1:原始输入字符串实例不可修改。

注2:所有解决方案必须通过以下测试用例,否则无效。

void test()
{
   {
      std::string s = "0123456789";
      std::string r = first_last_n(10,s);
      assert(r == "01234567890123456789");
   }

   {
      std::string s = "0123456789ABC0123456789";
      std::string r = first_last_n(10,s);
      assert(r == "01234567890123456789");
   }

   {
      std::string s = "1234321";
      std::string r = first_last_n(5,s);
      assert(r == "1234334321");
   }

}

I was recently in a C++ technical interview, where I was given a bit of simple string manipulation code, which is intended to take a string and return a string that is comprised of the first and last n-characters, and then proceed to correct any bugs and to also make the function as efficient as possible, I came up with the solution below, however the interviewer claimed there was an even faster more optimal way:

Original code:

std::string first_last_n(int n, std::string s)
{
   std::string first_n = s.substr(0,n);
   std::string last_n = s.substr(s.size()-n-1,n);
   return first_n + last_n;
}

My code:

bool first_last_n(const std::size_t& n, const std::string& s, std::string& r)
{
   if (s.size() < n)
      return false;
   r.reserve(2 * n);
   r.resize(0);
   r.append(s.data(),s.data() + n);
   r.append(s.data() + (s.size() - n), s.data() + s.size());
   return true;
}

Summary of my changes:

  • Changed the interface to take a return string as reference (assuming RVO and rvalues are not available yet)

  • Removed temporary strings being constructed via substr

  • Passed input string as a const reference inorder to get around temporary instantiation of input

  • Fixed off-by-1 error in last_n string

  • Reduced the number of times each character is touched down to once, or twice (in the event of an overlapping scenario)

  • Placed a check in the event the string s's size is less than n, returning false for failure.

Assuming only native C++ is allowed, is there some other way of doing the above more efficiently or optimally?

Note 1: The original input string instance is not to be modified.

Note 2: All solutions must pass the following test case, otherwise they are not valid.

void test()
{
   {
      std::string s = "0123456789";
      std::string r = first_last_n(10,s);
      assert(r == "01234567890123456789");
   }

   {
      std::string s = "0123456789ABC0123456789";
      std::string r = first_last_n(10,s);
      assert(r == "01234567890123456789");
   }

   {
      std::string s = "1234321";
      std::string r = first_last_n(5,s);
      assert(r == "1234334321");
   }

}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

陈甜 2024-10-15 17:18:03

此实现应该很快:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    n = std::min(n, s.size());
    std::string ret;
    ret.reserve(2*n);
    ret.append(s.begin(), s.begin() + n);
    ret.append(s.end() - n, s.end());
    return ret;
}

通过了所有三个单元测试

当使用 GNU libstdc++ 时,声明 & 的行初始化 ret 的速度非常快,因为 libstdc++ 使用全局“空字符串”变量。因此,它只是一个指针副本。对 sbeginend 调用也很快,因为它们将解析为 begin 和 < 的 const 版本code>end、begin() constend() const,因此 s 的内部表示不会“泄露” 。对于 libstdc++,std::string::const_iteratorconst char*,它是一个指针类型和随机访问迭代器。因此,当 std::string::append(const char*, const char*) 调用 std::distance 获取输入范围的长度,是一个指针差分操作。此外,std::string::append(const char*, const char*) 会产生类似于 memmove 的结果。最后,reserve 操作确保有足够的内存可用于返回值。

编辑:
出于好奇,这里是 MinGW g++ 4.5.0 的汇编输出中 ret 的初始化:

    movl    $__ZNSs4_Rep20_S_empty_rep_storageE+12, (%ebx)

它只是将指针复制到全局“空表示”。

编辑2:
好的。我现在已经使用 g++ 4.5.0 和 Visual C++ 16.00.30319.01 测试了四种变体:

变体 1(“c_str 变体”):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret;
   ret.reserve(2*n);
   const char *s_cStr = s.c_str(), *s_cStr_end = s_cStr + s_size;
   ret.append(s_cStr, s_cStr + n);
   ret.append(s_cStr_end - n, s_cStr_end);
   return ret;
}

变体 2(“数据字符串”变体):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret;
   ret.reserve(2*n);
   const char *s_data = s.data(), *s_data_end = s_data + s_size;
   ret.append(s_data, s_data + n);
   ret.append(s_data_end - n, s_data_end);
   return ret;
}

变体 3:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret(s);
   std::string::size_type d = s_size - n;
   return ret.replace(n, d, s, d, n);
}

变体 4(我的原始代码):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   n = std::min(n, s.size());
   std::string ret;
   ret.reserve(2*n);
   ret.append(s.begin(), s.begin() + n);
   ret.append(s.end() - n, s.end());
   return ret;
}

g++ 4.5.0 的结果是:

  • 变体 4 最快
  • 变体 3 第二(比变体 4 慢 5%)
  • 变体 1 第三(比变体 3 慢 2%)
  • 变体 2 第四(比变体 1 慢 0.2%)

VC++ 16.00.30319.01 的结果是:

  • 变体 1 最快
  • 变体 2 第二(比变体 1 慢 3%)
  • 变体 4 第三(比变体 2 慢 4%)
  • 变体 3 第四(比变体 4 慢 17%)

毫不奇怪,最快的变体取决于编译器。然而,不知道将使用哪个编译器,我认为我的变体是最好的,因为它是熟悉的 C++ 风格,使用 g++ 时它是最快的,并且使用 VC++ 时它并不比变体 1 或 2 慢很多。

VC++ 结果中有趣的一件事是使用 c_str 而不是 data 更快。也许这就是为什么你的面试官说有比你的实施更快的方法。

EDIT3:

实际上,我只是想到了另一种变体:

变体 5:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   n = std::min(n, s.size());
   std::string ret;
   ret.reserve(2*n);
   std::string::const_iterator s_begin = s.begin(), s_end = s.end();
   ret.append(s_begin, s_begin + n);
   ret.append(s_end - n, s_end);
   return ret;
}

它就像变体 4 一样,只是保存了 s 的开始和结束迭代器。

当测试变体 5 时,在使用 VC++ 时,它实际上击败了变体 2(数据字符串变体):

  • 变体 1 最快,
  • 变体 5 次之(比变体 1 慢 1.6%),
  • 变体 2 第三(比变体 5 慢 1.4%) )
  • 变体 4 排名第三(比变体 2 慢 4%)
  • 变体 3 排名第四(比变体 4 慢 17%)

This implementation should be fast:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
    n = std::min(n, s.size());
    std::string ret;
    ret.reserve(2*n);
    ret.append(s.begin(), s.begin() + n);
    ret.append(s.end() - n, s.end());
    return ret;
}

It passes all three unit tests.

When using GNU libstdc++, the line that declares & initializes ret is extremely fast because libstdc++ uses a global "empty string" variable. Thus, it's simply a pointer copy. Calls to begin and end on s are also fast because they will resolve to the const versions of begin and end, begin() const and end() const, so the internal representation of s is not "leaked". With libstdc++, std::string::const_iterator is const char*, which is a pointer type and random access iterator. Thus, when std::string::append<const char*>(const char*, const char*) calls std::distance to obtain the length of the input range, it is a pointer difference operation. Also, std::string::append<const char*>(const char*, const char*) results in something like a memmove. Finally, the reserve operation ensures that enough memory is available for the return value.

EDIT:
For the curious, here is the initialization of ret in the assembly output of MinGW g++ 4.5.0:

    movl    $__ZNSs4_Rep20_S_empty_rep_storageE+12, (%ebx)

It's simply copying the pointer to the global "empty representation".

EDIT2:
Okay. I have now tested four variants with g++ 4.5.0 and Visual C++ 16.00.30319.01:

Variant 1 (the "c_str variant"):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret;
   ret.reserve(2*n);
   const char *s_cStr = s.c_str(), *s_cStr_end = s_cStr + s_size;
   ret.append(s_cStr, s_cStr + n);
   ret.append(s_cStr_end - n, s_cStr_end);
   return ret;
}

Variant 2 (the "data string" variant):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret;
   ret.reserve(2*n);
   const char *s_data = s.data(), *s_data_end = s_data + s_size;
   ret.append(s_data, s_data + n);
   ret.append(s_data_end - n, s_data_end);
   return ret;
}

Variant 3:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   std::string::size_type s_size = s.size();
   n = std::min(n, s_size);
   std::string ret(s);
   std::string::size_type d = s_size - n;
   return ret.replace(n, d, s, d, n);
}

Variant 4 (my original code):

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   n = std::min(n, s.size());
   std::string ret;
   ret.reserve(2*n);
   ret.append(s.begin(), s.begin() + n);
   ret.append(s.end() - n, s.end());
   return ret;
}

The results for g++ 4.5.0 are:

  • Variant 4 is the fastest
  • Variant 3 is second (5% slower than variant 4)
  • Variant 1 is third (2% slower than variant 3)
  • Variant 2 is fourth (0.2% slower than variant 1)

The results for VC++ 16.00.30319.01 are:

  • Variant 1 is the fastest
  • Variant 2 is second (3% slower than variant 1)
  • Variant 4 is third (4% slower than variant 2)
  • Variant 3 is fourth (17% slower than variant 4)

Unsurprisingly, the variant that is fastest depends on the compiler. However, not knowing which compiler will be used I think that my variant is best because it is a familiar style of C++, it is the fastest when using g++, and it is not that much slower than variants 1 or 2 when using VC++.

One thing interesting from the VC++ results is that using c_str rather than data is faster. Perhaps that is why your interviewer said that there is a faster way than your implementation.

EDIT3:

Actually, I just thought about another variant:

Variant 5:

inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
   n = std::min(n, s.size());
   std::string ret;
   ret.reserve(2*n);
   std::string::const_iterator s_begin = s.begin(), s_end = s.end();
   ret.append(s_begin, s_begin + n);
   ret.append(s_end - n, s_end);
   return ret;
}

It's just like variant 4 except that the begin and end iterators for s are saved.

When variant 5 is tested, it actually beats out variant 2 (the data string variant) when using VC++:

  • Variant 1 is the fastest
  • Variant 5 is second (1.6% slower than variant 1)
  • Variant 2 is third (1.4% slower than variant 5)
  • Variant 4 is third (4% slower than variant 2)
  • Variant 3 is fourth (17% slower than variant 4)
初吻给了烟 2024-10-15 17:18:03

如果不需要保留原字符串的内容,则可以将最后n个字符复制到原字符串的[n+1, 2n]位置,并在处截断2n。您必须首先小心地扩展字符串,并且如果字符串短于 2n,则在写入之前要注意不要覆盖任何字符。

这将使构造字符串的操作数量减半,并且无需创建新字符串。所以理论上它的速度要快 2 到 4 倍。当然,你刚刚破坏了原始字符串,你必须询问面试官是否可以接受。

If you don't need to maintain the contents of the original string, then you can copy the last n characters into positions [n+1, 2n] of the original string and truncate it at 2n. You will have to be careful to first expand the string and also be careful not to overwrite any characters before writing to them if the string is shorter than 2n.

This will halve the number of operations to construct the string, as well as remove the need to create a new string. So its theoretically between 2 and 4 times faster. But of course you have just destroyed the original string, which you'd have to ask the interviewer if it is acceptable.

无名指的心愿 2024-10-15 17:18:03

删除中间的 N-2n 个字符怎么样,其中 N 是源字符串的长度?

How about removing the middle N-2n characters, where N is the length of the source string?

寄居者 2024-10-15 17:18:03
// compiled with cl /Ox first_last_n.cpp /W4 /EHsc

inline void
first_last_n2(string::size_type n, const std::string &s, string &out)  // method 2
{
  // check against degenerate input
  assert(n > 0);
  assert(n <= s.size());

  out.reserve(2*n);
  out.assign(s, 0, n);
  out.append(s, s.size()-n, n);
}

Times:

method 1:  // original method
2.281
method 2:  // my method
0.687
method 3:  // your code.
0.782

注意:Timing 专门测试“长”字符串。即那些不使用短字符串优化的情况。 (我的琴弦长度为 100)。

// compiled with cl /Ox first_last_n.cpp /W4 /EHsc

inline void
first_last_n2(string::size_type n, const std::string &s, string &out)  // method 2
{
  // check against degenerate input
  assert(n > 0);
  assert(n <= s.size());

  out.reserve(2*n);
  out.assign(s, 0, n);
  out.append(s, s.size()-n, n);
}

Times:

method 1:  // original method
2.281
method 2:  // my method
0.687
method 3:  // your code.
0.782

Note: Timing specifically tests "long" strings. I.e. those where short string optimization is not used. (My strings were 100 length).

孤蝉 2024-10-15 17:18:03

我唯一的想法是,如果仅使用 C 空终止字符串调用此函数,则可能需要为参数 's' 额外构造 std::string 。

可能“更”有效的方法是允许传入 std::string 或 const char *s。

My only thought is that if this function is only called with C null-terminated strings, you might be requiring the extra construction of a std::string for parameter 's'.

Possibly the 'more' efficient method would be to allow either a std::string or const char *s passed in.

月牙弯弯 2024-10-15 17:18:03

Memcpy 是骗子吗?

#include <cstring>
#include <iostream>
#include <string>

std::string first_last_n(int n, const std::string& s)
{
  if (s.size() < n)
      return "";

    char str[n*2];
    memcpy(str, s.data(), n);
    memcpy(str+n, s.data() + s.size()-n, n);

    return (const char *)str;
}

int main()
{
    std::cout << first_last_n(2, "123454321") << std::endl;
}

编辑
所以我把另外一个去掉了。这不是作弊。

Memcpy is a cheat?

#include <cstring>
#include <iostream>
#include <string>

std::string first_last_n(int n, const std::string& s)
{
  if (s.size() < n)
      return "";

    char str[n*2];
    memcpy(str, s.data(), n);
    memcpy(str+n, s.data() + s.size()-n, n);

    return (const char *)str;
}

int main()
{
    std::cout << first_last_n(2, "123454321") << std::endl;
}

EDIT
So I removed the other one. This is not a cheat.

落在眉间の轻吻 2024-10-15 17:18:03

如果您必须通过测试,那么您将不得不编写低效的代码,因为您必须返回字符串的副本。这意味着您必须使用动态分配,并且由于副本的原因可能需要多次使用。

因此,更改测试并更改签名。

template<class Out>
Out first_last_n(const std::string::size_type& n, const std::string& s, Out r)
{
    r = copy_n(s.begin(), n, r);
    std::string::const_iterator pos(s.end());
    std::advance(pos, -n);
    return copy_n(pos, n, r);
}

然后像这样调用它:

std::string s("Hello world!");
char r[5];
r[4] = 0;
first_last_n(2, s, r);

这允许您使用动态编程,并且消除了在函数中动态分配的需要。

我喜欢我的算法简约,并且我故意取消了对 n 小于或等于字符串大小的检查。我用该函数的先决条件替换了检查。前提条件比检查更快:它们的开销为零。

If you must pass the tests then you're going to have to write inefficient code, because you must return a copy of a string. This implies you must use dynamic allocation, possibly multiple times because of the copy.

So change the tests and change the signature.

template<class Out>
Out first_last_n(const std::string::size_type& n, const std::string& s, Out r)
{
    r = copy_n(s.begin(), n, r);
    std::string::const_iterator pos(s.end());
    std::advance(pos, -n);
    return copy_n(pos, n, r);
}

Then call it like so:

std::string s("Hello world!");
char r[5];
r[4] = 0;
first_last_n(2, s, r);

This allows you to use dynamic programming, and it eliminates the need of dynamic allocation in the function.

I like my algorithms minimalistic, and I purposely eliminated the check for n being smaller or equal to the size of the string. I replace the check with a pre-condition for the function. Preconditions are faster than checks: they have zero overhead.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文