How to efficiently remove double quotes from a std::string, if they exist

Posted on 2024-12-04 01:30:03

This question risks being a duplicate (e.g. "remove double quotes from a string in c++"),
but none of the answers I saw address my question.
I have a list of strings, some of which are double-quoted and some of which are not. The quotes are always at the beginning and end.

std::vector<std::string> words = boost::assign::list_of("words")( "\"some\"")( "of which")( "\"might\"")("be quoted");

I am looking for the most efficient way to remove the quotes. Here is my attempt:

for(std::vector<std::string>::iterator pos = words.begin(); pos != words.end(); ++pos)
{
  boost::algorithm::replace_first(*pos, "\"", "");
  boost::algorithm::replace_last(*pos, "\"", "");
  std::cout << *pos << std::endl;
}

Can I do better than this? I potentially have hundreds of thousands of strings to process. They may come from a file or from a database. The std::vector in the example is just for illustration purposes.

Comments (4)

旧时浪漫 2024-12-11 01:30:03

If you know the quotes will always appear in the first and last positions, you can simply do:

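// Guard on size: front() on an empty string is undefined behavior, and a
// lone '"' would make the second erase throw std::out_of_range.
if ( s.size() >= 2 && s.front() == '"' ) {
    s.erase( 0, 1 ); // erase the first character
    s.erase( s.size() - 1 ); // erase the last character
}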

The complexity is still linear in the size of the string. You cannot insert or remove from the beginning of a std::string in O(1) time. If it is acceptable to replace the character with a space, then do that.
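
The last suggestion could look roughly like this. It is only a sketch and not part of the original answer: the helper name blank_quotes is hypothetical, and it assumes that leading and trailing spaces are acceptable to whatever consumes the strings afterwards.

```cpp
#include <string>

// Hypothetical helper: overwrite an enclosing pair of double quotes with
// spaces instead of erasing them. Both writes are O(1) because no characters
// are shifted; the string keeps its original length.
inline void blank_quotes(std::string& s) {
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"') {
        s.front() = ' ';
        s.back() = ' ';
    }
}
```

The trade-off is that the strings now carry padding, so this only helps if later code trims or ignores whitespace.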

情话难免假 2024-12-11 01:30:03

It would probably be fast to do a check:

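for (auto i = words.begin(); i != words.end(); ++i) {
    if (i->empty())
        continue; // nothing to strip; an empty string has no first or last character
    if (i->front() == '"') {
        if (i->back() == '"')
            *i = i->substr(1, i->length() - 2); // quoted on both sides
        else
            *i = i->substr(1);                  // leading quote only
    } else if (i->back() == '"') {
        *i = i->substr(0, i->length() - 1);     // trailing quote only
    }
}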

It might not be the prettiest thing ever, but it's O(n) with a small constant.
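
For comparison, the same either-side check can be done in place, without the temporary string that substr creates. This is a sketch rather than the answerer's code, and the helper name strip_quotes is hypothetical.

```cpp
#include <string>

// Hypothetical helper: strip a leading and/or trailing double quote in place.
inline void strip_quotes(std::string& s) {
    // Drop the trailing quote first: pop_back() is O(1) and leaves one fewer
    // character for the front erase to shift.
    if (!s.empty() && s.back() == '"')
        s.pop_back();
    // Erasing at the front shifts the remaining characters left by one.
    if (!s.empty() && s.front() == '"')
        s.erase(0, 1);
}
```

It could be applied to the vector with something like for (auto& w : words) strip_quotes(w);. Whether it is measurably faster than the substr version would need to be benchmarked.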

檐上三寸雪 2024-12-11 01:30:03

The most efficient way for modern C++ is:

  if (str.size() > 1) {
    if (str.front() == '"' && str.back() == '"') {
      if (str.size() == 2) {
        str.erase();
      } else {
        str.erase(str.begin());
        str.erase(str.end() - 1);
      }
    }
  }

Rationale:

  • The erase() function modifies the string instead of reallocating it.
  • Calling front() on empty strings triggers undefined behavior.
  • This code leaves open the possibility that the compiler deduces the intent of the two erase calls and optimizes the code further (removing the first and last characters together is a standard problem).

飘落散花 2024-12-11 01:30:03

This is how I would approach the situation:

  • Start Simple: Begin with the simplest approach that does the job, like Potatoswatter's answer.
  • Don't Store Quoted Strings: If you can help it, don't store quoted strings at all. Check and unquote strings wherever you create the std::vector<std::string> in the first place. If you are simply receiving a std::vector<std::string>, there isn't much you can do, as removing the first quote will require copying the rest of the string.
  • Profile/Benchmark: You may be surprised how fast a few hundred thousand strings can be iterated through, and how little any amount of micro-optimizing will get you in the end. There will always be some cases where you do need every little bit of speed, but make sure you understand how to achieve the biggest gains (which profiling will tell you).
  • Worst Case: If you absolutely have to prevent copying the entire string when unquoting, then store an index/iterator to the first "real" character. This may actually be slower with "short" strings but may work with "long" strings (i.e., megabytes in size). You could also create, or find, a string class that handles moving the string start without copying, but this would be my last choice.