在 C++ 中对字符串进行标记并包含分隔符

发布于 2024-08-07 10:00:21 字数 563 浏览 10 评论 0原文

我正在标记以下内容，但不确定如何包含分隔符。

void Tokenize(const string str, vector<string>& tokens, const string& delimiters)
{

    int startpos = 0;
    int pos = str.find_first_of(delimiters, startpos);
    string strTemp;


    while (string::npos != pos || string::npos != startpos)
    {

        strTemp = str.substr(startpos, pos - startpos);
        tokens.push_back(strTemp.substr(0, strTemp.length()));

        startpos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, startpos);

    }
}

原文

I'm tokening with the following, but unsure how to include the delimiters with it.

void Tokenize(const string str, vector<string>& tokens, const string& delimiters)
{

    int startpos = 0;
    int pos = str.find_first_of(delimiters, startpos);
    string strTemp;


    while (string::npos != pos || string::npos != startpos)
    {

        strTemp = str.substr(startpos, pos - startpos);
        tokens.push_back(strTemp.substr(0, strTemp.length()));

        startpos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, startpos);

    }
}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

想念有你 2024-08-14 10:00:21

C++ String Toolkit Library (StrTk) 具有以下解决方案：

std::string str = "abc,123 xyz";
std::vector<std::string> token_list;
strtk::split(";., ",
             str,
             strtk::range_to_type_back_inserter(token_list),
             strtk::include_delimiters);

它应该导致token_list 具有以下元素：

Token₀ = "abc,"
Token₁ = "123 "
Token₂ = "xyz"

更多示例可以在此处找到

The C++ String Toolkit Library (StrTk) has the following solution:

std::string str = "abc,123 xyz";
std::vector<std::string> token_list;
strtk::split(";., ",
             str,
             strtk::range_to_type_back_inserter(token_list),
             strtk::include_delimiters);

It should result with token_list have the following elements:

Token₀ = "abc,"
Token₁ = "123 "
Token₂ = "xyz"

More examples can be found Here

回复收藏 0 原文

椒妓 2024-08-14 10:00:21

我现在有点草率，但这就是我最终的结果。我不想使用 boost，因为这是一项学校作业，并且我的老师希望我使用 find_first_of 来完成此任务。

感谢大家的帮助。

vector<string> Tokenize(const string& strInput, const string& strDelims)
{
 vector<string> vS;

 string strOne = strInput;
 string delimiters = strDelims;

 int startpos = 0;
 int pos = strOne.find_first_of(delimiters, startpos);

 while (string::npos != pos || string::npos != startpos)
 {
  if(strOne.substr(startpos, pos - startpos) != "")
   vS.push_back(strOne.substr(startpos, pos - startpos));

  // if delimiter is a new line (\n) then addt new line
  if(strOne.substr(pos, 1) == "\n")
   vS.push_back("\\n");
  // else if the delimiter is not a space
  else if (strOne.substr(pos, 1) != " ")
   vS.push_back(strOne.substr(pos, 1));

  if( string::npos == strOne.find_first_not_of(delimiters, pos) )
   startpos = strOne.find_first_not_of(delimiters, pos);
  else
   startpos = pos + 1;

        pos = strOne.find_first_of(delimiters, startpos);

 }

 return vS;
}

I now this a little sloppy, but this is what I ended up with. I did not want to use boost since this is a school assignment and my instructor wanted me to use find_first_of to accomplish this.

Thanks for everyone's help.

vector<string> Tokenize(const string& strInput, const string& strDelims)
{
 vector<string> vS;

 string strOne = strInput;
 string delimiters = strDelims;

 int startpos = 0;
 int pos = strOne.find_first_of(delimiters, startpos);

 while (string::npos != pos || string::npos != startpos)
 {
  if(strOne.substr(startpos, pos - startpos) != "")
   vS.push_back(strOne.substr(startpos, pos - startpos));

  // if delimiter is a new line (\n) then addt new line
  if(strOne.substr(pos, 1) == "\n")
   vS.push_back("\\n");
  // else if the delimiter is not a space
  else if (strOne.substr(pos, 1) != " ")
   vS.push_back(strOne.substr(pos, 1));

  if( string::npos == strOne.find_first_not_of(delimiters, pos) )
   startpos = strOne.find_first_not_of(delimiters, pos);
  else
   startpos = pos + 1;

        pos = strOne.find_first_of(delimiters, startpos);

 }

 return vS;
}

回复收藏 0 原文

毁虫ゝ 2024-08-14 10:00:21

我真的无法遵循你的代码，你能发布一个工作程序吗？

无论如何，这是一个简单的分词器，无需测试边缘情况：

#include <iostream>
#include <string>
#include <vector>

using namespace std;

void tokenize(vector<string>& tokens, const string& text, const string& del)
{
    string::size_type startpos = 0,
        currentpos = text.find(del, startpos);

    do
    {
        tokens.push_back(text.substr(startpos, currentpos-startpos+del.size()));

        startpos = currentpos + del.size();
        currentpos = text.find(del, startpos);
    } while(currentpos != string::npos);

    tokens.push_back(text.substr(startpos, currentpos-startpos+del.size()));
}

示例输入，分隔符= $$：

Hello$Stack$Over$$Flow$$!

令牌：

Hello$
Stack$
Over$
$Flow$
$
!

注意：我永远不会使用未经测试编写的分词器！请使用 boost::tokenizer！

I can't really follow your code, could you post a working program?

Anyway, this is a simple tokenizer, without testing edge cases:

#include <iostream>
#include <string>
#include <vector>

using namespace std;

void tokenize(vector<string>& tokens, const string& text, const string& del)
{
    string::size_type startpos = 0,
        currentpos = text.find(del, startpos);

    do
    {
        tokens.push_back(text.substr(startpos, currentpos-startpos+del.size()));

        startpos = currentpos + del.size();
        currentpos = text.find(del, startpos);
    } while(currentpos != string::npos);

    tokens.push_back(text.substr(startpos, currentpos-startpos+del.size()));
}

Example input, delimiter = $$:

Hello$Stack$Over$$Flow$$!

Tokens:

Hello$
Stack$
Over$
$Flow$
$
!

Note: I would never use a tokenizer I wrote without testing! please use boost::tokenizer!

回复收藏 0 原文

洛阳烟雨空心柳 2024-08-14 10:00:21

如果分隔符是字符而不是字符串，则可以使用 strtok。

回复收藏 0 原文

清醇 2024-08-14 10:00:21

这取决于您是否需要前面的分隔符、后面的分隔符或两者，以及您想要对字符串开头和结尾处的字符串（前后可能没有分隔符）执行的操作。

我假设您想要每个单词及其前面和后面的分隔符，但不需要任何分隔符字符串本身（例如，如果最后一个字符串后面有分隔符）。

template <class iter>
void tokenize(std::string const &str, std::string const &delims, iter out) { 
    int pos = 0;
    do { 
        int beg_word = str.find_first_not_of(delims, pos);
        if (beg_word == std::string::npos) 
            break;
        int end_word = str.find_first_of(delims, beg_word);
        int beg_next_word = str.find_first_not_of(delims, end_word);
        *out++ = std::string(str, pos, beg_next_word-pos);
        pos = end_word;
    } while (pos != std::string::npos);
}

目前，我将其编写得更像是 STL 算法，采用迭代器作为其输出，而不是假设它始终推送到集合上。由于它（目前）依赖于输入是字符串，因此它不使用迭代器作为输入。

It depends on whether you want the preceding delimiters, the following delimiters, or both, and what you want to do with strings at the beginning and end of the string that may not have delimiters before/after them.

I'm going to assume you want each word, with its preceding and following delimiters, but NOT any strings of delimiters by themselves (e.g. if there's a delimiter following the last string).

template <class iter>
void tokenize(std::string const &str, std::string const &delims, iter out) { 
    int pos = 0;
    do { 
        int beg_word = str.find_first_not_of(delims, pos);
        if (beg_word == std::string::npos) 
            break;
        int end_word = str.find_first_of(delims, beg_word);
        int beg_next_word = str.find_first_not_of(delims, end_word);
        *out++ = std::string(str, pos, beg_next_word-pos);
        pos = end_word;
    } while (pos != std::string::npos);
}

For the moment, I've written it more like an STL algorithm, taking an iterator for its output instead of assuming it's always pushing onto a collection. Since it depends (for the moment) in the input being a string, it doesn't use iterators for the input.

回复收藏 0 原文

~没有更多了~