Tokenizing a string in C++ and including the delimiters
I'm tokenizing a string with the following function, but I'm unsure how to include the delimiters in the output.
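#include <string>
#include <vector>
using namespace std;

// Splits str on any character in delimiters; the delimiters themselves are discarded.
void Tokenize(const string& str, vector<string>& tokens, const string& delimiters)
{
    // Use string::size_type rather than int: string::npos is not representable in an int.
    // Skipping leading delimiters avoids pushing an empty first token.
    string::size_type startpos = str.find_first_not_of(delimiters, 0);
    string::size_type pos = str.find_first_of(delimiters, startpos);
    while (string::npos != pos || string::npos != startpos)
    {
        // Copy the word between startpos and the next delimiter.
        tokens.push_back(str.substr(startpos, pos - startpos));
        startpos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, startpos);
    }
}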
Comments (5)
The C++ String Toolkit Library (StrTk) has the following solution:
It should result in token_list containing the following elements:
More examples can be found here.
I know this is a little sloppy, but this is what I ended up with. I did not want to use Boost since this is a school assignment and my instructor wanted me to use find_first_of to accomplish this.
Thanks for everyone's help.
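The poster's final code is not preserved here. As a rough sketch of one way to keep the delimiters using only find_first_of, something like the following might work. The function name TokenizeKeepDelims and the choice to emit every delimiter character as its own one-character token are assumptions for illustration, not the poster's actual solution.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): splits str on any
// character in delimiters and emits each delimiter character as its own
// one-character token, in the order it appears.
std::vector<std::string> TokenizeKeepDelims(const std::string& str,
                                            const std::string& delimiters)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (start < str.size())
    {
        std::string::size_type pos = str.find_first_of(delimiters, start);
        if (pos == std::string::npos)
        {
            // No more delimiters: the rest of the string is the last word.
            tokens.push_back(str.substr(start));
            break;
        }
        if (pos > start)
            tokens.push_back(str.substr(start, pos - start)); // word before the delimiter
        tokens.push_back(str.substr(pos, 1));                 // the delimiter itself
        start = pos + 1;
    }
    return tokens;
}
```

With this sketch, calling TokenizeKeepDelims("a+b-c", "+-") yields the five tokens a, +, b, -, c. Returning the vector instead of filling an output parameter is only a stylistic choice.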
I can't really follow your code, could you post a working program?
Anyway, this is a simple tokenizer, without testing edge cases:
Example input, delimiter = $$:
Tokens:
Note: I would never use a tokenizer I wrote without testing! Please use boost::tokenizer!
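The answer's code and sample output are not preserved here. As a minimal sketch of a tokenizer for a multi-character delimiter such as $$, assuming std::string::find is acceptable, it could look like this. The name SplitOnString and the keepDelimiter flag are hypothetical and were not part of the original answer.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): splits input on every
// occurrence of the string delimiter. When keepDelimiter is true, the
// delimiter is emitted as its own token between the pieces. Empty pieces
// are skipped.
std::vector<std::string> SplitOnString(const std::string& input,
                                       const std::string& delimiter,
                                       bool keepDelimiter)
{
    std::vector<std::string> tokens;
    if (delimiter.empty())
    {
        // An empty delimiter would match at every position; treat the
        // whole input as a single token instead.
        if (!input.empty())
            tokens.push_back(input);
        return tokens;
    }
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = input.find(delimiter, start)) != std::string::npos)
    {
        if (pos > start)
            tokens.push_back(input.substr(start, pos - start)); // text before the delimiter
        if (keepDelimiter)
            tokens.push_back(delimiter);
        start = pos + delimiter.size();
    }
    if (start < input.size())
        tokens.push_back(input.substr(start)); // text after the last delimiter
    return tokens;
}
```

For the input "a$$b$$c" with delimiter "$$", this sketch gives a, $$, b, $$, c when keepDelimiter is true and a, b, c when it is false.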
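If the delimiters are single characters and not strings, then you can use strtok. Note that strtok overwrites each delimiter it finds with a null character, so the delimiters themselves are not returned.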
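A minimal sketch of the strtok approach, for comparison. Because strtok modifies the buffer it scans, the sketch works on a copy of the input. The wrapper name TokenizeWithStrtok is hypothetical. This version drops the delimiters, so on its own it does not solve the "include the delimiters" part of the question.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper (not from the original answer) around std::strtok.
// Delimiters are discarded, and runs of delimiters produce no empty tokens.
std::vector<std::string> TokenizeWithStrtok(const std::string& str,
                                            const std::string& delimiters)
{
    std::vector<std::string> tokens;
    // strtok writes '\0' over delimiters, so scan a mutable, null-terminated copy.
    std::vector<char> buffer(str.begin(), str.end());
    buffer.push_back('\0');
    for (char* tok = std::strtok(buffer.data(), delimiters.c_str());
         tok != nullptr;
         tok = std::strtok(nullptr, delimiters.c_str()))
    {
        tokens.push_back(tok);
    }
    return tokens;
}
```

strtok also keeps internal state between calls, so it is not safe to interleave two tokenizations or to call it from several threads at once.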
It depends on whether you want the preceding delimiters, the following delimiters, or both, and what you want to do with strings at the beginning and end of the string that may not have delimiters before/after them.
I'm going to assume you want each word, with its preceding and following delimiters, but NOT any strings of delimiters by themselves (e.g. if there's a delimiter following the last string).
For the moment, I've written it more like an STL algorithm, taking an iterator for its output instead of assuming it's always pushing onto a collection. Since it depends (for the moment) on the input being a string, it doesn't use iterators for the input.