Tokenizing a string of data into a vector of structs?
I have the following string of data, received over a TCP Winsock connection, and would like to tokenize it into a vector of structs, where each struct represents one record.
std::string buf = "44:william:adama:commander:stuff\n33:luara:roslin:president:data\n";
struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;
};
Each record in the string is delimited by a newline ('\n'). My attempt so far splits up the records, but does not yet split up the fields:
#include <string>
#include <vector>

// records is taken by reference so the caller sees the tokens that are added.
void tokenize(const std::string& str, std::vector<std::string>& records)
{
    // Skip delimiters at the beginning.
    std::string::size_type lastPos = str.find_first_not_of("\n", 0);
    // Find the first delimiter after the start of the token.
    std::string::size_type pos = str.find_first_of("\n", lastPos);
    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        records.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters. Note the "not_of".
        lastPos = str.find_first_not_of("\n", pos);
        // Find the next delimiter.
        pos = str.find_first_of("\n", lastPos);
    }
}
It seems totally unnecessary to repeat all of that code again to further tokenize each record on the colon (the internal field separator) into the struct and push each struct into a vector. I'm sure there is a better way of doing this, or perhaps the design itself is wrong.
Thank you for any help.
Comments (2)
My solution:
Output:
Online demo: http://ideone.com/JwZuk
The technique I used here is described in another solution of mine to a different question:
Elegant ways to count the frequency of words in a file
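One standard-library way to get a solution of this general shape, offered only as a sketch and not necessarily the technique referred to above, is to give the struct a stream extraction operator and let std::istream_iterator fill the vector. The helper name parse_records is made up for illustration, and the sketch assumes every line is well formed (exactly four colons before the newline); a malformed line would make a field read run on into the next record.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;
};

// Read one record: four colon-terminated fields, then the rest of the line.
// Assumes well-formed input; no validation is attempted here.
std::istream& operator>>(std::istream& in, table_t& t)
{
    std::getline(in, t.key, ':');
    std::getline(in, t.first, ':');
    std::getline(in, t.last, ':');
    std::getline(in, t.rank, ':');
    std::getline(in, t.additional);  // reads up to, and discards, '\n'
    return in;
}

// Hypothetical helper: turn the whole buffer into a vector of records.
std::vector<table_t> parse_records(const std::string& buf)
{
    std::istringstream in(buf);
    return std::vector<table_t>(std::istream_iterator<table_t>(in),
                                std::istream_iterator<table_t>());
}
```

Calling parse_records(buf) on the string from the question should give two table_t values, one per line, with no separate pass needed for the colon-separated fields.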
For breaking the string up into records, I'd use istringstream, if only
because that will simplify the changes later when I want to read from
a file. For tokenizing, the most obvious solution is boost::regex, so:
(I've assumed the logical constructor for table_t. Also: there's a very
long tradition in C that names ending in _t are typedefs, so you're
probably better off finding some other convention.)
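A minimal sketch of that approach follows. It uses std::regex from the C++11 standard library as a stand-in for boost::regex, so that it compiles without Boost. The five-argument constructor is my guess at what "the logical constructor" means, the function name parse is made up, and silently skipping lines that do not match is an assumption too.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;

    // Assumed "logical" constructor: one argument per field, in order.
    table_t(std::string k, std::string f, std::string l,
            std::string r, std::string a)
        : key(std::move(k)), first(std::move(f)), last(std::move(l)),
          rank(std::move(r)), additional(std::move(a)) {}
};

// Hypothetical helper: split buf into lines with std::getline, then match
// each line against a pattern of five colon-separated fields.
std::vector<table_t> parse(const std::string& buf)
{
    static const std::regex record_re(
        "([^:]*):([^:]*):([^:]*):([^:]*):([^:]*)");
    std::vector<table_t> records;
    std::istringstream in(buf);
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        // Lines without exactly five fields are skipped (an assumption).
        if (std::regex_match(line, m, record_re)) {
            records.emplace_back(m[1].str(), m[2].str(), m[3].str(),
                                 m[4].str(), m[5].str());
        }
    }
    return records;
}
```

Because the input goes through a std::istream, switching from the received buffer to a file later should only mean replacing the std::istringstream with a std::ifstream.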