Tokenize a string of data into a vector of structs?

Posted on 2024-10-27 06:10:59

So I have the following string of data, which is received over a TCP Winsock connection, and I would like to tokenize it into a vector of structs, where each struct represents one record.

std::string buf = "44:william:adama:commander:stuff\n33:luara:roslin:president:data\n";

struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;
};

Each record in the string is terminated by a newline ('\n'). Here is my attempt at splitting up the records, though not yet the fields:

void tokenize(const std::string& str, std::vector<std::string>& records)
{
    // records is taken by reference so the caller sees the results.
    // Skip delimiters at the beginning.
    std::string::size_type lastPos = str.find_first_not_of("\n", 0);
    // Find the end of the first token (the next delimiter).
    std::string::size_type pos     = str.find_first_of("\n", lastPos);
    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        // Found a token; add it to the vector.
        records.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of".
        lastPos = str.find_first_not_of("\n", pos);
        // Find the end of the next token.
        pos = str.find_first_of("\n", lastPos);
    }
}

It seems wasteful to repeat all of that code again to further tokenize each record on the colon (the internal field separator) into the struct and push each struct into a vector. I'm sure there is a better way to do this, or perhaps the design itself is wrong.
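A minimal sketch of one way to avoid the duplication, assuming fields never contain ':' and records never contain '\n': write a single split function parameterized on the delimiter and call it at both levels. It uses std::getline with a delimiter argument rather than the find_first_of loop above. The helper names split and parse_records are hypothetical, and silently skipping records that do not have exactly five fields is an illustrative choice, not a requirement.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on `delim`. Empty fields between two
// adjacent delimiters are kept; a trailing empty field after the last
// delimiter is not returned (std::getline stops at end of input).
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> fields;
    std::istringstream in(text);
    std::string field;
    while (std::getline(in, field, delim))
        fields.push_back(field);
    return fields;
}

struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;
};

// Hypothetical two-level tokenizer: records on '\n', then fields on ':'.
std::vector<table_t> parse_records(const std::string& buf)
{
    std::vector<table_t> result;
    for (const std::string& record : split(buf, '\n'))
    {
        if (record.empty())
            continue;                      // skip blank lines
        std::vector<std::string> f = split(record, ':');
        if (f.size() != 5)
            continue;                      // malformed record: skipped in this sketch
        result.push_back(table_t{f[0], f[1], f[2], f[3], f[4]});
    }
    return result;
}
```

With the sample buffer from the question, parse_records(buf) would return two table_t entries, keyed "44" and "33".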

Thank you for any help.

Comments (2)

旧人哭 2024-11-03 06:10:59

My solution:

#include <cstring>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// A ctype facet that classifies ':' as whitespace, in addition to the
// classic whitespace characters such as ' ' and '\n'.
struct colon_separated_only: std::ctype<char> 
{
    colon_separated_only(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        typedef std::ctype<char> cctype;
        static const cctype::mask *const_rc= cctype::classic_table();

        // Start from a copy of the classic table, then mark ':' as a separator.
        static cctype::mask rc[cctype::table_size];
        std::memcpy(rc, const_rc, cctype::table_size * sizeof(cctype::mask));

        rc[':'] = std::ctype_base::space; 
        return &rc[0];
    }
};

struct table_t
{
    std::string key;
    std::string first;
    std::string last;
    std::string rank;
    std::string additional;
};

int main() {
    std::string buf = "44:william:adama:commander:stuff\n33:luara:roslin:president:data\n";
    std::stringstream s(buf);
    // The locale takes ownership of the facet and deletes it when done.
    s.imbue(std::locale(std::locale(), new colon_separated_only()));
    table_t t;
    std::vector<table_t> data;
    // operator>> now stops at ':' as well as at '\n'.
    while ( s >> t.key >> t.first >> t.last >> t.rank >> t.additional )
    {
        data.push_back(t);
    }
    for (std::size_t i = 0 ; i < data.size() ; ++i )
    {
        std::cout << data[i].key << " ";
        std::cout << data[i].first << " " << data[i].last << " ";
        std::cout << data[i].rank << " " << data[i].additional << std::endl;
    }
    return 0;
}

Output:

44 william adama commander stuff
33 luara roslin president data

Online demo: http://ideone.com/JwZuk


I described the technique used here in another answer, to a different question:

Elegant ways to count the frequency of words in a file
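A possible refinement of this approach (a sketch, not part of the original answer): give table_t its own operator>>, so the whole stream can be read with std::istream_iterator instead of a hand-written loop. The function name read_all is hypothetical. Because the facet keeps the classic whitespace characters as separators, a field containing a space (for example a rank of "vice president") would be split into two fields, so it is worth checking whether your data can contain spaces before relying on this technique.

```cpp
#include <cstring>
#include <istream>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Same idea as the facet above: ':' is classified as whitespace, in
// addition to the classic whitespace characters (' ', '\t', '\n', ...).
struct colon_separated_only : std::ctype<char>
{
    colon_separated_only() : std::ctype<char>(get_table()) {}

    static const std::ctype_base::mask* get_table()
    {
        static std::ctype_base::mask rc[std::ctype<char>::table_size];
        std::memcpy(rc, std::ctype<char>::classic_table(),
                    std::ctype<char>::table_size * sizeof(std::ctype_base::mask));
        rc[static_cast<unsigned char>(':')] = std::ctype_base::space;
        return rc;
    }
};

struct table_t
{
    std::string key, first, last, rank, additional;
};

// Reads the five separator-delimited fields of one record.
std::istream& operator>>(std::istream& in, table_t& t)
{
    return in >> t.key >> t.first >> t.last >> t.rank >> t.additional;
}

// Hypothetical helper: reads every complete record from `input`.
// The caller is expected to have imbued the colon_separated_only facet.
std::vector<table_t> read_all(std::istream& input)
{
    return std::vector<table_t>(std::istream_iterator<table_t>(input),
                                std::istream_iterator<table_t>());
}
```

A record that fails partway through (for example one with only four fields at the end of the buffer) is not appended, because std::istream_iterator becomes the end iterator as soon as an extraction fails.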

爱给你人给你 2024-11-03 06:10:59

For breaking the string up into records, I'd use istringstream, if only because that will simplify the changes later when I want to read from a file. For tokenizing, the most obvious solution is boost::regex, so:

#include <boost/regex.hpp>
#include <istream>
#include <string>
#include <vector>

std::vector<table_t> parse( std::istream& input )
{
    std::vector<table_t> retval;
    std::string line;
    while ( std::getline( input, line ) ) {
        // Boost's default (Perl) syntax: plain parentheses are capture groups.
        static boost::regex const pattern(
            "([^:]*):([^:]*):([^:]*):([^:]*):([^:]*)" );
        boost::smatch matched;
        if ( !boost::regex_match( line, matched, pattern ) ) {
            //  Error handling...
        } else {
            retval.push_back(
                table_t( matched[1].str(), matched[2].str(), matched[3].str(),
                         matched[4].str(), matched[5].str() ) );
        }
    }
    return retval;
}
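If Boost is not available, C++11's <regex> header provides std::regex, std::smatch and std::regex_match with the same roles. A minimal sketch of the same parse function under that swap, assuming a C++11 compiler; here table_t is written as an aggregate so no extra constructor is needed, and malformed lines are simply skipped in place of real error handling.

```cpp
#include <istream>
#include <regex>
#include <sstream>   // std::istringstream, for feeding a string buffer to parse()
#include <string>
#include <vector>

struct table_t
{
    std::string key, first, last, rank, additional;
};

// Same approach as the Boost version, using std::regex (ECMAScript syntax).
std::vector<table_t> parse(std::istream& input)
{
    static const std::regex pattern(
        "([^:]*):([^:]*):([^:]*):([^:]*):([^:]*)");
    std::vector<table_t> retval;
    std::string line;
    while (std::getline(input, line))
    {
        std::smatch matched;
        if (!std::regex_match(line, matched, pattern))
            continue;   // error handling omitted: malformed lines are skipped
        retval.push_back(table_t{matched[1].str(), matched[2].str(),
                                 matched[3].str(), matched[4].str(),
                                 matched[5].str()});
    }
    return retval;
}
```

Because parse takes a std::istream&, it can be fed either a std::istringstream wrapping the received buffer or a std::ifstream opened on a file, which is the flexibility the istringstream suggestion is aiming for.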

(I've assumed a constructor for table_t that takes the five fields in order. Also: there's a very long tradition in C that names ending in _t are typedefs, so you're probably better off finding some other convention.)
