C++ ifstream function and field separators

Posted on 2024-10-02 23:59:43

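Until now I have only used field separators on data files in shell scripts. For this program I am trying to read a data file with the standard library's ifstream. The only problem is that the data I get looks like this:

A:KT5:14:executive desk:

This is for a hash table, and I need to separate the values in the line for the data structure as well as the transaction type. I have been looking around the web and have not found much on field separators, and what I did find was quite confusing.

So the question is: is there a way to set a field separator with ifstream, or is there another standard library I/O function I should be using?

Thanks.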


无法回应 2024-10-09 23:59:43

@Steve Townsend has already pointed out one possibility. If you prefer to use operator>> instead of std::getline, you can do that as well. Formatted extraction with operator>> always treats whitespace as the separator between items. Each stream has an associated locale, and each locale includes a ctype facet. That ctype facet is what the istream uses to determine which input characters are whitespace.

In your case, you apparently want the stream to treat only newlines and colons as "whitespace" (i.e., separators), while the actual space character is just treated as a "normal" character, not a separator.

To do that, you can create a ctype facet like this:

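#include <locale>
#include <vector>

// ctype facet that classifies only '\n' and ':' as whitespace.
struct field_reader: std::ctype<char> {

    field_reader(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        // Start with every character unclassified, so ' ' is an ordinary character.
        static std::vector<std::ctype_base::mask> 
            rc(table_size, std::ctype_base::mask());

        rc['\n'] = std::ctype_base::space;
        rc[':'] = std::ctype_base::space;
        return &rc[0];
    }
};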

To use this, you have to "imbue" the stream with a locale that contains this facet:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

int main() {
    std::stringstream input("A:KT5:14:executive desk:");

    // have the stream use our ctype facet:
    input.imbue(std::locale(std::locale(), new field_reader()));

    // copy fields from the stream to standard output, one per line:
    std::copy(std::istream_iterator<std::string>(input), 
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}
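
Because the facet only changes what operator>> counts as whitespace, the same stream can also extract typed fields directly instead of copying strings. Here is a minimal sketch of that idea. The `record` struct, its field names, and `parse_record` are hypothetical; the question does not say what the four fields mean, so treating the third one as an integer quantity is an assumption.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Same facet as above: only '\n' and ':' count as whitespace.
struct field_reader : std::ctype<char> {
    field_reader() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask>
            rc(table_size, std::ctype_base::mask());
        rc['\n'] = std::ctype_base::space;
        rc[':'] = std::ctype_base::space;
        return &rc[0];
    }
};

// Hypothetical record layout; the field names are guesses.
struct record {
    std::string type;
    std::string id;
    int quantity = 0;
    std::string description;
};

// Parses one colon-separated record; returns false if any field fails to parse.
bool parse_record(const std::string& line, record& out) {
    std::istringstream input(line);
    input.imbue(std::locale(std::locale(), new field_reader()));
    // The space in "executive desk" is not a separator here, so the
    // description is read as a single string.
    return static_cast<bool>(
        input >> out.type >> out.id >> out.quantity >> out.description);
}
```

Calling `parse_record("A:KT5:14:executive desk:", r)` should fill `r` with the four fields, with `r.quantity` already converted to an int.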

I'm the first to admit, however, that this has some shortcomings. First of all, locales and facets are generally pretty poorly documented, so most C++ programmers are likely to find this fairly difficult to understand (especially when all the real work happens "under the covers", so to speak).

Another possibility is to use Boost Tokenizer. In all honesty, this is a little more work to use: you have to do something like read a string first, then break it up as a separate step. At the same time, it's well documented, pretty widely known, and closer to how most people expect this kind of task to be done, so quite a few people will probably find it easier to follow despite the extra complexity.


风吹雨成花 2024-10-09 23:59:43

getline gives you the option to specify a delimiter. You can then read the input from a stream as a sequence of strings separated by _Delim:

template<class CharType, class Traits, class Allocator>
   basic_istream< CharType, Traits >& getline(
       basic_istream< CharType, Traits >& _Istr,
       basic_string< CharType, Traits, Allocator >& _Str,
       CharType _Delim
   );
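
As a rough illustration of that overload, the sketch below splits one record on ':'. The helper name `split_fields` is made up for this example, and it reads from a string rather than a file to keep it short.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one record into fields at each occurrence of delim.
std::vector<std::string> split_fields(const std::string& line, char delim = ':') {
    std::vector<std::string> fields;
    std::istringstream input(line);
    std::string field;
    // Each call reads up to (and discards) the next delimiter.
    while (std::getline(input, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

For the sample line `A:KT5:14:executive desk:` this gives four fields, and the space inside `executive desk` is kept because only ':' ends a field. An empty field between two adjacent colons comes back as an empty string.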

If this is uniformly structured data, it might be useful to define a struct to contain it and implement operator>> to load each instance from the stream, calling getline inside the operator.
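
One way that could look is sketched below. The `entry` struct and its field names are hypothetical, since the question does not name the fields, and treating the third field as an integer is an assumption.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record layout; the field names are guesses.
struct entry {
    std::string type;
    std::string id;
    int quantity = 0;
    std::string description;
};

// Reads one newline-terminated record of colon-separated fields.
std::istream& operator>>(std::istream& in, entry& e) {
    std::string line;
    if (!std::getline(in, line)) {
        return in;  // end of input or read error; failbit is already set
    }

    std::istringstream fields(line);
    entry tmp;
    std::string quantity_text;
    bool ok = std::getline(fields, tmp.type, ':') &&
              std::getline(fields, tmp.id, ':') &&
              std::getline(fields, quantity_text, ':') &&
              std::getline(fields, tmp.description, ':');

    if (ok) {
        // Convert the numeric field separately so a bad number is detected.
        std::istringstream quantity(quantity_text);
        ok = static_cast<bool>(quantity >> tmp.quantity);
    }

    if (ok) {
        e = tmp;                              // commit only a complete record
    } else {
        in.setstate(std::ios_base::failbit);  // report a malformed line
    }
    return in;
}
```

With this in place, a loop such as `while (file >> e) { ... }` reads one record per iteration and stops at the end of the input or at the first malformed line.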

If you have to process multiple lines (so that newline is the record separator and : the field separator), read each line in turn with getline (std::getline if you are reading into a std::string), load it into a stringstream, and then split the line into fields with the delimiter form of getline described above.
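
A minimal sketch of that two-level loop follows. The function name `read_records` is made up for this example, and skipping blank lines is a choice made here, not something the question asks for.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every line as one record and splits it into ':'-separated fields.
std::vector<std::vector<std::string>> read_records(std::istream& in) {
    std::vector<std::vector<std::string>> records;
    std::string line;
    while (std::getline(in, line)) {          // newline ends a record
        if (line.empty()) {
            continue;                         // skip blank lines
        }
        std::istringstream fields(line);
        std::vector<std::string> record;
        std::string field;
        while (std::getline(fields, field, ':')) {  // ':' ends a field
            record.push_back(field);
        }
        records.push_back(record);
    }
    return records;
}
```

Since the function takes any std::istream, it can be given a std::ifstream opened on the data file, for example `std::ifstream file("data.txt"); auto records = read_records(file);`, where `data.txt` is a placeholder name.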
