Boost regex tokenizer and the newline character

Posted 2024-11-03 01:29:31, 749 characters, 1 view, 0 comments

I'm currently trying to split up a text file into a vector of strings whenever a newline is encountered. Previously I have used boost tokenizer to do this with other delimiter characters but when I use the newline '\n' it throws an exception at runtime:

terminate called after throwing an instance of 'boost::escaped_list_error'
  what():  unknown escape sequence
Aborted

Here's the code:

#include <string>
#include <vector>
#include <boost/tokenizer.hpp>

std::vector<std::string> parse_lines(const std::string& input_str){
    using namespace boost;
    std::vector<std::string> parsed;
    tokenizer<escaped_list_separator<char> > tk(input_str, escaped_list_separator<char>('\n'));
    for (tokenizer<escaped_list_separator<char> >::iterator i(tk.begin());
                i != tk.end(); ++i)
    {
       parsed.push_back(*i);
    }
    return parsed;
}

Any advice greatly appreciated!

柳絮泡泡 2024-11-10 01:29:31

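The constructor of escaped_list_separator takes the escape character first, then the separator, then the quote character. Because you passed the newline as the escape character, the character that follows each newline in the input is treated as part of an escape sequence. Try this instead:

escaped_list_separator<char>('\\', '\n')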

http://www.boost.org/doc/libs/1_46_1/libs/tokenizer/escaped_list_separator.htm
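
To make the argument-order point concrete, here is a minimal stand-alone sketch. It is not Boost's code: `split_escaped_model` is a hypothetical helper that imitates the escaped-list rule as I understand it from the documentation linked above (escape first, then separator, then quote; an escape must be followed by `n`, the quote, the separator, or the escape itself). Under that assumption, passing `'\n'` alone makes the newline the escape character, so whatever starts the next line is read as an escape sequence.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the escaped-list rule; NOT Boost's implementation.
// The parameter order and defaults mirror escaped_list_separator:
// escape character, separator character, quote character.
std::vector<std::string> split_escaped_model(const std::string& input,
                                             char escape = '\\',
                                             char separator = ',',
                                             char quote = '"') {
    std::vector<std::string> fields;
    std::string current;
    bool in_quote = false;
    for (std::string::size_type i = 0; i < input.size(); ++i) {
        const char c = input[i];
        if (c == escape) {
            // The character after the escape must form a known sequence.
            if (++i == input.size())
                throw std::runtime_error("cannot end with escape");
            const char next = input[i];
            if (next == 'n')
                current += '\n';
            else if (next == quote || next == separator || next == escape)
                current += next;
            else
                throw std::runtime_error("unknown escape sequence");
        } else if (c == quote) {
            in_quote = !in_quote;        // quotes toggle state and are not copied
        } else if (c == separator && !in_quote) {
            fields.push_back(current);   // an unquoted separator ends the field
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);           // last field (may be empty)
    return fields;
}
```

Under this model, `split_escaped_model("alpha\nbeta", '\n')` throws with the message "unknown escape sequence", while `split_escaped_model("alpha\nbeta", '\\', '\n')` yields the two fields `alpha` and `beta`. Keep in mind that backslashes and double quotes in the input are still interpreted with this separator, which may not be what you want for arbitrary text.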

老街孤人 2024-11-10 01:29:31

Given that the separator you want is already supported directly by the standard library, I think I'd skip using regexes for this at all, and use what's already present in the standard library:

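#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> parse_lines(std::string const &input_string) {
    std::istringstream buffer(input_string);   // treat the string as a stream
    std::vector<std::string> ret;
    std::string line;

    // std::getline splits on '\n' by default and drops the delimiter
    while (std::getline(buffer, line))
        ret.push_back(line);
    return ret;
}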

Once you deal with the problem by treating the string as a stream and reading lines from it, you have quite a few options for how to proceed from there. As a couple of examples, you might want to use the line proxy and/or LineInputIterator classes that @UncleBens and I posted in response to a previous question.
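
The classes from that earlier question are not reproduced in this thread, so the following is only a sketch of what a line proxy along those lines might look like. The names `line` and `parse_lines_via_proxy` are placeholders, not the originals. The idea is that `operator>>` for the proxy reads a whole line, which lets `std::istream_iterator` walk a stream line by line.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical proxy type: extracting a `line` with >> consumes one full line
// instead of one whitespace-delimited word.
class line {
    std::string data_;
public:
    friend std::istream& operator>>(std::istream& is, line& l) {
        return std::getline(is, l.data_);
    }
    // Lets a `line` be used wherever a std::string is expected.
    operator std::string() const { return data_; }
};

std::vector<std::string> parse_lines_via_proxy(const std::string& input) {
    std::istringstream buffer(input);
    // The iterator pair reads `line` objects until extraction fails;
    // each one converts to std::string as it is stored.
    return std::vector<std::string>(std::istream_iterator<line>(buffer),
                                    std::istream_iterator<line>());
}
```

This behaves like the getline loop: blank lines come through as empty strings, and a single trailing newline does not produce an extra empty entry.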

夜无邪 2024-11-10 01:29:31

This might work better.

boost::char_separator<char> sep("\n");
boost::tokenizer<boost::char_separator<char>> tokens(text, sep);

Edit: Alternatively, you can use std::find and write your own splitter loop.
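
For the std::find route, a hand-rolled loop might look roughly like this. `split_lines_find` is a made-up name, and the decision to keep empty lines and to ignore a single trailing newline is an assumption, chosen to match what std::getline does.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on '\n' using std::find.
std::vector<std::string> split_lines_find(const std::string& text) {
    std::vector<std::string> lines;
    std::string::const_iterator start = text.begin();
    while (start != text.end()) {
        // Locate the end of the current line (or the end of the text).
        std::string::const_iterator stop = std::find(start, text.end(), '\n');
        lines.push_back(std::string(start, stop));
        if (stop == text.end())
            break;           // last line had no trailing newline
        start = stop + 1;    // skip past the '\n'
    }
    return lines;
}
```

Unlike boost::char_separator with its default settings, which drops empty tokens, this keeps blank lines as empty strings.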

About the author: 千纸鹤带着心事
