Tokenize a string excluding delimiters inside quotes

Posted on 12-12 02:51

First let me say, I have gone thoroughly through all other solutions to this problem on SO, and although they are very similar, none fully solve my problem.

I need to extract all tokens, excluding the quotes (for the quoted ones), using Boost regex.

The regex I think I need to use is:

    sregex pattern = sregex::compile("\"(?P<token>[^\"]*)\"|(?P<token>\\S+)");

But I get an error of:

named mark already exists

The solution posted for C# in "Regular Expression to split on spaces unless in quotes" seems to work with a duplicate named mark, given that each use sits in a different branch of the same OR expression.

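寂寞陪衬 2024-12-19 02:51:37

I answered a very similar question here:

How to make my split work only on one real line and be capable of skipping quoted parts of a string?

The example code:

- uses Boost Spirit
- supports quoted strings, partially quoted fields, user-defined delimiters, and escaped quotes
- supports many (diverse) output containers generically
- supports models of the Range concept as input (including char[], for example)

Tested with a relatively wide range of compiler versions and Boost versions.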

https://gist.github.com/bcfbe2b5f071c7d153a0
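For readers who only want to see what those features amount to, here is a minimal hand-written sketch. It is not the Boost Spirit code from the gist, and it makes assumptions the answer does not state: the function name `split_quoted` is hypothetical, quotes are escaped with a backslash inside a quoted section, and empty fields between consecutive delimiters are skipped.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the gist): splits `input` on `delim`,
// keeping delimiters that appear inside quoted sections.
std::vector<std::string> split_quoted(const std::string& input,
                                      char delim = ' ',
                                      char quote = '"',
                                      char escape = '\\')
{
    std::vector<std::string> tokens;
    std::string current;
    bool in_quotes = false;  // currently inside a quoted section
    bool has_token = false;  // a token has started (it may be empty, e.g. "")

    for (std::string::size_type i = 0; i < input.size(); ++i)
    {
        const char c = input[i];
        if (in_quotes && c == escape && i + 1 < input.size())
        {
            current += input[++i];   // keep the escaped character literally
        }
        else if (c == quote)
        {
            in_quotes = !in_quotes;  // toggle state and drop the quote itself
            has_token = true;
        }
        else if (c == delim && !in_quotes)
        {
            if (has_token) tokens.push_back(current);  // skip empty fields
            current.clear();
            has_token = false;
        }
        else
        {
            current += c;            // ordinary character, or a delimiter inside quotes
            has_token = true;
        }
    }
    if (has_token) tokens.push_back(current);
    return tokens;
}
```

Called as `split_quoted("a,\"b,c\",d", ',')`, this yields the three tokens `a`, `b,c`, and `d`. Because a quote only toggles state, a partially quoted field such as `ab"c d"e` comes out as the single token `abc de`.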


梦中的蝴蝶 2024-12-19 02:51:37

Most regex flavors don't allow group names to be reused. Some flavors permit it if all the uses are within the same alternation, but apparently yours isn't one of them. However, if you're running a recent enough version of Boost, you should be able to use a branch-reset group. It looks like this: (?|...|...|...). Within each alternative, the group numbering resets to wherever it was before the branch-reset group was reached. It should work with named groups too, but that's not guaranteed. I'm not in a position to test it myself, so try this:

    "(?|\"(?P<token>[^\"]*)\"|(?P<token>\\S+))"

...and if that doesn't work, try it with plain old numbered groups.
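The numbered-group fallback could look roughly like the sketch below. To keep it self-contained it uses `std::regex` from the C++11 standard library rather than Boost, which is a substitution on my part, and the function name `tokenize` is hypothetical. Group 1 holds the text inside the quotes and group 2 holds an unquoted word, so no group name is ever repeated.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits on whitespace, treating a double-quoted run
// as one token and dropping the surrounding quotes.
std::vector<std::string> tokenize(const std::string& input)
{
    // Group 1: contents of a quoted token. Group 2: an unquoted word.
    static const std::regex pattern("\"([^\"]*)\"|(\\S+)");

    std::vector<std::string> tokens;
    for (std::sregex_iterator cur(input.begin(), input.end(), pattern), end;
         cur != end; ++cur)
    {
        const std::smatch& what = *cur;
        // Exactly one of the two groups participates in each match.
        tokens.push_back(what[1].matched ? what[1].str() : what[2].str());
    }
    return tokens;
}
```

Checking `matched` rather than the length means an empty quoted token (`""`) is still attributed to group 1. The same two-group pattern should carry over to a Boost regex type, though that is untested here.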


瑾兮 2024-12-19 02:51:37

While looking through the answers here I tested another method, which involves using different group mark names and simply testing which one is blank when iterating through them. While it is probably not the fastest code, it is the most readable solution so far, which is more important for my problem.

Here is the code that worked for me:

    #include <string>
    #include <vector>
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost::xpressive;
...
    std::vector<std::string> tokens;
    std::string input = "here is a \"test string\"";
    // Distinct group names avoid the "named mark already exists" error.
    sregex pattern = sregex::compile("\"(?P<quoted>[^\"]*)\"|(?P<unquoted>\\S+)");
    sregex_iterator cur( input.begin(), input.end(), pattern );
    sregex_iterator end;

    while(cur != end)
    {
      smatch const &what = *cur;
      // Only one alternative takes part in each match, so the other group is empty.
      if(what["quoted"].length() > 0)
      {
        tokens.push_back(what["quoted"]);
      }
      else
      {
        tokens.push_back(what["unquoted"]);
      }
      ++cur;
    }
