
grammar raku

Alternative version of grammar doesn't work the way I want

Posted 2025-01-16 01:07:43


This code parses $string as I'd like:

#! /usr/bin/env raku

my $string = q:to/END/;  # the first line below has trailing spaces which I want to keep
aaa bbb   

       kjkjsdf
kjkdsf
END

grammar Markdown {
    token TOP {  ^ ([ <blank> | <text> ])+ $ }
    token blank { [ \h* <.newline> ]  }
    token text { <indent> <content> }
    token indent { \h* }
    token newline { \n }
    token content { \N*? <trailing>* <.newline> } 
    token trailing { \h+ }
}

my $match = Markdown.parse($string);
$match.say;

OUTPUT

｢aaa bbb

       kjkjsdf
kjkdsf
｣
 0 => ｢aaa bbb
｣
  text => ｢aaa bbb
｣
   indent => ｢｣
   content => ｢aaa bbb
｣
    trailing => ｢   ｣
 0 => ｢
｣
  blank => ｢
｣
 0 => ｢       kjkjsdf
｣
  text => ｢       kjkjsdf
｣
   indent => ｢       ｣
   content => ｢kjkjsdf
｣
 0 => ｢kjkdsf
｣
  text => ｢kjkdsf
｣
   indent => ｢｣
   content => ｢kjkdsf
｣

Now, the only problem I'm having is that I'd like the <trailing> capture to be at the same level of the hierarchy as the <indent> and <content> captures.

So I tried this grammar:

grammar Markdown {
    token TOP {  ^ ([ <blank> | <text> ])+ $ }
    token blank { [ \h* <.newline> ]  }
    token text { <indent> <content> <trailing>* <.newline> }
    token indent { \h* }
    token newline { \n }
    token content { \N*?  } 
    token trailing { \h+ }
}

However, that breaks the parse. So I tried this:

grammar Markdown {
    token TOP {  ^ ([ <blank> | <text> ])+ $ }
    token blank { [ \h* <.newline> ]  }
    token text { <indent> <content>*? <trailing>* <.newline> }
    token indent { \h* }
    token newline { \n }
    token content { \N  } 
    token trailing { \h+ }
}

And got:

 0 => ｢aaa bbb
｣
  text => ｢aaa bbb
｣
   indent => ｢｣
   content => ｢a｣
   content => ｢a｣
   content => ｢a｣
   content => ｢ ｣
   content => ｢b｣
   content => ｢b｣
   content => ｢b｣
   trailing => ｢   ｣
 0 => ｢
｣
  blank => ｢
｣
 0 => ｢       kjkjsdf
｣
  text => ｢       kjkjsdf
｣
   indent => ｢       ｣
   content => ｢k｣
   content => ｢j｣
   content => ｢k｣
   content => ｢j｣
   content => ｢s｣
   content => ｢d｣
   content => ｢f｣
 0 => ｢kjkdsf
｣
  text => ｢kjkdsf
｣
   indent => ｢｣
   content => ｢k｣
   content => ｢j｣
   content => ｢k｣
   content => ｢d｣
   content => ｢s｣
   content => ｢f｣

This is pretty close to what I want, but it has the undesirable effect of breaking <content> up into individual letters. I could fix this pretty easily after the fact by massaging the $match object, but I'd like to use this to improve my skills with grammars.
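A likely explanation for why the second grammar breaks (an editorial sketch, not part of the original post): token implies :ratchet. Once token content { \N*? } has matched the empty string, token text cannot backtrack into it to make it longer, so <.newline> fails on the first letter. The block below first contrasts two hypothetical lexical rules, frugal_tok and frugal_rx, which differ only in token versus regex. It then sketches a hypothetical grammar, MarkdownBT, that declares only text and content with regex, so <trailing> sits beside <indent> and <content> while <content> stays a single capture.

```raku
# Part 1: the same frugal body declared two ways (hypothetical lexical rules).
# `token` implies :ratchet, so a caller cannot backtrack into it;
# `regex` keeps backtracking enabled.
my token frugal_tok { \N*? }
my regex frugal_rx  { \N*? }

# The token settles on the empty string, 'c' then fails against 'a', and
# nothing can make <frugal_tok> longer, so there is no match.
my $with-token = 'abc' ~~ / ^ <frugal_tok> c $ /;

# The regex also starts out empty, but when 'c' fails the engine backtracks
# into <frugal_rx> and lets it grow until 'c' can match.
my $with-regex = 'abc' ~~ / ^ <frugal_rx> c $ /;

say $with-token.defined ?? 'token: match' !! 'token: no match';
say $with-regex.defined ?? "regex: frugal_rx = {$with-regex<frugal_rx>}" !! 'regex: no match';

# Part 2: hypothetical variant of the question's grammar. Only `text` and
# `content` are regexes, so `text` may backtrack into `content`.
grammar MarkdownBT {
    # non-capturing group, so the <text> matches collect in $match<text>
    token TOP      { ^ [ <blank> | <text> ]+ $ }
    token blank    { \h* \n }
    regex text     { <indent> <content> <trailing>? \n }
    token indent   { \h* }
    regex content  { \N*? }    # grows one character per backtrack
    token trailing { \h+ }
}

# Same lines as the question's input; the three trailing spaces on the
# first line are spelled out inside the string literal.
my $match = MarkdownBT.parse("aaa bbb   \n\n       kjkjsdf\nkjkdsf\n");
say $match;
```

The trade-off is that backtracking is switched back on inside every text line, which the token-based approaches avoid.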


ゝ偶尔ゞ 2025-01-23 01:07:43

quick and dirty

my $string = q:to/END/;
aaa bbb  

       kjkjsdf
kjkdsf
END

grammar Markdown {
    token TOP {  ^ ([ <blank> | <text> ])+ $ }
    token blank { [ \h* <.newline> ]  }
    token text { <indent>? $<content>=[\N*?] <trailing>? <.newline> }
    token indent { \h+ }
    token newline { \n }
    token trailing { \h+ }
}

my $match = Markdown.parse($string);
$match.say;

lookahead assertions

my $string = q:to/END/;
aaa bbb  

       kjkjsdf
kjkdsf
END

grammar Markdown {
    token TOP {  ^ ([ <blank> | <text> ])+ $ }
    token blank { [ \h* <.newline> ]  }
    token text { <indent>? <content> <trailing>? <.newline> }
    token indent { \h+ }
    token newline { \n }
    token content { [<!before <trailing>> \N]+  }
    token trailing { \h+ $$ }
}

my $match = Markdown.parse($string);
$match.say;

a little refactoring

my $string = q:to/END/;
aaa bbb  

       kjkjsdf
kjkdsf
END

grammar Markdown {
    token TOP { ( <blank> | <text> )+ %% \n }
    token blank { ^^ \h* $$  }
    token text { <indent>? <content> <trailing>? }
    token indent { ^^ \h+ }
    token content { [<!before <trailing>> \N]+  }
    token trailing { \h+ $$ }
}

my $match = Markdown.parse($string);
$match.say;
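Whether these rules behave as intended depends on which end anchor they use. In Raku, $ matches only at the very end of the whole string, while $$ also matches just before each newline, so a per-line rule such as trailing needs $$ to see spaces at the end of an inner line. A minimal check on a hypothetical two-line string:

```raku
# A two-line string: the first line ends in two spaces, the second does not.
my $text = "ab  \ncd";

# $$ can match just before the "\n", so the trailing blanks of line 1 are found.
say so $text ~~ / \h+ $$ /;

# $ only matches at the very end of the string, after "cd", so this fails.
say so $text ~~ / \h+ $ /;
```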


宛菡 2025-01-23 01:07:43

I was able to accomplish what I want with a negative lookahead assertion:

grammar Markdown {
    token TOP {  ^ ([ <blank> | <text> ])+ $ }
    token blank { [ \h* <.newline> ]  }
    token text { <indent>? <content> <trailing>? <.newline> }
    token indent { \h+ }
    token newline { \n }
    token content {  <.non_trailing>  } 
    token non_trailing { ( . <!before \w \h* \n>)+ \S* }

    token trailing { \h+ }
}

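The leading dot in <.non_trailing> makes the call non-capturing, so the individual characters captured inside non_trailing do not show up in the match object. The ( . <!before \w \h* \n>)+ part matches one character at a time as long as what follows it is not a word character followed by optional horizontal whitespace and a newline. The \S* part then picks up the non-space characters the negative lookahead left over.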

OUTPUT

｢aaa bbb

       kjkjsdf
kjkdsf
｣
 0 => ｢aaa bbb
｣
  text => ｢aaa bbb
｣
   content => ｢aaa bbb｣
   trailing => ｢   ｣
 0 => ｢
｣
  blank => ｢
｣
 0 => ｢       kjkjsdf
｣
  text => ｢       kjkjsdf
｣
   indent => ｢       ｣
   content => ｢kjkjsdf｣
 0 => ｢kjkdsf
｣
  text => ｢kjkdsf
｣
   content => ｢kjkdsf｣
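To see what each part of non_trailing contributes, here is a small sketch that runs the same rule on the first input line. For this demo only, the rule is redeclared as a lexical my token so it can be called outside the grammar.

```raku
# The asker's rule, redeclared as a lexical token purely for this demo.
my token non_trailing { ( . <!before \w \h* \n>)+ \S* }

# First line of the input, with its three trailing spaces and newline.
my $m = "aaa bbb   \n" ~~ / ^ <non_trailing> /;

# Characters taken by the lookahead-guarded loop (one capture per character).
say $m<non_trailing>[0].map(~*).join;

# The full rule: the loop's characters plus whatever \S* added afterwards.
say ~$m<non_trailing>;
```

On this line the loop stops before the final two letters, so the \S* part is what completes "bbb".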
