How to implement C++0x raw string literals?

Posted 2024-09-07 07:17:56

How can I define a working lexer and parser (e.g. with flex and bison) that support C++0x-style raw string literals?

As you may already know, new string literals in C++0x can be expressed in a very flexible way.

R"<delim>...<delim>"; - in this code <delim> can be pretty much anything, and no escape characters are needed.

Any kind of parentheses can be used to mark the end of the string:

R"(I love those who yearn for the impossible. (Von Goethe, "Faust"))";

Blocks of text can be delimited simply by repeating the same character the same number of times on both sides:

R";***************************(
  ; TINY BASIC FOR INTEL 8080  
  ;       VERSION 2.0  
  ;     BY LI-CHEN WANG  
  ; MODIFIED AND TRANSLATED  
  ;    TO INTEL MNEMONICS  
  ;     BY ROGER RAUSKOLB  
  ;     10 OCTOBER, 1976  
  ;       @COPYLEFT  
  ;  ALL WRONGS RESERVED      )
  ;***************************";

More information can be found here (Wikipedia) and here (att).

I would like to use this fantastic feature in a language I am developing now.

So, how can I define a proper tokenizer and syntax analyzer to achieve this?

Thanks in advance for your answers!

感性不性感 2024-09-14 07:17:56

You could preprocess literals during the lexical analysis stage and transform them into something like a meta token.

Input:  
    int a;  
    char *b = R"....";  

Preprocessed:  
    int a;
    char *b = R*literal[0]*;

Tokenized:  
    INT symbol[0] DELIM  
    CHAR OP_ASTR symbol[1] OP_EQ symbol[2] *literal[0]* DELIM  

Symbol table contents { "a", "b", "R" }  

Literal table contents { "...." }

literal[0] is a pointer to the original literal text.
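
The preprocessing step described above could be sketched roughly as follows. This is only an illustration, not a complete lexer: it assumes the literal form that was eventually standardized in C++11, R"delim( ... )delim" with a delimiter of at most 16 characters, and the names RawLiteral, scan_raw_string, Preprocessed and extract_raw_literals are made up for this example. Because the closing sequence depends on the delimiter found at the opening, a plain regular expression cannot match the whole literal, so the scan is done by hand (in flex, similar logic would probably live inside a rule's action).

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// One scanned raw string literal (hypothetical helper type).
struct RawLiteral {
    std::string body;    // text between "delim(" and ")delim", taken verbatim
    std::size_t length;  // source characters consumed, from R to the closing quote
};

// Tries to scan R"delim( ... )delim" starting at src[pos].
// Returns std::nullopt if no well-formed literal starts there.
std::optional<RawLiteral> scan_raw_string(std::string_view src, std::size_t pos) {
    if (pos > src.size() || src.substr(pos, 2) != "R\"") return std::nullopt;

    // Collect the delimiter: everything up to the first '('.
    std::size_t open = pos + 2;
    while (open < src.size() && src[open] != '(') {
        char c = src[open];
        // Characters that may not appear in a delimiter.
        if (c == '"' || c == ')' || c == '\\' || c == ' ' ||
            c == '\t' || c == '\n' || c == '\r') {
            return std::nullopt;
        }
        ++open;
    }
    if (open >= src.size()) return std::nullopt;  // no '(' found
    std::string_view delim = src.substr(pos + 2, open - (pos + 2));
    if (delim.size() > 16) return std::nullopt;   // C++11 limit on delimiter length

    // The literal ends at the first occurrence of )delim"
    std::string closing = ")";
    closing.append(delim.data(), delim.size());
    closing += '"';
    std::size_t close = src.find(closing, open + 1);
    if (close == std::string_view::npos) return std::nullopt;  // unterminated

    RawLiteral lit;
    lit.body = std::string(src.substr(open + 1, close - (open + 1)));
    lit.length = close + closing.size() - pos;
    return lit;
}

// Source text with literals replaced by placeholders, plus the literal table.
struct Preprocessed {
    std::string text;
    std::vector<std::string> literals;
};

// Replaces every raw string literal with R*literal[n]*, mirroring the
// placeholder notation used in the answer, and stores the body in the table.
// Limitation of this sketch: it does not skip comments or ordinary string
// literals, and it does not check that R starts a new token.
Preprocessed extract_raw_literals(std::string_view src) {
    Preprocessed out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (auto lit = scan_raw_string(src, i)) {
            out.text += "R*literal[" + std::to_string(out.literals.size()) + "]*";
            out.literals.push_back(std::move(lit->body));
            i += lit->length;
        } else {
            out.text += src[i++];
        }
    }
    return out;
}
```

With this sketch, calling extract_raw_literals on a line such as char *b = R"x(...)x"; leaves the placeholder in the returned text and the untouched literal body in literals[0], so the later tokenizer and the bison grammar only ever see a single literal token.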
