ANTLR3 C Target - parser return "misses" out root element

Posted on 2024-10-26 03:40:12

I'm trying to use the ANTLR3 C Target to make sense of an AST, but am running into some difficulties.

I have a simple SQL-like grammar file:

grammar sql;
options 
{
    language = C;
    output=AST;
    ASTLabelType=pANTLR3_BASE_TREE; 
}
sql :   VERB fields;
fields  :   FIELD (',' FIELD)*;
VERB    :   'SELECT' | 'UPDATE' | 'INSERT';
FIELD   :   CHAR+;
fragment
CHAR    :   'a'..'z';

and this works as expected within ANTLRWorks.

In my C code I have:

const char pInput[] = "SELECT one,two,three";
// Pass the length without the terminating NUL so the lexer never sees '\0'.
pANTLR3_INPUT_STREAM pNewStrm = antlr3NewAsciiStringInPlaceStream(
    (pANTLR3_UINT8) pInput, sizeof(pInput) - 1, NULL);
psqlLexer lex = sqlLexerNew(pNewStrm);
pANTLR3_COMMON_TOKEN_STREAM tstream = antlr3CommonTokenStreamSourceNew(
    ANTLR3_SIZE_HINT, TOKENSOURCE(lex));
psqlParser ps = sqlParserNew(tstream);
sqlParser_sql_return ret = ps->sql(ps);
pANTLR3_BASE_TREE pTree = ret.tree;
std::cout << "Tree: " << pTree->toStringTree(pTree)->chars << std::endl;
ParseSubTree(0, pTree);

This outputs a flat tree structure when you use ->getChildCount and ->children->get to recurse through the tree.

// Prints every descendant of pTree, indented by depth; pTree itself is not printed.
void ParseSubTree(int level, pANTLR3_BASE_TREE pTree)
{
    ANTLR3_UINT32 childcount = pTree->getChildCount(pTree);

    for (ANTLR3_UINT32 i = 0; i < childcount; i++)
    {
        pANTLR3_BASE_TREE pChild = (pANTLR3_BASE_TREE) pTree->children->get(pTree->children, i);
        for (int j = 0; j < level; j++)
        {
            std::cout << " - ";
        }
        std::cout << pChild->getText(pChild)->chars << std::endl;
        if (pChild->getChildCount(pChild) > 0)
        {
            ParseSubTree(level + 1, pChild);
        }
    }
}

Program output:
Tree: SELECT one , two , three
SELECT
one
,
two
,
three

Now, if I alter the grammar file:

sql :   VERB^ fields;

then the call to ParseSubTree displays only the child nodes of fields.

Program output:
Tree: (SELECT one , two , three)
one
,
two
,
three

My question is: why, in the second case, does ANTLR give only the child nodes (in effect leaving out the SELECT token)?
I'd be very grateful if anybody could give me any pointers for making sense of the tree returned by ANTLR.

Useful Information:
ANTLRWorks 1.4.2,
ANTLR C Target 3.3,
MSVC 10

Comments (2)

好听的两个字的网名 2024-11-02 03:40:15

Placing output=AST; in the options section does not by itself produce a structured AST. It only causes ANTLR to create CommonTree nodes instead of CommonToken objects (or, in your case, the equivalent C structs).

If you use output=AST;, the next step is to add tree operators or rewrite rules to your parser rules so that they give shape to your AST.

See this previous Q&A to find out how to create a proper AST.

For example, the following grammar (with rewrite rules):

options {
  output=AST;
  // ...
}

sql                        // make VERB the root
  :  VERB fields        -> ^(VERB fields) 
  ;

fields                     // omit the commas from the AST
  :  FIELD (',' FIELD)* -> FIELD+
  ;

VERB  : 'SELECT' | 'UPDATE' | 'INSERT';
FIELD : CHAR+;
SPACE : ' ' {$channel=HIDDEN;};
fragment CHAR : 'a'..'z';

will parse the following input:

UPDATE         field,     foo  ,  bar

into the following AST:

(image not preserved; with the grammar above, the AST has UPDATE as its root and field, foo and bar as its children)
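
To make the two shapes concrete without the ANTLR runtime, here is a minimal C++ sketch. The `Node` struct and the `leaf`, `toStringTree`, `flatList` and `rootedTree` functions are hypothetical stand-ins, not part of the ANTLR3 C API. The sketch assumes the runtime prints a nil-rooted list without parentheses and a rooted tree as `(root child ...)`, which is consistent with the two "Tree:" lines shown in the question.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for pANTLR3_BASE_TREE; not the ANTLR runtime type.
struct Node {
    std::string text;            // token text; empty for a nil node in this sketch
    std::vector<Node> children;
    bool isNil() const { return text.empty(); }
};

// Builds a childless node; a leaf must carry token text.
Node leaf(std::string t) {
    assert(!t.empty());
    return Node{std::move(t), {}};
}

// Assumed printing convention: a leaf prints its text, a rooted tree prints
// "(root child ...)", and a nil root prints only its children.
std::string toStringTree(const Node& n) {
    if (n.children.empty()) return n.text;
    std::string out;
    if (!n.isNil()) out += "(" + n.text + " ";
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i > 0) out += " ";
        out += toStringTree(n.children[i]);
    }
    if (!n.isNil()) out += ")";
    return out;
}

// Shape with output=AST alone: a nil root holding every token as a sibling.
Node flatList() {
    return Node{"", {leaf("UPDATE"), leaf("field"), leaf(","),
                     leaf("foo"), leaf(","), leaf("bar")}};
}

// Shape after "-> ^(VERB fields)" and "-> FIELD+": VERB is the root, commas are gone.
Node rootedTree() {
    return Node{"UPDATE", {leaf("field"), leaf("foo"), leaf("bar")}};
}
```

With these definitions, `toStringTree(flatList())` gives `UPDATE field , foo , bar`, while `toStringTree(rootedTree())` gives `(UPDATE field foo bar)`. The parentheses are the visible sign that a token has become the root.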

赤濁 2024-11-02 03:40:15

I think it is important to realize that the tree you see in ANTLRWorks is not the AST. The ".tree" in your code is the AST, but it may look different from what you expect. To give the AST its structure, you need to specify the root nodes with the ^ symbol in strategic places, either inline or in rewrite rules.

You can read more here
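
One consequence for the traversal in the question: `ParseSubTree` prints only the children of the node it is given. Once `VERB` is made the root, `SELECT` is the node passed in, so it is never printed even though it is in the tree. A minimal sketch of a walker that prints the node itself before descending, and skips a nil root, might look like the following. `Node`, `dumpTree`, `dumpToString` and `selectTree` are hypothetical names standing in for `pANTLR3_BASE_TREE` and a rewritten `ParseSubTree`; no real runtime calls are used.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for pANTLR3_BASE_TREE; not the ANTLR runtime type.
struct Node {
    std::string text;            // empty text marks a nil root in this sketch
    std::vector<Node> children;
};

// Writes `node` itself, then its children one level deeper.
// A nil root produces no line, and its children stay at the same level.
void dumpTree(const Node& node, int level, std::ostream& out) {
    assert(level >= 0);
    int childLevel = level;
    if (!node.text.empty()) {
        for (int i = 0; i < level; ++i) out << " - ";
        out << node.text << '\n';
        childLevel = level + 1;
    }
    for (const Node& child : node.children) dumpTree(child, childLevel, out);
}

// Convenience wrapper that returns the dump as a string.
std::string dumpToString(const Node& root) {
    std::ostringstream out;
    dumpTree(root, 0, out);
    return out.str();
}

// The tree from the question once VERB has been made the root.
Node selectTree() {
    return Node{"SELECT", {Node{"one", {}}, Node{",", {}}, Node{"two", {}},
                           Node{",", {}}, Node{"three", {}}}};
}
```

Here `dumpToString(selectTree())` starts with a `SELECT` line, followed by the five children each prefixed with ` - `. In real code the same idea would be to print `pTree->getText(pTree)` before looping over the children, taking care with a nil root.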
