ANTLR3 C Target - parser return 'misses out' root element
I'm trying to use the ANTLR3 C Target to make sense of an AST, but am running into some difficulties.
I have a simple SQL-like grammar file:
    grammar sql;

    options
    {
        language = C;
        output = AST;
        ASTLabelType = pANTLR3_BASE_TREE;
    }

    sql    : VERB fields;
    fields : FIELD (',' FIELD)*;

    VERB  : 'SELECT' | 'UPDATE' | 'INSERT';
    FIELD : CHAR+;
    fragment
    CHAR  : 'a'..'z';
and this works as expected within ANTLRWorks.
In my C code I have:
    const char pInput[] = "SELECT one,two,three";
    // sizeof(pInput) counts the terminating '\0'; subtract 1 so the lexer never sees it
    pANTLR3_INPUT_STREAM pNewStrm = antlr3NewAsciiStringInPlaceStream((pANTLR3_UINT8) pInput, sizeof(pInput) - 1, NULL);
    psqlLexer lex = sqlLexerNew(pNewStrm);
    pANTLR3_COMMON_TOKEN_STREAM tstream = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lex));
    psqlParser ps = sqlParserNew(tstream);
    sqlParser_sql_return ret = ps->sql(ps);
    pANTLR3_BASE_TREE pTree = ret.tree;
    std::cout << "Tree: " << pTree->toStringTree(pTree)->chars << std::endl;
    ParseSubTree(0, pTree);
This outputs a flat tree structure when you use ->getChildCount and ->children->get to recurse through the tree.
    void ParseSubTree(int level, pANTLR3_BASE_TREE pTree)
    {
        ANTLR3_UINT32 childcount = pTree->getChildCount(pTree);
        for (ANTLR3_UINT32 i = 0; i < childcount; i++)
        {
            pANTLR3_BASE_TREE pChild = (pANTLR3_BASE_TREE) pTree->children->get(pTree->children, i);
            // indent one " - " per nesting level
            for (int j = 0; j < level; j++)
            {
                std::cout << " - ";
            }
            std::cout << pChild->getText(pChild)->chars << std::endl;
            if (pChild->getChildCount(pChild) > 0)
            {
                ParseSubTree(level + 1, pChild);
            }
        }
    }
Program output:
Tree: SELECT one , two , three
SELECT
one
,
two
,
three
Now, if I alter the grammar file:
    sql : VERB^ fields;
then the call to ParseSubTree only displays the child nodes of fields.
Program output:
Tree: (SELECT one , two , three)
one
,
two
,
three
My question is: why, in the second case, does ANTLR give only the child nodes (in effect leaving out the SELECT token)?
I'd be very grateful if anybody could give me any pointers for making sense of the tree returned by ANTLR.
Useful information:
ANTLRWorks 1.4.2,
ANTLR C Target 3.3,
MSVC 10
Comments (2)
Placing output=AST; in the options section will not, by itself, produce a structured AST. It only causes ANTLR to create CommonTree nodes instead of CommonToken objects (or, in your case, the equivalent C structs). If you use output=AST;, the next step is to put tree operators or rewrite rules inside your parser rules to give shape to your AST. See this previous Q&A to find out how to create a proper AST.
For example, the following grammar (with rewrite rules): [grammar missing] will parse the following input: [input missing] into the following AST: [tree image missing]
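To make the effect of a tree operator concrete, here is a rough model in plain C++ of the two tree shapes from the question. It does not use the ANTLR3 C runtime: the Node struct and the buildFlat, buildRooted, and toStringTree helpers are hypothetical stand-ins written for illustration only. Without any operator, every token hangs off a nil root; with VERB^, the VERB token becomes the root and the remaining tokens become its children.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for pANTLR3_BASE_TREE; NOT the ANTLR3 C runtime type.
struct Node {
    std::string text;            // empty text marks a "nil" list root
    std::vector<Node> children;
};

// Shape modeled for `sql : VERB fields;` (no tree operators):
// a nil root whose children are all tokens, in input order.
Node buildFlat(const std::vector<std::string>& tokens) {
    Node nil;  // nil root: no text of its own
    for (const std::string& t : tokens) nil.children.push_back(Node{t, {}});
    return nil;
}

// Shape modeled for `sql : VERB^ fields;`:
// the first token becomes the root, the remaining tokens its children.
Node buildRooted(const std::vector<std::string>& tokens) {
    assert(!tokens.empty());
    Node root{tokens[0], {}};
    for (std::size_t i = 1; i < tokens.size(); ++i)
        root.children.push_back(Node{tokens[i], {}});
    return root;
}

// Imitates the LISP-style text seen in the question's "Tree:" lines:
// a nil root prints only its children, a real root prints "(root child ...)".
std::string toStringTree(const Node& n) {
    if (n.children.empty()) return n.text;
    const bool nil = n.text.empty();
    std::string out;
    if (!nil) out += "(" + n.text;
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (!nil || i > 0) out += " ";
        out += toStringTree(n.children[i]);
    }
    if (!nil) out += ")";
    return out;
}
```

Feeding the tokens SELECT, one, ",", two, ",", three to buildFlat and buildRooted gives the same two strings the question reports: the flat list without parentheses, and the parenthesized form in which SELECT is the root.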
I think it is important to realize that the tree you see in ANTLRWorks is the parse tree, not the AST. The ".tree" in your code is the AST, but it may look different from what you expect. To give the AST its shape, you need to mark root nodes with the ^ operator in strategic places, or use rewrite rules.
You can read more here
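This also explains the missing SELECT in the question: ParseSubTree prints only the children of the node it is given, never the node itself. Once VERB^ makes SELECT the root, ret.tree is the SELECT node, so it is never printed. The sketch below shows one way to walk a tree so that the root is included. It uses a hypothetical Node struct and walk function rather than the ANTLR3 C runtime; in the real code the equivalent change would be to print pTree->getText(pTree) before the loop and to skip that step for a nil root (the runtime appears to expose an isNilNode function on the tree for this, but check antlr3basetree.h for your version).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for pANTLR3_BASE_TREE; NOT the ANTLR3 C runtime type.
struct Node {
    std::string text;            // empty text marks a "nil" list root
    std::vector<Node> children;
};

// Appends one line per node to `lines`: the node's own text first,
// then its children one level deeper. A nil root contributes no line,
// and its children stay at the same level.
void walk(const Node& n, int level, std::vector<std::string>& lines) {
    assert(level >= 0);
    int childLevel = level;
    if (!n.text.empty()) {
        std::string line;
        for (int j = 0; j < level; ++j) line += " - ";  // same prefix as ParseSubTree
        lines.push_back(line + n.text);
        childLevel = level + 1;
    }
    for (const Node& child : n.children) walk(child, childLevel, lines);
}
```

Called on a tree whose root is SELECT, walk emits SELECT on the first line and the fields beneath it with a " - " prefix; called on a nil-rooted flat list, it emits every token at level 0.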