Parsing SQL statements with PLY
I know there are other tools out there to parse SQL statements, but I am rolling my own for educational purposes. I am getting stuck with my grammar right now. If you can spot an error real quick, please let me know.
SELECT = r'SELECT'
FROM = r'FROM'
COLUMN = TABLE = r'[a-zA-Z]+'
COMMA = r','
STAR = r'\*'
END = r';'
t_ignore = ' ' #ignores spaces
statement   : SELECT columns FROM TABLE END
columns     : STAR
            | rec_columns
rec_columns : COLUMN
            | rec_columns COMMA COLUMN
When I try to parse a statement like 'SELECT a FROM b;' I get a syntax error at the FROM token... Any help is greatly appreciated!
(Edit) Code:
#!/usr/bin/python
import ply.lex as lex
import ply.yacc as yacc

# Token names shared by the lexer and the parser
tokens = (
    'SELECT',
    'FROM',
    'WHERE',
    'TABLE',
    'COLUMN',
    'STAR',
    'COMMA',
    'END',
)

# Lexer rules
t_SELECT = r'select|SELECT'
t_FROM = r'from|FROM'
t_WHERE = r'where|WHERE'
t_TABLE = r'[a-zA-Z]+'
t_COLUMN = r'[a-zA-Z]+'
t_STAR = r'\*'
t_COMMA = r','
t_END = r';'
t_ignore = ' \t'

def t_error(t):
    print 'Illegal character "%s"' % t.value[0]
    t.lexer.skip(1)

lex.lex()

# Which kind of statement was parsed last
NONE, SELECT, INSERT, DELETE, UPDATE = range(5)
states = ['NONE', 'SELECT', 'INSERT', 'DELETE', 'UPDATE']
current_state = NONE

# Grammar rules
def p_statement_expr(t):
    'statement : expression'
    print states[current_state], t[1]

def p_expr_select(t):
    'expression : SELECT columns FROM TABLE END'
    global current_state
    current_state = SELECT
    print t[3]

def p_recursive_columns(t):
    '''recursive_columns : recursive_columns COMMA COLUMN'''
    t[0] = ', '.join([t[1], t[3]])

def p_recursive_columns_base(t):
    '''recursive_columns : COLUMN'''
    t[0] = t[1]

def p_columns(t):
    '''columns : STAR
               | recursive_columns'''
    t[0] = t[1]

def p_error(t):
    print 'Syntax error at "%s"' % t.value if t else 'NULL'
    global current_state
    current_state = NONE

yacc.yacc()

# Read statements until EOF and parse each one
while True:
    try:
        input = raw_input('sql> ')
    except EOFError:
        break
    yacc.parse(input)
I think your problem is that your regular expressions for t_TABLE and t_COLUMN are also matching your reserved words (SELECT and FROM). In other words, SELECT a FROM b; tokenizes to something like COLUMN COLUMN COLUMN COLUMN END
(or some other ambiguous tokenization), and this doesn't match any of your productions, so you get a syntax error. As a quick sanity check, change those regular expressions to match exactly what you're typing in, like this:
t_TABLE = r'b'
t_COLUMN = r'a'
You will see that the statement SELECT a FROM b; parses without error, because the regular expressions 'a' and 'b' don't match your reserved words. There's another problem as well: the regular expressions for TABLE and COLUMN overlap each other, so the lexer can't tokenize those two without ambiguity either.
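To see the first-match behaviour without installing PLY, here is a rough stand-in built on the standard `re` module. It assumes PLY's lexer works roughly like this: the token patterns are joined into one alternation of named groups, and Python's `re` keeps the first alternative that matches at the current position, not the longest one. The helper `tokenize_ordered` and the rule order are hypothetical. Putting SELECT first mirrors PLY's rule of adding string-defined tokens in order of decreasing regex length (`select|SELECT` is the longest pattern); `from|FROM` and `[a-zA-Z]+` are both nine characters long, so the order of FROM, TABLE and COLUMN here is a guess.

```python
import re


def tokenize_ordered(rules, text):
    """Tokenize `text` with an ordered list of (name, regex) rules.

    The rules are joined into one alternation of named groups, roughly how
    PLY builds its master regex. Python's re tries the alternatives from left
    to right and keeps the first one that matches at the current position.
    """
    master = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in rules))
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in " \t":  # skipped, like t_ignore = ' \t'
            pos += 1
            continue
        m = master.match(text, pos)
        if m is None:
            raise SyntaxError(f"Illegal character {text[pos]!r}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# The question's patterns, in a hand-picked (hypothetical) order.
overlapping = [
    ("SELECT", r"select|SELECT"),
    ("COLUMN", r"[a-zA-Z]+"),
    ("TABLE", r"[a-zA-Z]+"),  # never produced: COLUMN matches the same text first
    ("FROM", r"from|FROM"),   # never produced either, for the same reason
    ("STAR", r"\*"),
    ("COMMA", r","),
    ("END", r";"),
]

# The sanity check: COLUMN and TABLE match only the literal names typed in.
sanity_check = [
    ("SELECT", r"select|SELECT"),
    ("COLUMN", r"a"),
    ("TABLE", r"b"),
    ("FROM", r"from|FROM"),
    ("STAR", r"\*"),
    ("COMMA", r","),
    ("END", r";"),
]

text = "SELECT a FROM b;"
print([kind for kind, _ in tokenize_ordered(overlapping, text)])
# ['SELECT', 'COLUMN', 'COLUMN', 'COLUMN', 'END']
print([kind for kind, _ in tokenize_ordered(sanity_check, text)])
# ['SELECT', 'COLUMN', 'FROM', 'TABLE', 'END']
```

With the overlapping patterns, the second COLUMN carries the value FROM, and a parser expecting COMMA or FROM after the column list would stop there, which is consistent with the error reported in the question. With the sanity-check patterns the token stream matches the grammar.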
There's a subtle but relevant section in the PLY documentation regarding this. I'm not sure of the best way to explain it, but the trick is that the tokenization pass happens first, so it can't use context from your production rules to know whether it has come across a TABLE token or a COLUMN token. You need to generalize those into some kind of ID token and then weed things out during the parse. If I had some more energy I'd try to work through your code some more and provide an actual solution in code, but since you've already said this is a learning exercise, perhaps you'll be content with me pointing you in the right direction.
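As a rough sketch of that direction, assuming a single identifier rule plus a keyword lookup table, here is a stdlib-only version (so it runs without PLY installed). In PLY itself the same idea is usually written as a function rule such as `def t_ID(t)`, whose docstring is the identifier regex and whose body sets `t.type = reserved.get(t.value, 'ID')`; that is the reserved-words recipe in the PLY documentation. The names `RESERVED`, `TOKEN_RE`, `tokenize_sql` and `parse_select` are hypothetical, the keyword lookup is made case-insensitive with `lower()` (a choice, not something the question requires), and the grammar is the question's with ID in both name positions.

```python
import re

# Hypothetical keyword table: identifiers are matched by one generic pattern,
# then reclassified if they spell a reserved word.
RESERVED = {"select": "SELECT", "from": "FROM", "where": "WHERE"}

# One pattern per token kind; only ID can match letters, so nothing overlaps.
TOKEN_RE = re.compile(r"(?P<ID>[A-Za-z_][A-Za-z0-9_]*)|(?P<STAR>\*)|(?P<COMMA>,)|(?P<END>;)")


def tokenize_sql(text):
    """Return a list of (kind, value) pairs for `text`."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in " \t":  # skipped, like t_ignore = ' \t'
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"Illegal character {text[pos]!r}")
        kind, value = m.lastgroup, m.group()
        if kind == "ID":
            # Keywords are recognized by lookup, not by a competing regex.
            kind = RESERVED.get(value.lower(), "ID")
        tokens.append((kind, value))
        pos = m.end()
    return tokens


def parse_select(tokens):
    """Recursive-descent parser for:

        statement : SELECT columns FROM ID END
        columns   : STAR | id_list
        id_list   : ID | id_list COMMA ID

    Column and table names are both ID; their role comes from position.
    """
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError(f"Expected {kind}, got end of input")
        tok_kind, value = tokens[pos]
        if tok_kind != kind:
            raise SyntaxError(f"Syntax error at {value!r}: expected {kind}")
        pos += 1
        return value

    expect("SELECT")
    if peek() == "STAR":
        columns = [expect("STAR")]
    else:
        # A loop replaces the left-recursive id_list rule.
        columns = [expect("ID")]
        while peek() == "COMMA":
            expect("COMMA")
            columns.append(expect("ID"))
    expect("FROM")
    table = expect("ID")
    expect("END")
    if pos != len(tokens):
        raise SyntaxError(f"Unexpected {tokens[pos][1]!r} after ';'")
    return {"columns": columns, "table": table}


print(tokenize_sql("SELECT a FROM b;"))
# [('SELECT', 'SELECT'), ('ID', 'a'), ('FROM', 'FROM'), ('ID', 'b'), ('END', ';')]
print(parse_select(tokenize_sql("select a, c FROM b;")))
# {'columns': ['a', 'c'], 'table': 'b'}
```

Because table and column names are both ID, the parser decides which is which from where the name appears. The `while` loop stands in for the left-recursive `id_list` rule, which a recursive-descent parser can't use directly; PLY's LALR parser handles left recursion, so a PLY version can keep that rule as written in the question.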