PyParsing: Not all tokens passed to setParseAction()
I'm parsing sentences like "CS 2110 or INFO 3300". I would like to output a format like:
[[("CS", 2110)], [("INFO", 3300)]]
To do this, I thought I could use setParseAction(). However, the print statements in statementParse() suggest that only the last tokens are actually passed:
>>> statement.parseString("CS 2110 or INFO 3300")
Match [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}] at loc 7(1,8)
string CS 2110 or INFO 3300
loc: 7
tokens: ['INFO', 3300]
Matched [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}] -> ['INFO', 3300]
(['CS', 2110, 'INFO', 3300], {'Course': [(2110, 1), (3300, 3)], 'DeptCode': [('CS', 0), ('INFO', 2)]})
I expected all the tokens to be passed, but it's only ['INFO', 3300]. Am I doing something wrong? Or is there another way that I can produce the desired output?
Here is the pyparsing code:
from pyparsing import *

def statementParse(s, location, tokens):
    # Debug helper: show what the parse action is handed
    print("string %s" % s)
    print("loc: %s " % location)
    print("tokens: %s" % tokens)

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")
OR_CONJ = Suppress("or")
# Convert the matched course number to an int
COURSE_NUMBER.setParseAction(lambda s, l, toks: int(toks[0]))
course = DEPT_CODE + COURSE_NUMBER.setResultsName("Course")
statement = course + Optional(OR_CONJ + course).setParseAction(statementParse).setDebug()
Answers (2)
In order to keep the token bits from "CS 2110" and "INFO 3300", I suggest you wrap your definition of course in a Group:
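A minimal sketch of the Group suggestion, assuming Python 3 and a pyparsing release that still accepts the camelCase method names used in the question. The wiring of statement here is illustrative, not necessarily the answerer's exact code:

```python
from pyparsing import Group, Optional, Regex, Suppress

# Same building blocks as in the question
DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")
COURSE_NUMBER.setParseAction(lambda s, l, toks: int(toks[0]))  # "2110" -> 2110
OR_CONJ = Suppress("or")

# Group returns each department/number pair as its own sub-list,
# so the two courses no longer run together in one flat token list
course = Group(DEPT_CODE + COURSE_NUMBER)
statement = course + Optional(OR_CONJ + course)

result = statement.parseString("CS 2110 or INFO 3300")
print(result.asList())        # [['CS', 2110], ['INFO', 3300]]
print(result[1]["DeptCode"])  # INFO
```

Results names defined inside the Group stay attached to each sub-list, so each course can still be queried by name.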
It also looks like you are charging head-on at parsing out some kind of search expression, like "x and y or z". There is some subtlety to this problem, and I suggest you check out some of the examples at the pyparsing wiki on how to build up these kinds of expressions. Otherwise you will end up with a bird's nest of Optional("or" + this) and ZeroOrMore("and" + that) pieces. As a last resort, you may even just use something with operatorPrecedence, like:
(You may have to download the latest 1.5.3 version from the SourceForge SVN for this to work.)
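A rough sketch of that approach, assuming a current pyparsing release, where operatorPrecedence has been renamed infixNotation (the old name was removed in pyparsing 3.0). The "and"/"or" keywords and the MATH 1910 course in the test string are illustrative assumptions:

```python
from pyparsing import Group, Keyword, Regex, infixNotation, opAssoc

DEPT_CODE = Regex(r'[A-Z]{2,}')
COURSE_NUMBER = Regex(r'[0-9]{4}').setParseAction(lambda s, l, toks: int(toks[0]))

# One operand = one course, kept together as [dept, number]
course = Group(DEPT_CODE + COURSE_NUMBER)

# Operator levels are listed from highest to lowest precedence,
# so "and" binds more tightly than "or"
expr = infixNotation(course, [
    (Keyword("and"), 2, opAssoc.LEFT),
    (Keyword("or"), 2, opAssoc.LEFT),
])

result = expr.parseString("CS 2110 or INFO 3300 and MATH 1910")
print(result.asList())
# [[['CS', 2110], 'or', [['INFO', 3300], 'and', ['MATH', 1910]]]]
```

The nesting mirrors the precedence: the "and" sub-expression is grouped first and then becomes the right-hand operand of "or", which avoids hand-building chains of Optional and ZeroOrMore.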
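It works better if you set the parse action on both course and the Optional (you were setting it only on the Optional!). In your code the method call .setParseAction(statementParse) binds more tightly than the + operator, so it is attached to Optional(OR_CONJ + course) alone; parenthesize course + Optional(OR_CONJ + course) before calling setParseAction, and statementParse then receives all four tokens, ['CS', 2110, 'INFO', 3300].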
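A sketch of that fix, reusing the question's definitions in Python 3 syntax; the received list is a hypothetical addition that only records what the parse action is handed:

```python
from pyparsing import Optional, Regex, Suppress

received = []  # every token list the parse action is handed

def statementParse(s, location, tokens):
    received.append(tokens.asList())
    print("loc: %s tokens: %s" % (location, tokens.asList()))

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")
OR_CONJ = Suppress("or")
COURSE_NUMBER.setParseAction(lambda s, l, toks: int(toks[0]))
course = DEPT_CODE + COURSE_NUMBER.setResultsName("Course")

# Build the whole sum first, then attach the action to it.
# Without the parentheses, .setParseAction(...) would apply to the Optional only.
statement = (course + Optional(OR_CONJ + course)).setParseAction(statementParse)

statement.parseString("CS 2110 or INFO 3300")
# loc: 0 tokens: ['CS', 2110, 'INFO', 3300]
```

With the parentheses the action fires once, at location 0, with all four tokens; in the question's version it fired at location 7 with only ['INFO', 3300].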
Though I suspect what you actually want is to set the parse action on each course, not on the statement:
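A minimal sketch of a per-course parse action, assuming the goal is the [[("CS", 2110)], [("INFO", 3300)]] structure from the question. courseParse is a hypothetical name, and the Group wrapper (borrowed from the other answer) only supplies the outer nesting:

```python
from pyparsing import Group, Optional, Regex, Suppress

DEPT_CODE = Regex(r'[A-Z]{2,}')
COURSE_NUMBER = Regex(r'[0-9]{4}').setParseAction(lambda s, l, toks: int(toks[0]))
OR_CONJ = Suppress("or")

def courseParse(s, location, tokens):
    # Runs once per course, e.g. with ['CS', 2110], and replaces
    # those two tokens with a single (dept, number) tuple
    return [(tokens[0], tokens[1])]

# Parse action on each course; Group adds the extra list level
course = Group((DEPT_CODE + COURSE_NUMBER).setParseAction(courseParse))
statement = course + Optional(OR_CONJ + course)

result = statement.parseString("CS 2110 or INFO 3300")
print(result.asList())  # [[('CS', 2110)], [('INFO', 3300)]]
```

Dropping the Group gives the flatter [('CS', 2110), ('INFO', 3300)].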