PyParsing: Not all tokens passed to setParseAction()

Posted on 2024-09-03 04:51:10

I'm parsing sentences like "CS 2110 or INFO 3300". I would like to output a format like:

[[("CS", 2110)], [("INFO", 3300)]]

To do this, I thought I could use setParseAction(). However, the print statements in statementParse() suggest that only the last tokens are actually passed:

>>> statement.parseString("CS 2110 or INFO 3300")
Match [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}] at loc 7(1,8)
string CS 2110 or INFO 3300
loc: 7 
tokens: ['INFO', 3300]
Matched [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}] -> ['INFO', 3300]
(['CS', 2110, 'INFO', 3300], {'Course': [(2110, 1), (3300, 3)], 'DeptCode': [('CS', 0), ('INFO', 2)]})

I expected all the tokens to be passed, but it's only ['INFO', 3300]. Am I doing something wrong? Or is there another way that I can produce the desired output?

Here is the pyparsing code:

from pyparsing import *

# Debug parse action: print whatever pyparsing passes in.
def statementParse(s, location, tokens):
    print("string %s" % s)
    print("loc: %s " % location)
    print("tokens: %s" % tokens)

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")

OR_CONJ = Suppress("or")

# Convert the matched course number to an int.
COURSE_NUMBER.setParseAction(lambda s, l, toks : int(toks[0]))

course = DEPT_CODE + COURSE_NUMBER.setResultsName("Course")

statement = course + Optional(OR_CONJ + course).setParseAction(statementParse).setDebug()

[浮城] 2024-09-10 04:51:10

In order to keep the token bits from "CS 2110" and "INFO 3300", I suggest you wrap your definition of course in a Group:

course = Group(DEPT_CODE + COURSE_NUMBER).setResultsName("Course")
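
To see what Group changes, here is a minimal sketch, assuming a pyparsing version that still accepts the camelCase method names used in this thread. The results names are left out to keep it short, and the variable `grouped` is only for illustration.

```python
from pyparsing import Group, Optional, Regex, Suppress

DEPT_CODE = Regex(r'[A-Z]{2,}')
# Convert the four-digit course number to an int, as in the question.
COURSE_NUMBER = Regex(r'[0-9]{4}').setParseAction(lambda s, l, toks: int(toks[0]))

# Group keeps each department/number pair together as its own sub-list
# instead of flattening everything into one token list.
course = Group(DEPT_CODE + COURSE_NUMBER)
statement = course + Optional(Suppress("or") + course)

grouped = statement.parseString("CS 2110 or INFO 3300").asList()
print(grouped)  # [['CS', 2110], ['INFO', 3300]]
```

Without the Group, the same input comes back as the flat list ['CS', 2110, 'INFO', 3300] shown in the question.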

It also looks like you are charging head-on at parsing some kind of search expression, like "x and y or z". There is some subtlety to this problem, and I suggest you check out the examples on the pyparsing wiki for how to build up these kinds of expressions. Otherwise you will end up with a bird's nest of Optional("or" + this) and ZeroOrMore("and" + that) pieces. As a last-ditch option, you may even just use something with operatorPrecedence, like:

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")        
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")
course = Group(DEPT_CODE + COURSE_NUMBER)

courseSearch = operatorPrecedence(course, 
    [
    ("not", 1, opAssoc.RIGHT),
    ("and", 2, opAssoc.LEFT),
    ("or", 2, opAssoc.LEFT),
    ])

(You may have to download the latest 1.5.3 version from the SourceForge SVN for this to work.)
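
As a rough illustration of the nesting this produces, here is a sketch that assumes a later pyparsing release, where operatorPrecedence is available under the name infixNotation. The sample input with MATH 1920 and the names `course_search` and `parsed` are made up for the example.

```python
from pyparsing import Group, Regex, infixNotation, opAssoc

DEPT_CODE = Regex(r'[A-Z]{2,}')
COURSE_NUMBER = Regex(r'[0-9]{4}').setParseAction(lambda s, l, toks: int(toks[0]))
course = Group(DEPT_CODE + COURSE_NUMBER)

# Operators are listed from highest to lowest precedence,
# so "and" binds more tightly than "or".
course_search = infixNotation(course, [
    ("not", 1, opAssoc.RIGHT),
    ("and", 2, opAssoc.LEFT),
    ("or", 2, opAssoc.LEFT),
])

parsed = course_search.parseString("CS 2110 and MATH 1920 or INFO 3300").asList()
print(parsed)
```

The "and" pair ends up nested inside the "or" expression, so the tree can be walked without any hand-written Optional/ZeroOrMore chains.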

微凉 2024-09-10 04:51:10

It works better if you set the parse action on the whole expression, course plus the Optional, by wrapping it in parentheses (you were setting it only on the Optional!):

>>> statement = (course + Optional(OR_CONJ + course)).setParseAction(statementParse).setDebug()
>>> statement.parseString("CS 2110 or INFO 3300")

gives

Match {Re:('[A-Z]{2,}') Re:('[0-9]{4}') [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}]} at loc 0(1,1)
string CS 2110 or INFO 3300
loc: 0 
tokens: ['CS', 2110, 'INFO', 3300]
Matched {Re:('[A-Z]{2,}') Re:('[0-9]{4}') [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}]} -> ['CS', 2110, 'INFO', 3300]
(['CS', 2110, 'INFO', 3300], {'Course': [(2110, 1), (3300, 3)], 'DeptCode': [('CS', 0), ('INFO', 2)]})

though I suspect what you actually want is to set the parse action on each course, not on the statement:

>>> statement = course + Optional(OR_CONJ + course)
>>> statement.parseString("CS 2110 or INFO 3300")
Match {Re:('[A-Z]{2,}') Re:('[0-9]{4}')} at loc 0(1,1)
string CS 2110 or INFO 3300
loc: 0 
tokens: ['CS', 2110]
Matched {Re:('[A-Z]{2,}') Re:('[0-9]{4}')} -> ['CS', 2110]
Match {Re:('[A-Z]{2,}') Re:('[0-9]{4}')} at loc 10(1,11)
string CS 2110 or INFO 3300
loc: 10 
tokens: ['INFO', 3300]
Matched {Re:('[A-Z]{2,}') Re:('[0-9]{4}')} -> ['INFO', 3300]
(['CS', 2110, 'INFO', 3300], {'Course': [(2110, 1), (3300, 3)], 'DeptCode': [('CS', 0), ('INFO', 2)]})
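
If the goal is the tuple format from the question, a per-course parse action can build it. This is only a sketch under my own assumptions: `to_course_tuple` is a hypothetical helper, the results names are dropped, and each course is wrapped in a Group to get the outer lists.

```python
from pyparsing import Group, Optional, Regex, Suppress

def to_course_tuple(s, loc, tokens):
    # Called once per course match; tokens holds only that course's
    # two tokens, which are replaced here by a single tuple.
    return [(tokens[0], tokens[1])]

DEPT_CODE = Regex(r'[A-Z]{2,}')
COURSE_NUMBER = Regex(r'[0-9]{4}').setParseAction(lambda s, l, toks: int(toks[0]))

# The parse action sits on the course expression itself, so it fires
# for every course rather than only for the optional "or" branch.
course = Group((DEPT_CODE + COURSE_NUMBER).setParseAction(to_course_tuple))
statement = course + Optional(Suppress("or") + course)

courses = statement.parseString("CS 2110 or INFO 3300").asList()
print(courses)  # [[('CS', 2110)], [('INFO', 3300)]]
```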
