PyParsing: Is this a correct use of setParseAction()?
I have strings like this:
"MSE 2110, 3030, 4102"
I would like to output:
[("MSE", 2110), ("MSE", 3030), ("MSE", 4102)]
This is my way of going about it, although I haven't quite gotten it yet:
def makeCourseList(str, location, tokens):
    print("before: %s" % tokens)
    # Replace every bare course number with a (dept, number) tuple,
    # taking the department from the first (grouped) course.
    for index, course_number in enumerate(tokens[1:]):
        tokens[index + 1] = (tokens[0][0], course_number)
    print("after: %s" % tokens)

course = Group(DEPT_CODE + COURSE_NUMBER)  # .setResultsName("Course")
course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)).setParseAction(makeCourseList)
This outputs:
>>> course.parseString("CS 2110")
([(['CS', 2110], {})], {})
>>> course_data.parseString("CS 2110, 4301, 2123, 1110")
before: [['CS', 2110], 4301, 2123, 1110]
after: [['CS', 2110], ('CS', 4301), ('CS', 2123), ('CS', 1110)]
([(['CS', 2110], {}), ('CS', 4301), ('CS', 2123), ('CS', 1110)], {})
Is this the right way to do it, or am I totally off?
Also, the output isn't quite correct: I want course_data to emit a list of course symbols that are all in the same format. Right now, the first course is different from the others. (It has a {}, whereas the others don't.)
This solution memorizes the department when it is parsed, and emits a (dept, coursenum) tuple each time a number is found.
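The code for this answer did not survive on this page, so what follows is only a minimal sketch of the idea, not the answerer's actual code. It assumes a department code is a run of letters and a course number a run of digits; the names make_course_parser, remember_dept, emit_course and the state dict are hypothetical.

```python
from pyparsing import Word, alphas, nums, Suppress, ZeroOrMore

def make_course_parser():
    """Build a grammar that pairs every course number with the last department seen."""
    state = {"dept": None}  # hypothetical holder for the remembered department

    def remember_dept(s, loc, tokens):
        # Store the department and emit nothing for it.
        state["dept"] = tokens[0]
        return []

    def emit_course(s, loc, tokens):
        # Replace the bare number with a (dept, number) tuple.
        return [(state["dept"], int(tokens[0]))]

    # Assumption: department codes are letters, course numbers are digits.
    dept_code = Word(alphas).setParseAction(remember_dept)
    course_number = Word(nums).setParseAction(emit_course)
    return dept_code + course_number + ZeroOrMore(Suppress(",") + course_number)

course_data = make_course_parser()
result = course_data.parseString("MSE 2110, 3030, 4102").asList()
print(result)
```

Because every course goes through the same parse action, all entries come out in the same shape, which is what the question asks for.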
It's one way to do it, though of course there are others (e.g. use two bound methods as parse actions, one for the dept code and another for the course number, so that the instance the methods belong to can keep state).
The return value of the parseString call is harder to bend to your will (though I'm sure sufficiently dark magic will do it and I look forward to Paul McGuire explaining how ;-), so why not go the bound-method route? It emits what seems to be what you require, if I read your specs correctly.
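The bound-method code from this answer is missing here, so this is a hedged sketch of what such an approach could look like rather than the original. The class name CourseCollector and its methods are hypothetical, and the token definitions (letters for the department, digits for the number) are assumptions.

```python
from pyparsing import Word, alphas, nums, Suppress, ZeroOrMore

class CourseCollector(object):
    """Hypothetical helper: bound methods used as parse actions share state via self."""

    def __init__(self):
        self.dept = None
        self.courses = []

    def on_dept(self, s, loc, tokens):
        # Remember the most recent department code.
        self.dept = tokens[0]

    def on_number(self, s, loc, tokens):
        # Record each course number together with the remembered department.
        self.courses.append((self.dept, int(tokens[0])))

    def parse(self, text):
        self.courses = []
        dept_code = Word(alphas).setParseAction(self.on_dept)
        course_number = Word(nums).setParseAction(self.on_number)
        grammar = dept_code + course_number + ZeroOrMore(Suppress(",") + course_number)
        grammar.parseString(text)  # return value ignored; results live on self
        return self.courses

courses = CourseCollector().parse("CS 2110, 4301, 2123, 1110")
print(courses)
```

The design choice here is to ignore the ParseResults object entirely and read the collected tuples off the instance, which sidesteps the question of how to reshape the return value of parseString.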
This would give a generator for the course codes. A list can be made with list() if need be, or you can iterate over it directly.
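The generator this answer refers to is not preserved on this page. As an illustration only, here is one way a generator for the course codes might be written; the function name iter_courses and the regex-based approach are assumptions, not the answerer's code.

```python
import re

def iter_courses(text):
    """Lazily yield (dept, number) tuples from a string like 'MSE 2110, 3030'."""
    # Assumption: the string starts with a department made of letters.
    match = re.match(r"\s*([A-Za-z]+)\s+(.*)", text)
    if match is None:
        return  # nothing recognizable, so the generator yields nothing
    dept, rest = match.groups()
    for number in re.findall(r"\d+", rest):
        yield (dept, int(number))

# Materialize the generator as a list when one is needed...
course_list = list(iter_courses("MSE 2110, 3030, 4102"))
print(course_list)

# ...or iterate over it directly.
for dept, number in iter_courses("CS 2110, 4301"):
    print(dept, number)
```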
Sure, everybody loves PyParsing. For easy stuff like this, split is sooo much easier to grok.
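The split-based snippet from this answer is also missing, so the following is a small sketch of the idea under the assumption that the department is the first whitespace-separated word and the numbers are comma-separated. The function name split_courses is hypothetical.

```python
def split_courses(text):
    """Return (dept, number) tuples using plain string splitting."""
    # First whitespace-separated word is the department; the rest is the number list.
    dept, numbers = text.split(None, 1)
    # int() tolerates the spaces left around each number after splitting on commas.
    return [(dept, int(n)) for n in numbers.split(",")]

pairs = split_courses("MSE 2110, 3030, 4102")
print(pairs)
```

Unlike the PyParsing versions, this does no validation: a string without any course number raises ValueError when unpacking, which may or may not be acceptable depending on the input.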