PyParsing:setParseAction() 的使用正确吗?

发布于 2024-09-03 05:01:26 字数 1111 浏览 4 评论 0原文

我有这样的字符串:

"MSE 2110, 3030, 4102"

我想输出:

[("MSE", 2110), ("MSE", 3030), ("MSE", 4102)]

这是我的处理方式,尽管我还没有完全明白:

def makeCourseList(str, location, tokens):
    print "before: %s" % tokens

    for index, course_number in enumerate(tokens[1:]):
        tokens[index + 1] = (tokens[0][0], course_number)

    print "after: %s" % tokens

course = Group(DEPT_CODE + COURSE_NUMBER) # .setResultsName("Course")

course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)).setParseAction(makeCourseList)

这个输出:

>>> course.parseString("CS 2110")
([(['CS', 2110], {})], {})
>>> course_data.parseString("CS 2110, 4301, 2123, 1110")
before: [['CS', 2110], 4301, 2123, 1110]
after: [['CS', 2110], ('CS', 4301), ('CS', 2123), ('CS', 1110)]
([(['CS', 2110], {}), ('CS', 4301), ('CS', 2123), ('CS', 1110)], {})

这是正确的方法吗,还是我完全关闭了?

另外, 的输出不太正确 - 我希望 course_data 发出一个格式彼此相同的 course 符号列表。现在,第一门课程与其他课程不同。 (它有一个 {},而其他的则没有。)

I have strings like this:

"MSE 2110, 3030, 4102"

I would like to output:

[("MSE", 2110), ("MSE", 3030), ("MSE", 4102)]

This is my way of going about it, although I haven't quite gotten it yet:

def makeCourseList(str, location, tokens):
    print "before: %s" % tokens

    for index, course_number in enumerate(tokens[1:]):
        tokens[index + 1] = (tokens[0][0], course_number)

    print "after: %s" % tokens

course = Group(DEPT_CODE + COURSE_NUMBER) # .setResultsName("Course")

course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)).setParseAction(makeCourseList)

This outputs:

>>> course.parseString("CS 2110")
([(['CS', 2110], {})], {})
>>> course_data.parseString("CS 2110, 4301, 2123, 1110")
before: [['CS', 2110], 4301, 2123, 1110]
after: [['CS', 2110], ('CS', 4301), ('CS', 2123), ('CS', 1110)]
([(['CS', 2110], {}), ('CS', 4301), ('CS', 2123), ('CS', 1110)], {})

Is this the right way to do it, or am I totally off?

Also, the output of isn't quite correct - I want course_data to emit a list of course symbols that are in the same format as each other. Right now, the first course is different from the others. (It has a {}, whereas the others don't.)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

七禾 2024-09-10 05:01:27

该解决方案在解析时会记住部门,并在找到数字时发出 (dept,coursenum) 元组。

from pyparsing import Suppress,Word,ZeroOrMore,alphas,nums,delimitedList

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000
'''

def memorize(t):
    memorize.dept = t[0]

def token(t):
    return (memorize.dept,int(t[0]))

course = Suppress(Word(alphas).setParseAction(memorize))
number = Word(nums).setParseAction(token)
line = course + delimitedList(number)
lines = ZeroOrMore(line)

print lines.parseString(data)

输出:

[('MSE', 2110), ('MSE', 3030), ('MSE', 4102), ('CSE', 1000), ('CSE', 2000), ('CSE', 3000)]

This solution memorizes the department when parsed, and emits a (dept,coursenum) tuple when a number is found.

from pyparsing import Suppress,Word,ZeroOrMore,alphas,nums,delimitedList

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000
'''

def memorize(t):
    memorize.dept = t[0]

def token(t):
    return (memorize.dept,int(t[0]))

course = Suppress(Word(alphas).setParseAction(memorize))
number = Word(nums).setParseAction(token)
line = course + delimitedList(number)
lines = ZeroOrMore(line)

print lines.parseString(data)

Output:

[('MSE', 2110), ('MSE', 3030), ('MSE', 4102), ('CSE', 1000), ('CSE', 2000), ('CSE', 3000)]
趁微风不噪 2024-09-10 05:01:27

这是正确的做法吗?
我完全关闭了吗?

这是一种方法,当然还有其他方法(例如,使用两个绑定方法作为解析操作 - 因此该方法所属的实例可以保持状态 - 一个用于部门代码,另一个用于课程编号)。

parseString 调用的返回值更难屈服于你的意愿(尽管我确信足够的黑魔法可以做到这一点,并且我期待 Paul McGuire 解释如何做到这一点;-),所以为什么不去绑定方法路由如...:

from pyparsing import *

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")

class MyParse(object):
  def __init__(self):
      self.result = None

  def makeCourseList(self, str, location, tokens):
      print "before: %s" % tokens

      dept = tokens[0][0]
      newtokens = [(dept, tokens[0][1])]
      newtokens.extend((dept, tok) for tok in tokens[1:])

      print "after: %s" % newtokens
      self.result = newtokens

course = Group(DEPT_CODE + COURSE_NUMBER).setResultsName("Course")

inst = MyParse()
course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)
    ).setParseAction(inst.makeCourseList)
ignore = course_data.parseString("CS 2110, 4301, 2123, 1110")
print inst.result

这会发出:

before: [['CS', '2110'], '4301', '2123', '1110']
after: [('CS', '2110'), ('CS', '4301'), ('CS', '2123'), ('CS', '1110')]
[('CS', '2110'), ('CS', '4301'), ('CS', '2123'), ('CS', '1110')]

如果我正确阅读了您的规格,这似乎就是您所需要的。

Is this the right way to do it, or am
I totally off?

It's one way to do it, though of course there are others (e.g. use as parse actions two bound method -- so the instance the method belongs to can keep state -- one for the dept code and another for the course number).

The return value of the parseString call is harder to bend to your will (though I'm sure sufficiently dark magic will do it and I look forward to Paul McGuire explaining how;-), so why not go the bound-method route as in...:

from pyparsing import *

DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")

class MyParse(object):
  def __init__(self):
      self.result = None

  def makeCourseList(self, str, location, tokens):
      print "before: %s" % tokens

      dept = tokens[0][0]
      newtokens = [(dept, tokens[0][1])]
      newtokens.extend((dept, tok) for tok in tokens[1:])

      print "after: %s" % newtokens
      self.result = newtokens

course = Group(DEPT_CODE + COURSE_NUMBER).setResultsName("Course")

inst = MyParse()
course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)
    ).setParseAction(inst.makeCourseList)
ignore = course_data.parseString("CS 2110, 4301, 2123, 1110")
print inst.result

this emits:

before: [['CS', '2110'], '4301', '2123', '1110']
after: [('CS', '2110'), ('CS', '4301'), ('CS', '2123'), ('CS', '1110')]
[('CS', '2110'), ('CS', '4301'), ('CS', '2123'), ('CS', '1110')]

which seems to be what you require, if I read your specs correctly.

鸠魁 2024-09-10 05:01:27
data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000'''

def get_courses(data):
    for row in data.splitlines():
        department, *numbers = row.replace(",", "").split()
        for number in numbers:
            yield department, number

这将为课程代码提供一个生成器。如果需要,可以使用 list() 创建列表,或者您可以直接迭代它。

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000'''

def get_courses(data):
    for row in data.splitlines():
        department, *numbers = row.replace(",", "").split()
        for number in numbers:
            yield department, number

This would give a generator for the course codes. A list can be made with list() if need be, or you can iterate over it directly.

何时共饮酒 2024-09-10 05:01:27

当然,每个人都喜欢 PyParsing。对于像这样的分割这样简单的东西来说,更容易理解:

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000'''

all = []
for row in data.split('\n'):
        klass,num_l = row.split(' ',1)
        all.extend((klass,int(num)) for num in num_l.split(','))

Sure, everybody loves PyParsing. For easy stuff like this split is sooo much easier to grok:

data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000'''

all = []
for row in data.split('\n'):
        klass,num_l = row.split(' ',1)
        all.extend((klass,int(num)) for num in num_l.split(','))
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文