How should I organise my functions with pyparsing?

Posted on 2024-08-13 20:17:03

I am parsing a file with Python and pyparsing (it's the report file for PSAT in Matlab, but that isn't important). Here is what I have so far. I think it's a mess and would like some advice on how to improve it. Specifically, how should I organise my grammar definitions with pyparsing?

Should I have all my grammar definitions in one function? If so, it's going to be one huge function. If not, then how do I break it up? At the moment I have split it at the sections of the file. Is it worth making loads of functions that only ever get called once from one place? Neither really feels right to me.

Should I place all my input and output code in a separate file to the other class functions? It would make the purpose of the class much clearer.

I'm also interested to know if there is an easier way to parse a file, do some sanity checks and store the data in a class. I seem to spend a lot of my time doing this.

(I will accept answers of "it's good enough" or "use X rather than pyparsing" if people agree.)

Comments (1)

在风中等你 2024-08-20 20:17:03

I could go either way on using a single big method to create your parser vs. taking it in steps the way you have it now.

I can see that you have defined some useful helper utilities, such as slit ("suppress Literal", I presume), stringtolits, and decimaltable. This looks good to me.

I like that you are using results names; they really improve the robustness of your post-parsing code. I would recommend using the shortcut form that was added in pyparsing 1.4.7, in which you can replace

busname.setResultsName("bus1")

with

busname("bus1")
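
To see that the two spellings behave the same, here is a minimal sketch. The definition of busname is an assumption made up for illustration (the question's real definition isn't shown here), and the sample input is invented.

```python
from pyparsing import Word, alphanums

# Hypothetical token definition, standing in for the question's busname.
busname = Word(alphanums)

# Both forms return a named copy of busname; the original is left unnamed.
long_form = busname.setResultsName("bus1")
short_form = busname("bus1")

long_result = long_form.parseString("Bus07")
short_result = short_form.parseString("Bus07")

# The matched token is reachable by name in both cases.
print(long_result["bus1"], short_result.bus1)
```

Because each call returns a copy, the same busname can be reused under several names (bus1, bus2, ...) in one grammar.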

This can declutter your code quite a bit.

I would look back through your parse actions to see where you are using numeric indexes to access individual tokens, and go back and assign results names instead. Here is one case, where GetStats returns (ngroup + sgroup).setParseAction(self.process_stats). process_stats has references like:

self.num_load = tokens[0]["loads"]
self.num_generator = tokens[0]["generators"]
self.num_transformer = tokens[0]["transformers"]
self.num_line = tokens[0]["lines"]
self.num_bus = tokens[0]["buses"]
self.power_rate = tokens[1]["rate"]

I like that you have Group'ed the values and the stats, but go ahead and give them names, like "network" and "soln". Then you could write this parse action code as (I've also converted to the - to me - easier-to-read object-attribute notation instead of dict element notation):

self.num_load = tokens.network.loads
self.num_generator = tokens.network.generators
self.num_transformer = tokens.network.transformers
self.num_line = tokens.network.lines
self.num_bus = tokens.network.buses
self.power_rate = tokens.soln.rate
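
Here is a rough, self-contained sketch of that idea. The grammar, the input line and the Report class are simplified stand-ins I invented for illustration; only the names ngroup, sgroup, process_stats, "network" and "soln" come from the discussion above.

```python
from pyparsing import Group, Suppress, Word, nums

# Convert matched digits to int so the stored values are numbers, not strings.
integer = Word(nums).setParseAction(lambda t: int(t[0]))

# Hypothetical, much-reduced stand-ins for the question's ngroup and sgroup.
ngroup = Group(Suppress("Buses:") + integer("buses")
               + Suppress("Lines:") + integer("lines"))
sgroup = Group(Suppress("Rate:") + integer("rate"))


class Report:
    def process_stats(self, tokens):
        # Named access: no positional indexes to keep in sync with the grammar.
        self.num_bus = tokens.network.buses
        self.num_line = tokens.network.lines
        self.power_rate = tokens.soln.rate


report = Report()
stats = (ngroup("network") + sgroup("soln")).setParseAction(report.process_stats)
tokens = stats.parseString("Buses: 14 Lines: 20 Rate: 100")

print(report.num_bus, report.num_line, report.power_rate)
```

The positional form (tokens[0]["buses"]) still works on the same result, so you can migrate one parse action at a time.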

Also, a style question: why do you sometimes use the explicit And constructor, instead of using the '+' operator?

busdef = And([busname.setResultsName("bus1"),
            busname.setResultsName("bus2"),
            integer.setResultsName("linenum"),
            decimaltable("pf qf pl ql".split())])

This is just as easily written:

busdef = (busname("bus1") + busname("bus2") + 
            integer("linenum") + 
            decimaltable("pf qf pl ql".split()))
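
A quick way to convince yourself the two spellings are interchangeable is to parse the same line with both. This is only a sketch: busname and integer are assumed definitions, the input line is invented, and decimaltable is left out because its definition isn't shown here.

```python
from pyparsing import And, Word, alphanums, nums

# Hypothetical token definitions for illustration.
busname = Word(alphanums)
integer = Word(nums)

# Explicit constructor, as in the original code.
explicit = And([busname("bus1"), busname("bus2"), integer("linenum")])

# Operator form; '+' between expressions builds an And as well.
with_plus = busname("bus1") + busname("bus2") + integer("linenum")

line = "Bus01 Bus02 3"
a = explicit.parseString(line)
b = with_plus.parseString(line)

print(a.asList() == b.asList(), a.asDict() == b.asDict())
```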

Overall, I think this is about par for a file of this complexity. I have a similar format (proprietary, unfortunately, so cannot be shared) in which I built the code in pieces similar to the way you have, but in one large method, something like this:

def parser():
    header = Group(...)
    inputsummary = Group(...)
    jobstats = Group(...)
    measurements = Group(...)
    return header("hdr") + inputsummary("inputs") + jobstats("stats") + measurements("meas")

The Group constructs are especially helpful in a large parser like this, to establish a sort of namespace for results names within each section of the parsed data.
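
To illustrate the namespace point, here is a small sketch in which two sections reuse the same results name. The section keywords, the "size" name and the input are all invented for the example; they are not taken from my proprietary format or from your file.

```python
from pyparsing import Group, Suppress, Word, nums

integer = Word(nums)


def parser():
    # Hypothetical sections; each Group keeps its own results names.
    header = Group(Suppress("HEADER") + integer("size"))
    jobstats = Group(Suppress("STATS") + integer("size"))
    return header("hdr") + jobstats("stats")


text = "HEADER 3 STATS 42"
result = parser().parseString(text)

# Same inner name, two separate namespaces.
print(result.hdr.size, result.stats.size)

# Without Group, both values share one flat namespace and the later
# "size" replaces the earlier one.
flat_grammar = (Suppress("HEADER") + integer("size")
                + Suppress("STATS") + integer("size"))
flat = flat_grammar.parseString(text)
print(flat["size"])
```

This is why one large parser() function stays manageable: each section can pick short, natural names without worrying about what the other sections called their fields.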
