How do I process this text file and parse out what I need?
I'm trying to parse output from the Python doctest module and store it in an HTML file.
I've got output similar to this:
**********************************************************************
File "example.py", line 16, in __main__.factorial
Failed example:
[factorial(n) for n in range(6)]
Expected:
[0, 1, 2, 6, 24, 120]
Got:
[1, 1, 2, 6, 24, 120]
**********************************************************************
File "example.py", line 20, in __main__.factorial
Failed example:
factorial(30)
Expected:
25252859812191058636308480000000L
Got:
265252859812191058636308480000000L
**********************************************************************
1 items had failures:
2 of 8 in __main__.factorial
***Test Failed*** 2 failures.
Each failure is preceded by a line of asterisks, which separates the test failures from one another.
What I'd like to do is extract the filename and method that failed, as well as the expected and actual results. Then I'd like to create an HTML document from this (or store it in a text file and then do a second round of parsing).
How can I do this using just Python or some combination of UNIX shell utilities?
EDIT: I put together the following shell script, which matches each block the way I'd like, but I'm unsure how to redirect each sed match to its own file.
python example.py | sed -n '/.*/,/^\**$/p' > `mktemp error.XXX`
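One way to get one file per block, sketched in Python instead of sed, is to split the captured output on the asterisk lines and give each chunk to `tempfile.mkstemp`, which plays the role of `mktemp error.XXX`. The function name `split_to_files` is hypothetical, and the sketch assumes every failure block starts with a `File "..."` line, as in the output above.

```python
import re
import tempfile


def split_to_files(text, prefix="error."):
    """Write each failure block of doctest output to its own temp file.

    Blocks are assumed to be separated by lines made only of asterisks,
    as in doctest's default report. Returns the paths of the new files.
    """
    paths = []
    for block in re.split(r"^\*{3,}$", text, flags=re.MULTILINE):
        if not block.strip().startswith("File "):
            continue  # skip the leading empty chunk and the final summary
        # mkstemp creates a fresh, uniquely named file, like `mktemp error.XXX`
        fd, path = tempfile.mkstemp(prefix=prefix, text=True)
        with open(fd, "w") as f:
            f.write(block.strip() + "\n")
        paths.append(path)
    return paths
```

The text could come from a pipe (`sys.stdin.read()`) or from `subprocess.run([sys.executable, "example.py"], capture_output=True, text=True).stdout`.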
Answers (4)
You can write a Python program to pick this apart, but maybe a better thing to do would be to look into modifying doctest to output the report you want in the first place. From the docs for doctest.DocTestRunner:
This is a quick and dirty script that parses the output into tuples with the relevant information:
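A minimal sketch of such a script, assuming doctest's default report layout shown in the question (asterisk separators, then `File`, `Failed example:`, `Expected:` and `Got:` sections). The `Failure` tuple and `parse_doctest_output` are illustrative names; blocks in other layouts, such as `Exception raised:` or `Expected nothing`, are skipped rather than parsed.

```python
import re
import textwrap
from typing import List, NamedTuple


class Failure(NamedTuple):
    filename: str
    line: int
    name: str
    example: str
    expected: str
    got: str


# A line made only of asterisks separates the blocks in doctest's report.
SEPARATOR = re.compile(r"^\*{3,}$", re.MULTILINE)

# Layout of one failure block; DOTALL lets each section span several lines.
FAILURE = re.compile(
    r'File "(?P<filename>[^"]+)", line (?P<line>\d+), in (?P<name>\S+)\n'
    r"Failed example:\n(?P<example>.*?)\n"
    r"Expected:\n(?P<expected>.*?)\n"
    r"Got:\n(?P<got>.*)",
    re.DOTALL,
)


def _clean(section: str) -> str:
    # doctest indents section bodies; drop the common indent and blank edges.
    return textwrap.dedent(section).strip()


def parse_doctest_output(text: str) -> List[Failure]:
    failures = []
    for block in SEPARATOR.split(text):
        match = FAILURE.search(block)
        if match is None:
            continue  # leading empty chunk, final summary, or another layout
        failures.append(Failure(
            filename=match["filename"],
            line=int(match["line"]),
            name=match["name"],
            example=_clean(match["example"]),
            expected=_clean(match["expected"]),
            got=_clean(match["got"]),
        ))
    return failures
```

Each `Failure` can then be written to a text file or fed straight into an HTML template.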
I wrote a quick parser in pyparsing to do it.
gives
This is probably one of the least elegant Python scripts I've ever written, but it should have the framework to do what you want without resorting to UNIX utilities and separate scripts to create the HTML. It's untested, but it should only need minor tweaking to work.
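A sketch in the same spirit, assuming the report is available as an iterable of lines (for example `sys.stdin` when piping `python example.py` into the script): a single pass tracks which section each line belongs to and renders every failure directly as HTML, with no intermediate file. `doctest_report_to_html` and the HTML layout are illustrative choices, and the `Expected nothing` / `Exception raised:` variants are not special-cased.

```python
import html
import textwrap
from typing import Iterable

# Maps doctest's section headers to the keys used below.
SECTIONS = {"Failed example:": "example", "Expected:": "expected", "Got:": "got"}


def _render(failure: dict) -> str:
    """Render one collected failure as an HTML fragment."""
    parts = ["<div class='failure'>", "<h2>%s</h2>" % html.escape(failure["where"])]
    for key, title in (("example", "Failed example"),
                       ("expected", "Expected"), ("got", "Got")):
        body = textwrap.dedent("\n".join(failure[key])).strip()
        parts.append("<h3>%s</h3>\n<pre>%s</pre>" % (title, html.escape(body)))
    parts.append("</div>")
    return "\n".join(parts)


def doctest_report_to_html(lines: Iterable[str]) -> str:
    """Turn doctest's text report into an HTML page in one pass over its lines."""
    failures = []
    current = None  # failure being collected: its "File ..." line plus sections
    section = None  # section that the following lines belong to
    for raw in lines:
        line = raw.rstrip("\n")
        if line and set(line) == {"*"}:
            # A separator closes whatever failure was being collected.
            if current is not None:
                failures.append(current)
            current, section = None, None
        elif line.startswith("File "):
            current = {"where": line, "example": [], "expected": [], "got": []}
            section = None
        elif current is not None and line in SECTIONS:
            section = SECTIONS[line]
        elif current is not None and section is not None:
            current[section].append(line)
    if current is not None:  # the report did not end with a separator
        failures.append(current)
    body = "\n".join(_render(f) for f in failures)
    return "<html><body>\n<h1>doctest failures</h1>\n%s\n</body></html>" % body
```

The result can be written straight to a file, e.g. `open("failures.html", "w").write(doctest_report_to_html(sys.stdin))`.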