pyparsing - parsing a simple line

Posted on 2024-10-19 14:53:20

I'm scratching my head over how to completely parse this line.
I'm having trouble with the '( 4801)' part; every other element is being grabbed OK.

# MAIN_PROG     ( 4801) Generated at 2010-01-25 06:55:00

This is what I have so far:

from pyparsing import nums, Word, Literal, White, SkipTo, Optional, Suppress, OneOrMore, Group, Combine, ParseException

unparsed_log_data = "# MAIN_PROG ( 4801) Generated at 2010-01-25 06:55:00.007    Type:  Periodic"

binary_name = "# MAIN_PROG"
pid = Literal("(" + nums + ")")
report_id = Combine(Suppress(binary_name) + pid)

year = Word(nums, max=4)
month = Word(nums, max=2)
day = Word(nums, max=2)
yearly_day = Combine(year + "-" + month + "-" + day)

clock24h = Combine(Word(nums, max=2) + ":" + Word(nums, max=2) + ":" + Word(nums, max=2) + Suppress("."))
timestamp = Combine(yearly_day + White(' ') + clock24h).setResultsName("timestamp")

time_bnf = report_id + Suppress("Generated at") + timestamp

time_bnf.searchString(unparsed_log_data)

EDIT:
Paul, if you have the patience,
how would I filter

unparsed_log_data = """
# MAIN_PROG     ( 4801) Generated at 2010-01-25 06:55:00
bla bla bla   
multi line garbage  
bla bla  
Efficiency       |       38       38 100 |   3497061    3497081  99 |  
more garbage
"""

time_bnf = report_id + Suppress("Generated at") + timestamp  
partial_report_ignore = Suppress(SkipTo("Efficiency"))  

efficiency_bnf = Suppress("|") + integer.setResultsName("Efficiency") + Suppress(integer) + integer.setResultsName("EfficiencyPercent")

Both
efficiency_bnf.searchString(unparsed_log_data) and
time_bnf.searchString(unparsed_log_data)
return data as expected,
but if I try

report_and_effic = time_bnf + partial_report_ignore + efficiency_bnf

then report_and_effic.searchString(unparsed_log_data)
returns ([], {})

EDIT2:
The code above should read:
partial_report_ignore = Suppress(SkipTo("Efficiency", include=True))
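
To illustrate why include=True makes the difference, here is a minimal sketch rather than the asker's actual grammar. Without include=True, SkipTo stops just before "Efficiency", so the next expression sees the word "Efficiency" instead of the "|" it expects. The names `integer` and `header` are assumptions made for this example, since the question does not show `integer` and the timestamp part is left out for brevity.

```python
from pyparsing import SkipTo, Suppress, Word, nums

unparsed_log_data = """
# MAIN_PROG     ( 4801) Generated at 2010-01-25 06:55:00
bla bla bla
multi line garbage
Efficiency       |       38       38 100 |   3497061    3497081  99 |
more garbage
"""

# Assumed definition: the question uses `integer` without showing it.
integer = Word(nums)

# Simplified stand-in for the report header (hypothetical; no timestamp).
header = (
    Suppress("# MAIN_PROG")
    + Suppress("(")
    + integer.setResultsName("pid")
    + Suppress(")")
)

efficiency_bnf = (
    Suppress("|")
    + integer.setResultsName("Efficiency")
    + Suppress(integer)
    + integer.setResultsName("EfficiencyPercent")
)

# Stops *before* "Efficiency": the following "|" cannot match.
skip_before = Suppress(SkipTo("Efficiency"))
# Consumes "Efficiency" as well: parsing resumes right after the word.
skip_through = Suppress(SkipTo("Efficiency", include=True))

without_include = (header + skip_before + efficiency_bnf).searchString(unparsed_log_data)
with_include = (header + skip_through + efficiency_bnf).searchString(unparsed_log_data)

print(without_include.asList())
print(with_include.asList())
```

An alternative with the same effect would be to keep the plain SkipTo("Efficiency") and follow it with Suppress("Efficiency") before efficiency_bnf.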

乙白 2024-10-26 14:53:20

pid = Literal("(" + nums + ")")

should be

pid = "(" + Word(nums) + ")"

Pyparsing allows you to add strings to expression objects using '+', like:

expr + "some string"

Which gets interpreted as:

expr + Literal("some string")

You wrote Literal("(" + nums + ")"). nums is the string "0123456789", intended as the character set for building Word expressions such as Word(nums). So what you were trying to match was not "a left paren, followed by a word composed of digits, followed by a right paren" but the literal string "(0123456789)".
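
The difference can be checked with a short sketch. It runs both definitions against the log line from the question; the variable names `broken_pid`, `line`, and the `*_matches` results are introduced here for illustration only.

```python
from pyparsing import Literal, Word, nums

line = "# MAIN_PROG     ( 4801) Generated at 2010-01-25 06:55:00"

# Original definition: nums is just the string "0123456789",
# so this only matches the exact text "(0123456789)".
broken_pid = Literal("(" + nums + ")")

# Corrected definition: the bare strings are promoted to Literal,
# and Word(nums) matches a run of one or more digits.
pid = "(" + Word(nums) + ")"

broken_matches = broken_pid.searchString(line).asList()
fixed_matches = pid.searchString(line).asList()
literal_matches = broken_pid.searchString("(0123456789)").asList()

print(broken_matches)
print(fixed_matches)
print(literal_matches)
```

Note that the sketch leaves out the question's Combine wrapper. By default Combine requires its tokens to be contiguous in the input, so wrapping pid in Combine would likely still fail on "( 4801)" because of the space after the opening parenthesis.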
