General algorithm/pattern for processing text files
Is there a general algorithm/pattern for reading multiline text files, where some lines are dependent on preceding ones? I'm referring to data in a form like:
H0 //start header
HEADER1
H9 //end header
R0 RECORD1
R0 RECORD2
H0 //start header
HEADER2
H9 //end header
R0 RECORD3
R0 RECORD4
Where one needs to associate the current "header" info with each following record.
I realise there are countless solutions to this sort of task, but are there tried and tested patterns that more experienced developers converge on?
EDIT:
My intuition is that one should use some sort of state machine, with states like "reading header", "reading records" etc. Am I on the right path?
EDIT:
Although the example is simple, something that can handle higher degrees of nesting would be preferable
Comments (2)
This can be viewed as a parsing problem, although the grammar of the language is very simple. The format in your example is in fact regular, so an FSM, as you correctly noted, will work. Generally speaking, any established parsing technique will work. With recursive descent parsing you avoid explicit state, because the state is implied by where you are in the code; for a regular language the parser ends up not actually being recursive. The following is pseudocode:
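The recursive-descent idea can be sketched in Python roughly as follows. This is an illustrative sketch, not kkm's original pseudocode. It assumes the grammar is `file := (header records)*`, `header := "H0" line* "H9"`, `records := ("R0" payload)*`, and that blank lines can be ignored. The names `Lines`, `parse_header`, `parse_records` and `parse_file` are made up for this example.

```python
from typing import List, Tuple


class Lines:
    """One-line lookahead over the input, skipping blank lines."""

    def __init__(self, text: str) -> None:
        self._lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        self._pos = 0

    def peek(self) -> str:
        # An empty string signals end of input.
        return self._lines[self._pos] if self._pos < len(self._lines) else ""

    def next(self) -> str:
        line = self.peek()
        self._pos += 1
        return line


def parse_header(src: Lines) -> List[str]:
    # header := "H0" line* "H9"
    if not src.next().startswith("H0"):
        raise ValueError("expected H0")
    body = []
    while src.peek() and not src.peek().startswith("H9"):
        body.append(src.next())
    if not src.next().startswith("H9"):
        raise ValueError("expected H9 before end of input")
    return body


def parse_records(src: Lines) -> List[str]:
    # records := ("R0" payload)*
    records = []
    while src.peek().startswith("R0"):
        records.append(src.next()[2:].strip())
    return records


def parse_file(text: str) -> List[Tuple[List[str], str]]:
    # file := (header records)*
    src = Lines(text)
    out = []
    while src.peek():
        header = parse_header(src)
        # Every record is paired with the header that precedes it.
        for record in parse_records(src):
            out.append((header, record))
    return out
```

There is no state variable here: being inside `parse_header` is the "reading header" state. If the format grew nested sections, one of these functions would call itself (or a sibling) for the inner section, which is where the approach would become genuinely recursive.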
I agree with kkm. Depending on how complex your grammar is, you may want to consider a parsing library such as ply (Python Lex-Yacc).