General algorithm/pattern for processing text files

Posted on 2025-01-02 19:29:46

Is there a general algorithm/pattern for reading multiline text files, where some lines are dependent on preceding ones? I'm referring to data in a form like:

H0 //start header
HEADER1
H9 //end header   
R0 RECORD1
R0 RECORD2
H0 //start header
HEADER2
H9 //end header
R0 RECORD3
R0 RECORD4

Where one needs to associate the current "header" info with each following record.

I realise there are countless solutions to this sort of task, but are there tried and tested patterns that more experienced developers converge on?

EDIT:
My intuition is that one should use some sort of state machine, with states like "reading header", "reading records" etc. Am I on the right path?

EDIT:
Although the example is simple, something that can handle deeper levels of nesting would be preferable.

Comments (2)

埋情葬爱 2025-01-09 19:29:46

This can be viewed as a parsing problem, although the grammar of the language is very simple. The language is indeed regular, so a finite-state machine (FSM), as you correctly noted, will work. Generally speaking, any established parsing technique will work. With recursive descent parsing you avoid explicit state, because the state is implied by which function is currently running; for a regular language the parser does not actually need to recurse. The following is pseudocode:

function accept_file:
   while not_eof
      read_line;
      case prefix of
         "H0": accept_header;
         "R0": accept_record;
         otherwise: syntax_error;

function accept_record:
   make_record from substring of current_line from position 3,
      associated with current_header;

function accept_header:
   clear accumulated_lines;
   read_line;
   while current_line does not start with "H9"
      add current_line to accumulated_lines;
      read_line;
   current_header := create header from accumulated_lines;
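
For readers who want something runnable, the pseudocode above might translate to Python roughly as follows. This is only a sketch under stated assumptions: the names Header, Record, ParseError, strip_comment, parse_header and parse_file are made up for illustration, "//" is assumed to start a trailing comment (as in the sample data), and blank lines are assumed to be ignorable. Each record keeps a reference to the header block that was most recently read.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class Header:
    lines: List[str] = field(default_factory=list)


@dataclass
class Record:
    header: Optional[Header]  # header in effect when the record was read
    data: str


class ParseError(ValueError):
    """Raised when a line does not fit the expected format."""


def strip_comment(line: str) -> str:
    # Assumption: "//" starts a trailing comment, as in the sample data.
    return line.split("//", 1)[0].strip()


def parse_header(lines: Iterator[str]) -> Header:
    # Called right after an "H0" line; consumes lines up to and including "H9".
    header = Header()
    for raw in lines:
        line = strip_comment(raw)
        if line.startswith("H9"):
            return header
        if line:
            header.lines.append(line)
    raise ParseError("end of input inside a header (missing H9)")


def parse_file(source: Iterable[str]) -> List[Record]:
    lines = iter(source)  # one shared iterator, so parse_header advances it
    current: Optional[Header] = None
    records: List[Record] = []
    for raw in lines:
        line = strip_comment(raw)
        if not line:
            continue  # assumption: blank lines carry no meaning
        if line.startswith("H0"):
            current = parse_header(lines)
        elif line.startswith("R0"):
            records.append(Record(current, line[2:].strip()))
        else:
            raise ParseError(f"unexpected line: {raw!r}")
    return records


SAMPLE = """\
H0 //start header
HEADER1
H9 //end header
R0 RECORD1
R0 RECORD2
H0 //start header
HEADER2
H9 //end header
R0 RECORD3
R0 RECORD4
"""

for rec in parse_file(SAMPLE.splitlines()):
    print(rec.header.lines if rec.header else None, rec.data)
```

The "state" here is simply which function is currently consuming lines, which is the point made above about avoiding explicit state. If the format grew deeper nesting, one plausible extension is to give each nested block its own parse function that its parent calls, in the same way parse_file calls parse_header.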

纸伞微斜 2025-01-09 19:29:46

I agree with kkm. Depending on how complex your grammar is, you may want to consider using a parsing library such as ply.
