Design pattern for parsing binary file data and storing it in a database
Can anybody recommend a design pattern for taking a binary data file, parsing parts of it into objects, and storing the resulting data in a database?
I think a similar pattern could be used for taking an XML or tab-delimited file and parsing it into its representative objects.
A common data structure would include:
(Header) (DataElement1) (DataElement1SubData1) (DataElement1SubData2) (DataElement2) (DataElement2SubData1) (DataElement2SubData2) (EOF)
I think a good design would include a way to swap out the parsing definition based on the file type or on some defined metadata included in the header. So a Factory pattern would be part of the overall design for the parser.
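A minimal sketch of the header-driven factory described above, in Python. The file layout is an assumption for illustration only: a 4-byte magic value (`DAT1` or `DAT2`, both hypothetical) selects between two record layouts, and `parser_for` is a made-up name for the factory. Callers never name a concrete parser, so supporting a new format means registering one more function.

```python
import struct
from typing import Callable, Dict, List, Tuple

# A record is (record id, raw payload bytes).
Record = Tuple[int, bytes]
Parser = Callable[[bytes], List[Record]]


def parse_v1(body: bytes) -> List[Record]:
    """Hypothetical layout v1: 1-byte id, 1-byte length, then the payload."""
    records, pos = [], 0
    while pos < len(body):
        rec_id, length = body[pos], body[pos + 1]
        records.append((rec_id, body[pos + 2:pos + 2 + length]))
        pos += 2 + length
    return records


def parse_v2(body: bytes) -> List[Record]:
    """Hypothetical layout v2: 2-byte big-endian id and length fields."""
    records, pos = [], 0
    while pos < len(body):
        rec_id, length = struct.unpack_from(">HH", body, pos)
        records.append((rec_id, body[pos + 4:pos + 4 + length]))
        pos += 4 + length
    return records


# The factory's registry: supporting a new file type means adding an entry here.
_PARSERS: Dict[bytes, Parser] = {b"DAT1": parse_v1, b"DAT2": parse_v2}


def parser_for(header: bytes) -> Parser:
    """Factory: pick the parsing definition from the header's 4 magic bytes."""
    try:
        return _PARSERS[header[:4]]
    except KeyError:
        raise ValueError(f"unknown file type: {header[:4]!r}") from None


def parse_file(data: bytes) -> List[Record]:
    return parser_for(data[:4])(data[4:])
```

For example, `parse_file(b"DAT1" + bytes([7, 2]) + b"hi")` returns `[(7, b"hi")]`. The sketch does no bounds checking on truncated input.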
Answers (4)
The Strategy pattern may be the one you want to look at, with the file-parsing algorithm as the strategy.
Then you want a separate strategy for database insertion.
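A rough sketch of that split in Python, using the standard `sqlite3` and `struct` modules. `FixedWidthBinary`, `TabDelimited`, `SqliteStore`, and `Importer` are hypothetical names, and the binary record layout (4-byte id plus 8-byte name) is assumed. Any parse strategy can be paired with any storage strategy, which also covers the tab-delimited case from the question.

```python
import sqlite3
import struct
from typing import Iterable, List, Protocol, Tuple

Record = Tuple[int, str]  # (element id, name); hypothetical record shape


class ParseStrategy(Protocol):
    def parse(self, data: bytes) -> List[Record]: ...


class StoreStrategy(Protocol):
    def store(self, records: Iterable[Record]) -> None: ...


class FixedWidthBinary:
    """Parse strategy for an assumed layout: 4-byte big-endian id followed by
    an 8-byte NUL-padded ASCII name, repeated."""

    def parse(self, data: bytes) -> List[Record]:
        return [(rec_id, raw.rstrip(b"\0").decode("ascii"))
                for rec_id, raw in struct.iter_unpack(">I8s", data)]


class TabDelimited:
    """Parse strategy for the same records written as 'id<TAB>name' lines."""

    def parse(self, data: bytes) -> List[Record]:
        out = []
        for line in data.decode("utf-8").splitlines():
            rec_id, name = line.split("\t")
            out.append((int(rec_id), name))
        return out


class SqliteStore:
    """Store strategy that inserts records into an SQLite table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS element (id INTEGER PRIMARY KEY, name TEXT)")

    def store(self, records: Iterable[Record]) -> None:
        with self.conn:  # commits on success, rolls back on an exception
            self.conn.executemany("INSERT INTO element (id, name) VALUES (?, ?)", records)


class Importer:
    """Context object: runs any parse strategy, then any store strategy."""

    def __init__(self, parser: ParseStrategy, store: StoreStrategy) -> None:
        self.parser, self.store = parser, store

    def run(self, data: bytes) -> int:
        records = self.parser.parse(data)
        self.store.store(records)
        return len(records)
```

Swapping `SqliteStore` for another class with a `store` method (for a different database) would leave the parsers untouched, and vice versa.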
Use Lex and YACC. Unless you devote the next ten years exclusively to this subject, they will produce better and faster code every time.
I fully agree with Orion Edwards, and it is usually the way I approach the problem; but lately I have started to see some patterns(!) in the madness.
For more complex tasks I usually use something like an interpreter (or a strategy) that uses some builder (or factory) to create each part of the data.
For streaming data, the entire parser would look something like an adapter, adapting from a stream object to an object stream (which usually is just a queue).
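One way to sketch that adapter in Python, assuming a hypothetical framing of 1-byte tag, 2-byte big-endian length, then the payload. The class wraps anything with `.read()` and exposes an iterator of objects, and `pump` (a made-up helper) pushes those objects onto a `queue.Queue` for a consumer such as a database writer. It assumes a blocking, buffered stream, where `read(n)` returns fewer than `n` bytes only at end of stream.

```python
import io
import queue
import struct
from dataclasses import dataclass
from typing import BinaryIO, Iterator


@dataclass(frozen=True)
class Element:
    tag: int
    payload: bytes


class RecordStreamAdapter:
    """Adapts a byte stream (anything with .read()) to an iterator of Elements.

    Hypothetical framing: 1-byte tag, 2-byte big-endian length, payload.
    Assumes read(n) returns fewer than n bytes only at end of stream.
    """

    _FRAME = struct.Struct(">BH")

    def __init__(self, stream: BinaryIO) -> None:
        self._stream = stream

    def __iter__(self) -> Iterator[Element]:
        while True:
            head = self._stream.read(self._FRAME.size)
            if not head:  # clean end of stream between records
                return
            if len(head) != self._FRAME.size:
                raise EOFError("truncated record header")
            tag, length = self._FRAME.unpack(head)
            payload = self._stream.read(length)
            if len(payload) != length:
                raise EOFError("truncated record payload")
            yield Element(tag, payload)


def pump(stream: BinaryIO, out: queue.Queue) -> None:
    """Push parsed objects onto a queue for a consumer; None marks the end."""
    for element in RecordStreamAdapter(stream):
        out.put(element)
    out.put(None)


if __name__ == "__main__":
    data = struct.pack(">BH", 1, 3) + b"abc" + struct.pack(">BH", 2, 0)
    q: queue.Queue = queue.Queue()
    pump(io.BytesIO(data), q)
    while (item := q.get()) is not None:
        print(item)
```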
For your example, there would probably be one builder for the complete data structure (from header to EOF), which internally uses builders for the inner data elements (fed by the interpreter). Once EOF is encountered, an object is emitted.
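A sketch of that builder arrangement in Python, using the question's Header/DataElement/SubData/EOF layout. The token kinds (`"header"`, `"element"`, `"sub"`, `"eof"`) and all class names are assumptions; in practice the tokens would come from a lower-level reader of the binary framing. The interpreter maps each token to a builder call, `FileBuilder` delegates each data element to an `ElementBuilder`, and a finished, immutable `DataFile` is emitted only when EOF is seen.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class DataElement:
    name: str
    sub_data: Tuple[str, ...]


@dataclass(frozen=True)
class DataFile:
    header: str
    elements: Tuple[DataElement, ...]


class ElementBuilder:
    """Collects the sub-data that follows one data element."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sub_data: List[str] = []

    def add_sub_data(self, value: str) -> None:
        self.sub_data.append(value)

    def build(self) -> DataElement:
        return DataElement(self.name, tuple(self.sub_data))


class FileBuilder:
    """Builds the whole structure (header to EOF) from per-element builders."""

    def __init__(self) -> None:
        self.header: Optional[str] = None
        self.elements: List[DataElement] = []
        self.current: Optional[ElementBuilder] = None

    def set_header(self, value: str) -> None:
        self.header = value

    def start_element(self, name: str) -> None:
        self._finish_element()
        self.current = ElementBuilder(name)

    def add_sub_data(self, value: str) -> None:
        if self.current is None:
            raise ValueError("sub-data before any data element")
        self.current.add_sub_data(value)

    def _finish_element(self) -> None:
        if self.current is not None:
            self.elements.append(self.current.build())
            self.current = None

    def build(self) -> DataFile:
        self._finish_element()
        if self.header is None:
            raise ValueError("missing header")
        return DataFile(self.header, tuple(self.elements))


def interpret(tokens: Iterable[Tuple[str, str]]) -> Iterator[DataFile]:
    """Interpreter: drives the builder and emits a DataFile at each EOF."""
    builder = FileBuilder()
    for kind, value in tokens:
        if kind == "header":
            builder.set_header(value)
        elif kind == "element":
            builder.start_element(value)
        elif kind == "sub":
            builder.add_sub_data(value)
        elif kind == "eof":
            yield builder.build()
            builder = FileBuilder()  # ready for the next file in the stream
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
```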
However, for many smaller tasks, creating the objects in a switch statement inside some factory function is probably the simplest approach. Also, I like keeping my data objects immutable, as you never know when someone will shove concurrency down your throat :)
You'll find that your code at the end will either resemble an existing design pattern, or you'll have created a new one. You'll then be qualified to answer this question :-)