I just looked at the CUSCAR spec, and I think you'll end up with some pretty ugly regex code to parse it. You could get away with that if you're only parsing part of it. You'll have to test for speed, as your main bottleneck will be I/O.
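For a sense of why plain splitting can beat heavy regex here: CUSCAR is a UN/EDIFACT message, so with the default delimiters (segment terminator `'`, element separator `+`, component separator `:`) you can break a message into segments without much regex at all. This is a simplified sketch that ignores release (escape) characters; the sample message is made up for illustration:

```python
# Minimal sketch: split an EDIFACT-style message (CUSCAR is one) into
# segments and data elements. Assumes the default delimiters: segment
# terminator "'", element separator "+", component separator ":".
# Does NOT handle the "?" release character or explicit UNA headers.
def parse_segments(message: str):
    segments = []
    for raw in message.split("'"):
        raw = raw.strip()
        if not raw:
            continue
        # First element is the segment tag; the rest are data elements,
        # each possibly split into components.
        elements = [e.split(":") for e in raw.split("+")]
        segments.append({"tag": elements[0][0], "elements": elements[1:]})
    return segments

# Hypothetical two-segment fragment, just to show the shapes produced.
sample = "UNH+1+CUSCAR:D:95B:UN'BGM+85+12345+9'"
for seg in parse_segments(sample):
    print(seg["tag"], seg["elements"])
```

Since this works line-by-segment on plain string splits, it streams nicely and keeps the I/O, not the parsing, as the bottleneck.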
I did something similar with the vendor files that came from QWEST. Those beasties were hierarchical text files, and parsing them sucked! I'm currently creating and parsing text files of between 4 and 50 million lines each, every day.
There is a nice framework called the FileHelpers Library. It helps you create object-oriented representations of the records (text lines), and it even has a wizard to walk you through creating those record classes. It handles master-detail, delimited, and fixed-length formats easily.
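FileHelpers is a .NET library, but the record-as-class idea it's built on is easy to sketch in any language. Here's a rough Python analogue: declare the fixed-width layout once, then map each text line onto a typed record object. The field names and widths are hypothetical, purely for illustration:

```python
from dataclasses import dataclass

# Hypothetical fixed-width layout: (field name, column width).
# In FileHelpers you'd express this with attributes on a record class.
FIELDS = [("part_no", 10), ("qty", 5), ("price", 8)]

@dataclass
class InventoryRecord:
    part_no: str
    qty: int
    price: float

def parse_line(line: str) -> InventoryRecord:
    # Slice the line into columns according to the declared widths,
    # then convert each slice to its target type.
    values, pos = [], 0
    for _, width in FIELDS:
        values.append(line[pos:pos + width].strip())
        pos += width
    return InventoryRecord(values[0], int(values[1]), float(values[2]))

rec = parse_line("AB-123    00042  19.95")
```

Keeping the layout in one declarative table (or, in FileHelpers, in attributes on the class) means a spec change touches one place instead of scattered slice offsets.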