To start, I'd like to clarify that I'm not especially well versed in C#. That said, a project I'm working on in C# with .NET 3.5 has me building a class to read and export files that contain multiple fixed-width formats, selected by record type.
There are currently 5 types of records, indicated by the first character of each line, and each type has its own line format. The problem is that the formats differ from one another.
Record type 1 has 5 columns, signifies beginning of the file
Record type 3 has 10 columns, signifies beginning of a batch
Record type 5 has 69 columns, signifies a transaction
Record type 7 has 12 columns, signifies end of the batch, summarizes
(types 3, 5 and 7 repeat throughout the file, one group per batch)
Record type 9 has 8 columns, signifies end of the file, summarizes
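Since the record type is always the first character of the line, the dispatch step can be as small as mapping that character to an enum. The sketch below uses the character values listed above; the names `RecordType` and `RecordClassifier` are illustrative, not from the question.

```csharp
using System;

// Record types keyed by the first character of each line (values from the list above).
enum RecordType
{
    FileHeader = '1',
    BatchHeader = '3',
    Transaction = '5',
    BatchTrailer = '7',
    FileTrailer = '9'
}

static class RecordClassifier
{
    // Returns the record type of a line, or throws if the first character is not a known type.
    public static RecordType Classify(string line)
    {
        if (string.IsNullOrEmpty(line))
            throw new FormatException("Empty line has no record type.");

        int code = line[0];
        if (!Enum.IsDefined(typeof(RecordType), code))
            throw new FormatException("Unknown record type '" + line[0] + "'.");

        return (RecordType)code;
    }
}
```

Validating the type character up front means a malformed line fails loudly instead of being parsed with the wrong layout.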
Is there a good library out there for these kinds of fixed-width files? I've seen a few good ones, but they want to load the entire file with a single spec, and that won't work here.
Roughly 250 of these files are read at the end of every month, and their combined size averages about 300 MB. Efficiency is very important to me in this project.
Based on my knowledge of the data, I've built a class hierarchy of what I "think" the objects should look like...
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    namespace Extract_Processing
    {
        class Extract
        {
            private string mFilePath;
            private string mFileName;
            private FileHeader mFileHeader;
            private FileTrailer mFileTrailer;
            private List<Batch> mBatches; // A file can have many batches

            public Extract(string filePath)
            { /* Using file path some static method from another class would be called to parse in the file somehow */ }

            public string ToString()
            { /* Iterates all objects down the hierarchy to return the file in string format */ }

            public void ToFile()
            { /* Calls some method in the file parse static class to export the file back to storage somewhere */ }
        }

        class FileHeader
        { /* ... contains data types for all fields in this format, ToString etc */ }

        class Batch
        {
            private string mBatchNumber; // Should this be pulled out of the batch header to make LINQ querying simpler for this data set?
            private BatchHeader mBatchHeader;
            private BatchTrailer mBatchTrailer;
            private List<Transaction> mTransactions; // A batch can have multiple transactions

            public string ToString()
            { /* Iterates through the header, transactions and trailer to return what the entire batch would look like in string format */ }
        }

        class BatchHeader
        { /* ... contains data types for all fields in this format, ToString etc */ }

        class Transaction
        { /* ... contains data types for all fields in this format, ToString etc */ }

        class BatchTrailer
        { /* ... contains data types for all fields in this format, ToString etc */ }

        class FileTrailer
        { /* ... contains data types for all fields in this format, ToString etc */ }
    }
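To make the "contains data types for all fields in this format, ToString etc" comments concrete, here is a minimal sketch of one record class that parses its fields with `Substring` and rebuilds the exact line in `ToString()`. The column layout is entirely hypothetical (the real type 3 record has 10 columns), and the batch number is assumed to be zero-padded. Exposing fields as read-only properties like this may also answer the `mBatchNumber` question: a LINQ query can reach the value through the header rather than needing a copy on `Batch`.

```csharp
using System;
using System.Globalization;

// HYPOTHETICAL layout, for illustration only:
// [0] record type '3' (1) | [1-6] batch number, zero-padded (6) | [7-14] date yyyyMMdd (8) | [15-34] description (20)
class BatchHeader
{
    public string BatchNumber { get; private set; }
    public DateTime BatchDate { get; private set; }
    public string Description { get; private set; }

    public BatchHeader(string line)
    {
        if (line == null || line.Length < 35 || line[0] != '3')
            throw new FormatException("Not a batch header line.");

        // Fixed-width fields: cut by position, then convert to real types.
        BatchNumber = line.Substring(1, 6).Trim();
        BatchDate = DateTime.ParseExact(line.Substring(7, 8), "yyyyMMdd", CultureInfo.InvariantCulture);
        Description = line.Substring(15, 20).TrimEnd();
    }

    // Rebuilds the fixed-width line so the record can be exported unchanged.
    public override string ToString()
    {
        return "3"
            + BatchNumber.PadLeft(6, '0')
            + BatchDate.ToString("yyyyMMdd", CultureInfo.InvariantCulture)
            + Description.PadRight(20);
    }
}
```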
I've left out many constructors and other methods, but I think the idea should be fairly solid. I'm looking for ideas and critique of the approach I'm considering since, again, I'm not very knowledgeable about C#, and execution time is the highest priority.
My biggest question, besides some critique, is: how should I bring in this file? I've read in many files in other languages: VBA using FSO methods, a Microsoft Access ImportSpec (5 times, one for each spec... wow, that was inefficient!), and a 'Cursor' object in Visual FoxPro (which was FAAAAAAAST, but again had to be done five times). I'm looking for hidden gems in C#, if such things exist.
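One common way to read a file like this is a single forward pass that tracks which batch is currently open, instead of one pass per spec. The sketch below assumes the structure described above (3 opens a batch, 5s belong to it, 7 closes it, 1 and 9 bracket the file) and keeps raw lines only; `RawBatch` and `ExtractReader` are hypothetical names, and turning each line into a typed object is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Minimal stand-in for the Batch class: raw lines only, field parsing omitted.
class RawBatch
{
    public string Header;
    public List<string> Transactions = new List<string>();
    public string Trailer;
}

static class ExtractReader
{
    // One forward pass over the file, grouping 3/5/7 records into batches.
    public static List<RawBatch> ReadBatches(TextReader reader)
    {
        List<RawBatch> batches = new List<RawBatch>();
        RawBatch current = null;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // tolerate blank lines

            switch (line[0])
            {
                case '1': // file header: would be parsed once here
                case '9': // file trailer: would be parsed once here
                    break;
                case '3':
                    if (current != null) throw new FormatException("Batch header before previous batch trailer.");
                    current = new RawBatch();
                    current.Header = line;
                    break;
                case '5':
                    if (current == null) throw new FormatException("Transaction outside a batch.");
                    current.Transactions.Add(line);
                    break;
                case '7':
                    if (current == null) throw new FormatException("Batch trailer without a header.");
                    current.Trailer = line;
                    batches.Add(current);
                    current = null;
                    break;
                default:
                    throw new FormatException("Unknown record type: " + line[0]);
            }
        }

        if (current != null) throw new FormatException("File ended inside a batch.");
        return batches;
    }
}
```

Because it takes a `TextReader`, the same method works on a `StreamReader` over the real file or a `StringReader` in tests.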
Thanks for reading my novel; let me know if you're having trouble understanding it. I'm taking the weekend to go over this design to see if I buy it and want to put in the effort to implement it this way.
FileHelpers is nice. It has a couple of drawbacks: it doesn't seem to be under active development anymore, and it makes you use public fields for your record members instead of letting you use properties. But otherwise it's good.
What are you doing with these files? Are you loading them into SQL Server? If so, and you're looking for FAST and SIMPLE, I'd recommend a design like this:
You could probably accomplish the whole thing in under 500 lines of C#.
Biggest question besides some critique is, how should I bring in this file?
I do not know of any good library for file IO, but the reading is pretty straightforward.
Instantiate the `StreamReader` class with a 64 KB buffer to limit disk I/O operations (my estimate is an average of 1,500 transactions per file at the end of the month).
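Opening the file that way might look like the sketch below. The 64 KB figure comes from the answer; `Encoding.ASCII` is an assumption about the files' encoding (swap in whatever they really use), and `FileOptions.SequentialScan` is an optional hint to the OS that the file will be read front to back. `ExtractFile` is a hypothetical helper name.

```csharp
using System.IO;
using System.Text;

static class ExtractFile
{
    // 64 KB, as suggested above.
    const int BufferSize = 64 * 1024;

    public static StreamReader Open(string path)
    {
        // Read-only, sequential access with a 64 KB FileStream buffer.
        FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read,
            FileShare.Read, BufferSize, FileOptions.SequentialScan);

        // ASCII is an ASSUMPTION about the files; no BOM detection; 64 KB reader buffer.
        // Disposing the StreamReader also closes the underlying FileStream.
        return new StreamReader(stream, Encoding.ASCII, false, BufferSize);
    }
}
```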
Now you can stream over the file:
1. Use `Read` at the beginning of each line to determine the record type.
2. Use the `ReadLine` method with the `String.Split` method to get the column values.
3. Create the object using the column values.
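A sketch of those three steps, with one deliberate swap: because the records are fixed-width rather than delimited, it uses `Substring` instead of `String.Split`. The `Transaction` class and its two fields (a 10-character account and a 12-digit amount with two implied decimals) are made up for illustration; the real type 5 record has 69 columns.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// HYPOTHETICAL stand-in for a type 5 record with only two made-up fields.
class Transaction
{
    public string Account;  // 10 characters right after the type character (assumption)
    public decimal Amount;  // next 12 digits, two implied decimals (assumption)
}

static class TransactionReader
{
    public static List<Transaction> ReadTransactions(TextReader reader)
    {
        List<Transaction> result = new List<Transaction>();
        int type;

        // Step 1: Read() consumes only the record-type character.
        while ((type = reader.Read()) != -1)
        {
            if (type == '\r' || type == '\n')
            {
                // Blank line: swallow the rest of a "\r\n" pair and move on.
                if (type == '\r' && reader.Peek() == '\n') reader.Read();
                continue;
            }

            // Step 2: ReadLine() returns the remainder of the line.
            string rest = reader.ReadLine() ?? string.Empty;

            if (type != '5') continue; // headers and trailers would be handled here too

            // Step 3: build the object. Fixed-width, so Substring rather than Split.
            Transaction t = new Transaction();
            t.Account = rest.Substring(0, 10).Trim();
            t.Amount = decimal.Parse(rest.Substring(10, 12), CultureInfo.InvariantCulture) / 100m;
            result.Add(t);
        }
        return result;
    }
}
```

Because `Read()` has already consumed the type character, column offsets in `rest` start at the line's second character. The explicit blank-line check matters: without it, `Read()` would return the newline itself and the following `ReadLine()` would swallow the next record.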
Alternatively, you could buffer the data from a `Stream` manually and use `IndexOf` + `Substring` for more performance (if done right). Also, if the lines weren't columns but primitive data types in binary format, you could use the `BinaryReader` class for a very easy and performant way to read the objects.
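A minimal sketch of the manual-buffering idea, assuming `\n` or `\r\n` line endings: it reads large character chunks and cuts lines out with `IndexOf` and `Substring`, carrying any partial line over to the next chunk. `BufferedLineSplitter` is a hypothetical name, and whether this actually beats `StreamReader.ReadLine` on these files is something to measure rather than assume.

```csharp
using System.Collections.Generic;
using System.IO;

static class BufferedLineSplitter
{
    // Reads chunkSize characters at a time and splits lines with IndexOf + Substring.
    // Assumes '\n' or "\r\n" line endings; a trailing '\r' is trimmed from each line.
    public static IEnumerable<string> ReadLines(TextReader reader, int chunkSize)
    {
        char[] buffer = new char[chunkSize];
        string pending = string.Empty; // partial line carried over between chunks
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            string text = pending + new string(buffer, 0, read);
            int start = 0;
            int newline;

            while ((newline = text.IndexOf('\n', start)) >= 0)
            {
                yield return TrimCr(text.Substring(start, newline - start));
                start = newline + 1;
            }
            pending = text.Substring(start);
        }

        // Last line without a terminating newline.
        if (pending.Length > 0)
            yield return TrimCr(pending);
    }

    static string TrimCr(string line)
    {
        return line.Length > 0 && line[line.Length - 1] == '\r'
            ? line.Substring(0, line.Length - 1)
            : line;
    }
}
```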
One critique I have is that you are not correctly implementing ToString.
Should be:
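A minimal sketch of the corrected form, assuming the answer means the standard `override` declaration. Without `override`, the question's method only hides `Object.ToString()` (the compiler reports warning CS0114), so calls through an `object` reference, `string.Format`, or string concatenation still get the default type name. This `Batch` is pared down to raw lines for illustration and is not the answer's original code.

```csharp
using System.Collections.Generic;
using System.Text;

// Pared-down Batch holding raw record lines instead of typed header/trailer objects.
class Batch
{
    private readonly string mBatchHeader;
    private readonly List<string> mTransactions;
    private readonly string mBatchTrailer;

    public Batch(string header, List<string> transactions, string trailer)
    {
        mBatchHeader = header;
        mTransactions = transactions;
        mBatchTrailer = trailer;
    }

    // "override" replaces Object.ToString() in the virtual call chain.
    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine(mBatchHeader);
        foreach (string t in mTransactions)
            sb.AppendLine(t);
        sb.AppendLine(mBatchTrailer);
        return sb.ToString();
    }
}
```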