Appropriate data structure for flat file processing?
Essentially, I have to get a flat file into a database. The flat files come in with the first two characters on each line indicating which type of record it is.
Do I create a class for each record type with properties matching the fields in the record? Should I just use arrays?
I want to load the data into some sort of data structure before saving it in the database so that I can use unit tests to verify that the data was loaded correctly.
Here's a sample of what I have to work with (BAI2 bank statements):
01,121000358,CLIENT,050312,0213,1,80,1,2/
02,CLIENT-STANDARD,BOFAGB22,1,050311,2359,,/
03,600812345678,GBP,fab1,111319005,,V,050314,0000/
88,fab2,113781251,,V,050315,0000,fab3,113781251,,V,050316,0000/
88,fab4,113781251,,V,050317,0000,fab5,113781251,,V,050318,0000/
88,010,0,,,015,0,,,045,0,,,100,302982205,,,400,302982205,,/
16,169,57626223,V,050311,0000,102 0101857345,/
88,LLOYDS TSB BANK PL 779300 99129797
88,TRF/REF 6008ABS12300015439
88,102 0101857345 K BANK GIRO CREDIT
88,/IVD-11 MAR
49,1778372829,90/
98,1778372839,1,91/
99,1778372839,1,92
7 Answers
I'd recommend creating classes (or structs, or whatever value type your language supports), as a named property such as record.CurrencyCode is so much more descriptive than an indexed field like fields[2], and, if you're using the (wonderful!) FileHelpers Library, then your terms are pretty much dictated for you.
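For illustration, here is a minimal sketch of what a FileHelpers record class for the type-03 (account identifier) lines might look like; the field names are my own guesses, not official BAI2 terminology:

using FileHelpers;

// Sketch of a FileHelpers record class for the "03" lines of the sample;
// field names are illustrative guesses at the BAI2 meaning of each field.
[DelimitedRecord(",")]
public class AccountIdentifierRecord
{
    public string RecordCode;     // always "03"
    public string AccountNumber;  // e.g. 600812345678
    public string CurrencyCode;   // e.g. GBP
    public string TypeCode;       // e.g. fab1
    public string Amount;         // e.g. 111319005
    public string ItemCount;      // empty in the sample
    public string FundsType;      // e.g. V
    public string ValueDate;      // e.g. 050314
    public string ValueTime;      // e.g. 0000 (the trailing "/" still needs stripping)
}

A FileHelperEngine&lt;AccountIdentifierRecord&gt; could then read the matching lines; for a file that mixes record types like this one, FileHelpers also provides a MultiRecordEngine that selects the record class line by line.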
Validation logic usually has at least 2 levels, the grosser level being "well-formatted" and the finer level being "correct data".
There are a few separate problems here. One issue is that of simply verifying the data, or writing tests to make sure that your parsing is accurate. A simple way to do this is to parse into a class that accepts a given range of values, and throws the appropriate error if not,
e.g.
public void setField1(int i)
{
    // reject values outside the range this field accepts
    if (i > 100)
        throw new InvalidDataException("field1 out of range: " + i);
    this.field1 = i;
}
Creating different classes for each record type is something you might want to do if the parsing logic is significantly different for different codes, so you don't have conditional logic like if (code == "01") ... else if (code == "02") ... else if (code == "16") ... (yechh).
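As a sketch of what could replace that chain (all names here are hypothetical), a lookup table keyed on the two-character code dispatches each line to its parser:

using System;
using System.Collections.Generic;
using System.IO;

// Sketch: dispatch each line on its two-character record code via a
// lookup table instead of an if/else chain. Names are hypothetical.
static class RecordDispatcher
{
    static readonly Dictionary&lt;string, Action&lt;string&gt;&gt; handlers =
        new Dictionary&lt;string, Action&lt;string&gt;&gt;
        {
            { "01", ParseFileHeader },
            { "03", ParseAccountIdentifier },
            { "16", ParseTransactionDetail },
            // ... one entry per record code
        };

    public static void Load(string path)
    {
        foreach (string line in File.ReadLines(path))
            handlers[line.Substring(0, 2)](line);
    }

    static void ParseFileHeader(string line)        { /* split, validate, store */ }
    static void ParseAccountIdentifier(string line) { /* ... */ }
    static void ParseTransactionDetail(string line) { /* ... */ }
}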
When I have had to load this kind of data in the past, I have put it all into a work table with the first two characters in one field and the rest in another. Then I have parsed it out to the appropriate other work tables based on the first two characters. Then I have done any cleanup and validation before inserting the data from the second set of work tables into the database.
In SQL Server you can do this through a DTS (2000) or an SSIS package, and using SSIS you may be able to process the data on the fly without storing it in work tables first, but the process is similar: use the first two characters to determine the data flow branch to use, then parse the rest of the record into some type of holding mechanism, and then clean up and validate before inserting. I'm sure other databases also have some type of mechanism for importing data and would use a similar process.
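The first step can also be driven from client code; here is a rough C# sketch (the table and column names are made up) that loads the raw lines into a two-column work table with SqlBulkCopy:

using System.Data;
using System.Data.SqlClient;
using System.IO;

// Sketch: load raw lines into a staging table with one column for the
// two-character code and one for the rest. Names are assumptions.
static class StagingLoader
{
    public static void Load(string path, string connectionString)
    {
        var table = new DataTable();
        table.Columns.Add("RecordCode", typeof(string));
        table.Columns.Add("RecordBody", typeof(string));

        foreach (string line in File.ReadLines(path))
            table.Rows.Add(line.Substring(0, 2), line.Substring(2));

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.RawStatementLines";
            bulk.WriteToServer(table);
        }
    }
}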
I agree that if your data format has any sort of complexity you should create a set of custom classes to parse and hold the data, perform validation, and do any other appropriate model tasks (for instance, return a human readable description, although some would argue this would be better put into a separate view class). This would probably be a good situation to use inheritance, where you have a parent class (possibly abstract) defining the properties and methods common to all types of records, and each child class can override these methods to provide its own parsing and validation if necessary, or add its own properties and methods.
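A minimal sketch of that hierarchy (class and field names invented for illustration):

using System.IO;

// Sketch: the abstract parent holds what every record type shares;
// children override validation and description. Names are illustrative.
public abstract class StatementRecord
{
    public string RecordCode { get; private set; }
    protected string[] Fields { get; private set; }

    // Splitting on commas and trimming the trailing "/" is common to all types.
    public virtual void Parse(string line)
    {
        Fields = line.TrimEnd('/').Split(',');
        RecordCode = Fields[0];
    }

    public abstract void Validate();

    public virtual string Describe() => "Record of type " + RecordCode;
}

public class TransactionDetailRecord : StatementRecord  // "16" lines
{
    public override void Validate()
    {
        if (Fields.Length < 3)
            throw new InvalidDataException("16 record is missing fields");
    }

    public override string Describe() => "Transaction of amount " + Fields[2];
}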
Creating a class for each type of row would be a better solution than using Arrays.
That said, in the past I have used ArrayLists of Hashtables to accomplish the same thing. Each item in the ArrayList is a row, and each entry in the Hashtable is a key/value pair representing column name and cell value.
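In today's terms that structure is a List of Dictionaries; a small sketch (the column names are invented):

using System.Collections.Generic;

// Sketch: each row is a dictionary keyed by column name, so cells are
// read by name instead of by index. Column names are made up.
var rows = new List&lt;Dictionary&lt;string, string&gt;&gt;();

rows.Add(new Dictionary&lt;string, string&gt;
{
    { "RecordCode", "16" },
    { "TypeCode",   "169" },
    { "Amount",     "57626223" },
});

string amount = rows[0]["Amount"];  // by name, not rows[0][2]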
Why not start by designing the database that will hold the data? Then you can use the Entity Framework to generate the classes for you.
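With a database-first workflow, the entity classes Entity Framework generates look roughly like this (the table and column names here are invented):

// Roughly what database-first Entity Framework generates for a table of
// transaction records; the table and column names are invented.
public partial class TransactionDetail
{
    public int Id { get; set; }
    public string TypeCode { get; set; }
    public decimal Amount { get; set; }
    public string FundsType { get; set; }
    public string BankReference { get; set; }
}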
Here's a wacky idea:
If you were working in Perl, you could use DBD::CSV to read data from your flat file, provided you gave it the correct values for the separator and EOL characters. You'd then read rows from the flat file by means of SQL statements; DBI will make them into standard Perl data structures for you, and you can run whatever validation logic you like. Once each row passes all the validation tests, you'd be able to write it into the destination database using DBD::whatever.
-steve