Designing file processing that handles multiple file formats, parsing, validation, and persistence

Posted 2024-08-08 08:27:43

If you had to design a file processing component/system that could take in a wide variety of file formats (including proprietary formats such as Excel), parse and validate the contents, and store the information in a DB, how would you do it?

NOTE: 95% of the time, one line of input data will equal one record in the database, but not always.

Currently I'm using some custom software I designed to parse, validate, and store customer data in our database. The system identifies a file by its location in the file system (from an FTP drop) and then loads an XML "definition" file. (The correct XML is chosen based on where the input file was dropped off.)

The XML specifies things like the file layout (delimited or fixed width) and field-specific items (length, data type (numeric, alpha, alphanumeric), and which DB column to store the field in).

   <delimiter><![CDATA[ ]]></delimiter>
   <numberOfItems>12</numberOfItems>
   <dataItems>
    <item>
     <name>Member ID</name>
     <type>any</type>
     <minLength>0</minLength>
     <maxLength>0</maxLength>
     <validate>false</validate>
     <customValidation/>
     <dbColumn>MembershipID</dbColumn>
    </item>
    <!-- remaining <item> elements omitted -->
   </dataItems>

Because of this design, the input files must be text (fixed width or delimited) and have a one-to-one relation from input file data field to DB column.
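To make the current design concrete, here is a minimal Python sketch of how a definition like the one above could drive parsing of a delimited line. It is not the actual software from the question. The `<definition>` root element, the second `<item>`, the `|` delimiter, and the rule that a `maxLength` of 0 means "no limit" are all assumptions made for illustration, as are the function names.

```python
import xml.etree.ElementTree as ET

# Hypothetical definition file. The post shows only a fragment, so the
# <definition> root, the "|" delimiter and the second item are assumptions.
DEFINITION_XML = """
<definition>
  <delimiter><![CDATA[|]]></delimiter>
  <numberOfItems>2</numberOfItems>
  <dataItems>
    <item>
      <name>Member ID</name><type>any</type>
      <minLength>0</minLength><maxLength>0</maxLength>
      <validate>false</validate><customValidation/>
      <dbColumn>MembershipID</dbColumn>
    </item>
    <item>
      <name>Zip</name><type>numeric</type>
      <minLength>5</minLength><maxLength>5</maxLength>
      <validate>true</validate><customValidation/>
      <dbColumn>ZipCode</dbColumn>
    </item>
  </dataItems>
</definition>
"""

def load_definition(xml_text):
    """Turn the XML definition into a plain dict the parser can use."""
    root = ET.fromstring(xml_text.strip())
    items = []
    for item in root.find("dataItems").findall("item"):
        items.append({
            "name": item.findtext("name"),
            "type": item.findtext("type"),
            "min": int(item.findtext("minLength")),
            "max": int(item.findtext("maxLength")),
            "validate": item.findtext("validate") == "true",
            "column": item.findtext("dbColumn"),
        })
    return {
        "delimiter": root.findtext("delimiter"),
        "count": int(root.findtext("numberOfItems")),
        "items": items,
    }

def check_field(value, item):
    """Apply the length and type checks described by one <item>."""
    if len(value) < item["min"]:
        raise ValueError(f"{item['name']}: too short")
    if item["max"] and len(value) > item["max"]:  # assumption: 0 = no limit
        raise ValueError(f"{item['name']}: too long")
    checks = {"numeric": str.isdigit, "alpha": str.isalpha,
              "alphanumeric": str.isalnum}
    check = checks.get(item["type"])  # "any" has no type check
    if check and not check(value):
        raise ValueError(f"{item['name']}: not {item['type']}")

def parse_line(line, definition):
    """Map one delimited line to {dbColumn: value}, one field per column."""
    fields = line.rstrip("\n").split(definition["delimiter"])
    if len(fields) != definition["count"]:
        raise ValueError(f"expected {definition['count']} fields, got {len(fields)}")
    record = {}
    for value, item in zip(fields, definition["items"]):
        if item["validate"]:
            check_field(value, item)
        record[item["column"]] = value
    return record

definition = load_definition(DEFINITION_XML)
record = parse_line("A123|90210", definition)
```

The sketch also shows where the limitation comes from: `parse_line` assumes text input and exactly one field per DB column, so neither Excel input nor a line that expands into several records fits without changing the reader.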

I'd like to extend the capabilities of our file processing system to take in Excel, or other file formats.

There are at least a half dozen ways I can proceed but I'm stuck right now because I don't have anyone to really bounce the ideas off of.

Again: if you had to design a file processing component that could take in a wide variety of file formats (including proprietary formats such as Excel), parse and validate the contents, and store the information in a DB, how would you do it?

别忘他 2024-08-15 08:27:43

Well, a straightforward design is something like...

+-----------+
| reader1   |
|           |---
+-----------+   \---
                    \---   +----------------+               +-------------+
                        \--|  validation    |               |  DB         |
                       /---|                |---------------|             |
+-----------+    /-----    +----------------+               +-------------+
| reader2   |----
|           |
+-----------+

Readers take care of file validation (does the data exist?) and parsing, the validation section takes care of any business logic, and the DB... is a DB.
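As a rough illustration of that pipeline, the Python sketch below wires two readers into one shared validation step and one store step. The table, column names, sample data, and the validation rule are all hypothetical, and an in-memory SQLite database stands in for the real DB. An Excel reader would be one more function returning rows in the same shape, presumably built on a spreadsheet library; it is left out here to keep the sketch dependency-free.

```python
import csv
import io
import sqlite3

def read_delimited(text, delimiter=","):
    """reader1: delimited text with a header row -> list of {column: value}."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def read_fixed_width(text, layout):
    """reader2: fixed-width text -> rows; layout is [(column, start, end), ...]."""
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            rows.append({col: line[start:end].strip()
                         for col, start, end in layout})
    return rows

def validate(rows):
    """Business rules shared by every reader (hypothetical rule:
    member_id must be numeric and name must be non-empty)."""
    good, bad = [], []
    for row in rows:
        ok = row["member_id"].isdigit() and bool(row["name"])
        (good if ok else bad).append(row)
    return good, bad

def store(conn, rows):
    """Persist validated rows; every reader's output ends up here."""
    conn.executemany(
        "INSERT INTO members (member_id, name) VALUES (:member_id, :name)",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (member_id TEXT, name TEXT)")

csv_text = "member_id,name\n1001,Ann\nX9,Bob\n"
fixed_text = "1002Cid   \n1003Dee   \n"
layout = [("member_id", 0, 4), ("name", 4, 10)]

rejected = []
for rows in (read_delimited(csv_text), read_fixed_width(fixed_text, layout)):
    good, bad = validate(rows)
    store(conn, good)
    rejected.extend(bad)

stored = [r[0] for r in
          conn.execute("SELECT member_id FROM members ORDER BY member_id")]
```

Only the readers know about file formats. Validation and storage see the same row shape no matter where the data came from, which is what lets a new format be added as a new reader function without touching the rest.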

So part of what you'd have to design is the generic ReaderToValidator (GR2V) data container. That's more of a business-logic kind of container. I suspect you want the same kind of data regardless of the input format, so GR2V is not going to be too hard.
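One possible shape for such a container, sketched in Python: a small record type that every reader emits and the validator consumes. The class name, its fields, and the `family_id;member,member` line format are invented for illustration. The example reader step returns a list so that one input line can produce more than one record, which covers the "not always one line per record" case from the question.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ReaderRecord:
    """Hypothetical GR2V container: one candidate DB record, format-agnostic."""
    source: str               # file the data came from
    line_number: int          # input line (or spreadsheet row) it was read from
    values: Dict[str, str]    # DB column name -> raw value
    errors: List[str] = field(default_factory=list)  # filled in by validation

    @property
    def is_valid(self) -> bool:
        return not self.errors

def records_from_line(source: str, line_number: int, line: str) -> List[ReaderRecord]:
    """Example reader step for a made-up line format 'family_id;member,member'.
    Each member on the line becomes its own record."""
    family_id, members = line.split(";")
    return [
        ReaderRecord(source, line_number,
                     {"FamilyID": family_id, "MembershipID": member})
        for member in members.split(",")
    ]

records = records_from_line("drop/clientA.txt", 1, "F1;100,101")
```

Keeping the source file and line number on each record means validation errors can be reported against the original input, whichever reader produced the record.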

You can make this polymorphic by designing a GR2V superclass with the Validator method and the data members; each reader then subclasses GR2V and fills in the data with its own ReadParseFile method. That introduces a bit more coupling than a strictly procedural approach, though. I'd go procedural for this, since the data is processed procedurally in the conceptual design.
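The two options can be put side by side in a short Python sketch. The class and function names, the column list, and the "MembershipID must be non-empty" rule are hypothetical. Both variants produce the same result; the difference is that the polymorphic reader inherits validation from its superclass, while the procedural version keeps reading and validating as independent functions.

```python
from abc import ABC, abstractmethod

# Polymorphic variant: the superclass owns the data members and the
# validator; each reader subclass only supplies read_parse_file.
class GR2V(ABC):
    def __init__(self):
        self.rows = []  # filled in by the subclass reader

    @abstractmethod
    def read_parse_file(self, text):
        """Parse the input and populate self.rows."""

    def validate(self):
        # Hypothetical business rule: MembershipID must be non-empty.
        return [row for row in self.rows if row.get("MembershipID")]

class DelimitedReader(GR2V):
    def __init__(self, delimiter, columns):
        super().__init__()
        self.delimiter = delimiter
        self.columns = columns

    def read_parse_file(self, text):
        self.rows = [dict(zip(self.columns, line.split(self.delimiter)))
                     for line in text.splitlines() if line]

# Procedural variant: plain functions passing plain data along.
def read_delimited(text, delimiter, columns):
    return [dict(zip(columns, line.split(delimiter)))
            for line in text.splitlines() if line]

def validate(rows):
    return [row for row in rows if row.get("MembershipID")]

text = "100|Ann\n|Bob\n"
columns = ["MembershipID", "Name"]

reader = DelimitedReader("|", columns)
reader.read_parse_file(text)
oo_result = reader.validate()

proc_result = validate(read_delimited(text, "|", columns))
```

In the polymorphic variant every reader is tied to the validator through inheritance, so changing the validation rules touches the base class of all readers. In the procedural variant the only thing the two steps share is the row format.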
