Designing a file processing system that handles multiple file formats, parsing, validation, and persistence

Posted 2024-08-08 08:27:43

If you had to design a file processing component or system that could take in a wide variety of file formats (including proprietary formats such as Excel), then parse, validate, and store that information in a database, how would you do it?

Note: 95% of the time, one line of input data equals one record in the database, but not always.

Currently I'm using custom software I designed to parse, validate, and store customer data in our database. The system identifies a file by its location in the file system (from an FTP drop) and then loads an XML "definition" file. The correct XML is chosen based on where the input file was dropped.

The XML specifies things like the file layout (delimited or fixed width) and field-specific items: length, data type (numeric, alpha, alphanumeric), and which DB column the field is stored in.

    <delimiter><![CDATA[ ]]></delimiter>
    <numberOfItems>12</numberOfItems>
    <dataItems>
      <item>
        <name>Member ID</name>
        <type>any</type>
        <minLength>0</minLength>
        <maxLength>0</maxLength>
        <validate>false</validate>
        <customValidation/>
        <dbColumn>MembershipID</dbColumn>
      </item>

Because of this design, the input files must be text (fixed width or delimited), and each input data field must map one-to-one to a DB column.
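To make the current approach concrete, here is a minimal Python sketch of how a definition like the one above might drive parsing of a delimited line. It is not the actual system: the `<definition>` root element, the `|` delimiter, the second `<item>`, and the function names `load_definition` and `parse_line` are all assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, completed version of the definition fragment. The <definition>
# root, the "|" delimiter, and the "Last Name" item are illustrative assumptions.
DEFINITION_XML = """
<definition>
  <delimiter><![CDATA[|]]></delimiter>
  <numberOfItems>2</numberOfItems>
  <dataItems>
    <item>
      <name>Member ID</name>
      <type>any</type>
      <minLength>0</minLength>
      <maxLength>0</maxLength>
      <validate>false</validate>
      <customValidation/>
      <dbColumn>MembershipID</dbColumn>
    </item>
    <item>
      <name>Last Name</name>
      <type>alpha</type>
      <minLength>1</minLength>
      <maxLength>30</maxLength>
      <validate>true</validate>
      <customValidation/>
      <dbColumn>LastName</dbColumn>
    </item>
  </dataItems>
</definition>
"""


def load_definition(xml_text):
    """Return (delimiter, list of per-field dicts) from a definition document."""
    root = ET.fromstring(xml_text)
    delimiter = root.findtext("delimiter")
    fields = []
    for item in root.find("dataItems").findall("item"):
        fields.append({
            "name": item.findtext("name"),
            "type": item.findtext("type"),
            "min": int(item.findtext("minLength")),
            "max": int(item.findtext("maxLength")),
            "validate": item.findtext("validate") == "true",
            "column": item.findtext("dbColumn"),
        })
    return delimiter, fields


def parse_line(line, delimiter, fields):
    """Split one delimited line and map each value to its DB column."""
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    row = {}
    for field, value in zip(fields, values):
        if field["validate"]:
            if not field["min"] <= len(value) <= field["max"]:
                raise ValueError(f"{field['name']}: bad length {len(value)}")
            if field["type"] == "alpha" and not value.isalpha():
                raise ValueError(f"{field['name']}: expected letters only")
        row[field["column"]] = value
    return row
```

Because `parse_line` returns exactly one column-to-value mapping per input line, the sketch also shows where the one-to-one limitation comes from.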

I'd like to extend our file processing system so that it can take in Excel or other file formats.

There are at least half a dozen ways I could proceed, but I'm stuck right now because I don't have anyone to bounce ideas off.

Again: if you had to design a file processing component that could take in a wide variety of file formats (including proprietary formats such as Excel), then parse, validate, and store that information in a database, how would you do it?

Comments (3)

别忘他 2024-08-15 08:27:43

Well, a straightforward design would be something like this:

+-----------+
| reader1   |
|           |---
+-----------+   \---
                    \---   +----------------+               +-------------+
                        \--|  validation    |               |  DB         |
                       /---|                |---------------|             |
+-----------+    /-----    +----------------+               +-------------+
| reader2   |----
|           |
+-----------+

Readers take care of file validation (does the data exist?) and parsing, the validation section takes care of any business logic, and the DB... is a DB.

So part of what you'd have to design is the generic reader-to-validator data container (call it GR2V). That's more of a business-logic kind of container. I suspect you want the same kind of data regardless of the input format, so the GR2V is not going to be too hard.

You can make this polymorphic by designing a GR2V superclass that holds the validator method and the data members; each reader then subclasses GR2V and fills in the data with its own ReadParseFile method. That introduces a bit more coupling than a strictly procedural approach, though. I'd go procedural here, since the data is processed procedurally in the conceptual design.
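As a rough illustration of such a container, the Python sketch below has two readers for different layouts produce the same record type. Everything here is an assumption for the example: the `MemberRecord` fields, the fixed-width column positions, and the function names. An Excel reader would be a third function yielding the same type, built on whatever spreadsheet library you choose.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class MemberRecord:
    """Hypothetical reader-to-validator container: one instance per DB record."""
    membership_id: str
    last_name: str
    source_line: int  # kept so validation errors can point back at the input


def read_delimited(stream: TextIO, delimiter: str = ",") -> Iterator[MemberRecord]:
    """Reader for delimited text; parsing only, no business rules."""
    for line_no, row in enumerate(csv.reader(stream, delimiter=delimiter), start=1):
        yield MemberRecord(row[0].strip(), row[1].strip(), line_no)


def read_fixed_width(stream: TextIO) -> Iterator[MemberRecord]:
    """Reader for fixed-width text.

    Assumed layout: columns 0-9 hold the member ID, columns 10-39 the last name.
    """
    for line_no, line in enumerate(stream, start=1):
        yield MemberRecord(line[0:10].strip(), line[10:40].strip(), line_no)


# Usage: the same logical data in two layouts yields equal records.
delimited = io.StringIO("A123,Smith\n")
fixed = io.StringIO("A123      Smith\n")
same = list(read_delimited(delimited)) == list(read_fixed_width(fixed))
```

Since each reader is a generator, a reader for a format where one input line does not equal one record can simply yield zero or several records per line without the validator needing to know.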
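A minimal sketch of the procedural variant, in Python, might look like the following. The reader functions, the extension-based `READERS` lookup, the `members` table, and the validation rules are all hypothetical; the in-memory SQLite database only stands in for the real one.

```python
import csv
import io
import sqlite3


def read_csv(stream):
    """Reader: parse only; yields plain dicts keyed by DB column name."""
    for row in csv.reader(stream):
        yield {"MembershipID": row[0], "LastName": row[1]}


def read_pipe(stream):
    """Reader for a pipe-delimited layout; yields the same dict shape."""
    for line in stream:
        member_id, last_name = line.rstrip("\n").split("|")
        yield {"MembershipID": member_id, "LastName": last_name}


# Hypothetical dispatch by file extension; drop location would work the same way.
READERS = {".csv": read_csv, ".psv": read_pipe}


def validate(record):
    """Business rules shared by every format. Returns a list of problems."""
    problems = []
    if not record["MembershipID"]:
        problems.append("MembershipID is empty")
    if not record["LastName"].isalpha():
        problems.append("LastName must contain only letters")
    return problems


def process(stream, extension, conn):
    """Run reader -> validation -> DB and return the rejected records."""
    rejected = []
    for record in READERS[extension](stream):
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
            continue
        conn.execute(
            "INSERT INTO members (MembershipID, LastName) VALUES (?, ?)",
            (record["MembershipID"], record["LastName"]),
        )
    conn.commit()
    return rejected


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (MembershipID TEXT, LastName TEXT)")
rejected_csv = process(io.StringIO("A123,Smith\n,Jones\n"), ".csv", conn)
rejected_psv = process(io.StringIO("B456|Lee\n"), ".psv", conn)
```

Adding a format then means writing one more reader function and registering it; `validate` and the insert logic stay untouched, which is the low-coupling property the procedural approach is after.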

路还长,别太狂 2024-08-15 08:27:43

You may want to start a blog. Then, if you are on something like LinkedIn, you can point the discussion to your blog, or start a discussion on LinkedIn itself, since some of the discussions there go on for a while.

场罚期间 2024-08-15 08:27:43

SO is good for specifics, but true discussion is not so easily done here. Comments are too small for an exchange of ideas. I would tend to go elsewhere.

Although such discussions should be technology-agnostic, I suspect you'll find that the Java and .Net camps don't overlap much. I would look at The Server Side, but I do Java and hence look for Java stuff.
