Excel upload to database table
I'm looking for the best solution to allow our users to upload XLS spreadsheets so that they can be used to populate tables in our data warehouse (DW).
Our users are heavy Business Objects (BO) users, and BO lets you export to XLS. When they have data in a spreadsheet that needs to be loaded to the DW, they need a process to upload the data in the XLS to the DW's database. As a result, we end up with many of these "interfaces" when I think that what we really need is a programmatic automated feed. My gut feeling is that using Excel as a data source for inter-system feeds is a bad idea.
Question #1: I'd like to see if you agree and why or why not.
OK, there is no swimming against that tide, so I now take as a given that XLS uploads are here to stay for us. Now I need to find the best solution. First, I'll explain what we do now and then what I don't like about it:
Via web pages, we provide empty XLS files (no rows) with a defined set of columns. Each file is intended to be used to update a different target table. In each spreadsheet is an "Upload" button. Pushing the Upload button results in the macro in the spreadsheet serializing the contents of the file to CSV and FTPing the data to a server folder. Periodically, a scheduler fires off an Informatica ETL job that uses the CSV file as input and loads the data into a custom XLS-specific staging table and then, if the records pass edits, into the appropriate target table. Any errors encountered are logged to an error table. For each XLS file uploaded, the data ends up in a separate staging and error table that is specific to the file.
Some of the things I don't like about our process are:
1) The macro code in the XLS is too exposed: it includes passwords, for example, and can be tampered with. There are also issues ensuring that the users are using the latest XLS templates.
2) Business rule edits are placed in the ETL program, where they should probably be, but because we would like to catch errors ASAP, i.e., in the spreadsheet, edits are also added to the macro code. This results in duplication of business edits. I want these rules in one place and centrally controlled. IMHO, putting any macro code in the XLS introduces a maintenance issue, even calls to stored procedures (some of which we have) or calls to web services (we haven't yet tried to call .NET Web Services from XLS macros).
3) Every XLS file upload template has its own process with a distinct set of staging and error tables and a custom screen for reporting errors encountered. It seems like we need a more generalized, reusable solution.
Besides often getting data exported to XLS from BO, the users also like Excel because it is easier to edit a large number of records and less clunky than editing individual records via a web interface.
This is the general direction that I am thinking:
First, I want the users to have Excel's ease of editing, but without embedding macros in the spreadsheet. I experimented with Farpoint's Grid with Excel compatibility...
http://www.fpoint.com/netproducts/spreadweb/tour/excel.aspx
...and I found that it was quite easy to let a user open an XLS file that resides on their PC in a browser, and to access the data from server-side .NET web code. Excel isn't running locally in their browser, but the functionality of Excel is reproduced, presumably through a lot of client-side scripting that I expect would be a real pain to duplicate myself. You can even cut and paste from a local spreadsheet into the web spreadsheet. This sounds great, but the biggest problem is cost. Our company is near death and won't allow us to purchase any new software.
Next, I want to identify the common components across all spreadsheet upload processing and come up with generic processing code. For example, I imagine a table which defines each of our spreadsheets and the format of each, including the column names and data type definitions, perhaps in terms of their destination columns instead of hard coding. Based on this template definition table, I can generate XLS templates for download. I can also perform simple generic edits to ensure that the data entered matches the table definition. And one common web page can be used to present the data, report data type mismatch errors and allow the user to correct them. I would also define a common "staging" table for storing the data as name/value pairs, with columns such as submission #, row num, name and value, perhaps. No more "custom everything" is the goal.
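To make the idea concrete, here is a minimal sketch of a definition-driven validator and a shared name/value staging table. It is written in Python with an in-memory SQLite database purely for illustration; the template name, column names, type names and the `staging` table layout are all hypothetical, and a real version would read the definitions from a database table rather than a dictionary.

```python
import sqlite3
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical template definitions: one entry per spreadsheet column.
# In practice these would be rows in a template-definition table.
TEMPLATE_COLUMNS = {
    "customer_adjustments": [
        ("customer_id", "int"),
        ("amount", "decimal"),
        ("effective_date", "date"),
    ],
}

def _parse_date(text):
    # Assumes ISO dates (YYYY-MM-DD); the real format is a design decision.
    return datetime.strptime(text, "%Y-%m-%d").date()

# Generic type checks, selected by the type name in the definition.
PARSERS = {"int": int, "decimal": Decimal, "date": _parse_date, "text": str}

def validate_row(template, row):
    """Return (column, message) pairs for every type mismatch in one row."""
    errors = []
    for name, type_name in TEMPLATE_COLUMNS[template]:
        value = row.get(name, "")
        try:
            PARSERS[type_name](value)
        except (ValueError, TypeError, InvalidOperation):
            errors.append((name, f"expected {type_name}, got {value!r}"))
    return errors

def stage_rows(conn, submission_id, template, rows):
    """Store every cell in one shared staging table; return errors by row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging ("
        "submission_id INTEGER, row_num INTEGER, name TEXT, value TEXT)"
    )
    errors_by_row = {}
    for row_num, row in enumerate(rows, start=1):
        errors = validate_row(template, row)
        if errors:
            errors_by_row[row_num] = errors
        conn.executemany(
            "INSERT INTO staging VALUES (?, ?, ?, ?)",
            [(submission_id, row_num, name, row.get(name, ""))
             for name, _ in TEMPLATE_COLUMNS[template]],
        )
    return errors_by_row
```

Because every template goes through the same two functions, adding a new spreadsheet would mean adding a definition, not a new staging table or error screen. The trade-off of a name/value staging table is that values are stored as text and must be pivoted back into typed columns before loading the target table.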
Next I need to decide where to put the business rules. My dept's mgt firmly believes that all loading of data should be done by Informatica ETL batch processes and therefore the rules/edits belong "in Informatica". I have zero experience with Informatica tools; I am more of a .NET guy. I am therefore unsure as to how these rules are implemented, but I suspect that they are not reusable in the sense that a .NET web page could use them to validate a particular record. You see, in some cases, when the user is not performing a bulk upload, they do have the ability to edit a specific record, and I would like the same edits that are applied by the ETL bulk insert process to be applied to an individual update of a single record via a web page. Is the solution to write a single web service or stored procedure that can be called either from the web page doing an update of a single record or thousands of times, once for each record in a bulk upload? The latter sounds inefficient.
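One way to picture "rules in one place" without paying for a remote call per record is a single rule set with two entry points: one for a single record and one for a whole batch, both running the same rule functions. The sketch below is in Python for illustration only; the rule names, the `amount` and `region` fields and the reference data are hypothetical, and it says nothing about how Informatica itself would invoke such rules.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]
# A rule returns an error message, or None when the record passes.
Rule = Callable[[Record], Optional[str]]

def amount_is_positive(record: Record) -> Optional[str]:
    try:
        ok = float(record.get("amount", "")) > 0
    except ValueError:
        ok = False
    return None if ok else "amount must be a positive number"

def region_is_known(record: Record) -> Optional[str]:
    known = {"EAST", "WEST"}  # hypothetical reference data
    return None if record.get("region") in known else "unknown region"

# The single place where the (hypothetical) business rules are registered.
RULES: List[Rule] = [amount_is_positive, region_is_known]

def validate_record(record: Record) -> List[str]:
    """Entry point for a single-record edit, e.g. from a web page."""
    return [msg for rule in RULES if (msg := rule(record)) is not None]

def validate_batch(records: Iterable[Record]) -> Dict[int, List[str]]:
    """Entry point for a bulk upload: one call covers the whole batch."""
    failures: Dict[int, List[str]] = {}
    for row_num, record in enumerate(records, start=1):
        errors = validate_record(record)
        if errors:
            failures[row_num] = errors
    return failures
```

Exposed as a service, the batch entry point would take the whole submission in one request, so a bulk upload costs one call rather than thousands, while the web page uses the single-record entry point against the same rules.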
Your thoughts on anything above would be very much welcomed.
Comments (1)
From a cost perspective, the effort you'll need to go through to re-create spreadsheet functionality on the web will exceed the cost of Farpoint or other controls. Even if you made $20 an hour, do you think you could complete a working product in under 2 weeks? I think you have the facts on your side on the maintenance issues: if you allow ETL functionality to exist in Excel, you have twice the amount of work to maintain the transformation rules. I think you need to convince management that in order to create a maintainable, robust solution you need some flexible utilities.
Farpoint is a good choice. There is also SpreadsheetGear, which is a .NET engine that interprets Excel macros and can run on a web server. It has a Win32 control that allows you to create a WinForms solution with very Excel-like interface functionality. Last time I checked there was no web control for the product. It does an excellent job of providing Excel capability for processing large amounts of data.
Good luck. I think you will find a good solution since you seem to have a good grasp of the pros and cons of all the different potential solutions.