通常如何将数据从电子表格导入到多个数据库列?

发布于 2024-07-11 09:04:57 字数 544 浏览 9 评论 0原文

无论出于何种原因,我有很多客户都有存储在电子表格中的现有数据。 通常每个电子表格中有数百甚至数千个项目,要求客户通过网站手动输入它们(或者天堂禁止我自己以这种方式导入它们)是不可能的。 通常,此数据并不简单地将电子表格列映射到数据库列。 那太简单了。 通常,数据在进入数据库之前需要进行操作(数据需要用逗号等分隔)或者数据需要分布在多个表中。 或两者。

我问这个问题,并不是因为我不知道有多种方法可以做到这一点,而是因为我还没有找到一种不会让人觉得需要做更多工作的方法。 到目前为止,我已经采取了以下所有方法(可能还有更多我已经忘记的方法):

  • 使用 Excel 修改数据,因此导入会更容易一些
  • 将整个电子表格导入到临时表中,然后使用 SQL 导入
  • 编写脚本并用它导入数据(我使用过 VBScript、C#,现在使用 Ruby)

到目前为止,使用脚本似乎是最灵活的方式,但它仍然感觉有点笨拙。 我必须充分执行这项任务,以至于我什至考虑为其编写一些 DSL,以加快速度。

但在这样做之前,我很好奇,还有更好的方法吗?

For whatever reason, I have a lot of clients that have existing data that's stored in spreadsheets. Often there are hundreds, if not thousands of items in each spreadsheet, and asking the client to manually enter them through a website (or heaven forbid importing them that way myself) is out of the question. Typically, this data doesn't simply map spreadsheet column to database column. That would be too easy. Often, the data needs to be manipulated before going into the database (data needs to be split by commas, etc) or the data needs to be spread out across multiple tables. Or both.

I'm asking this question, not because I don't know of a multitude of ways to do it, but because I haven't settled on a way that doesn't feel like it takes more work than it should. So far I've taken all of the following approaches (and probably more that I've forgotten):

  • Using Excel to modify the data, so it's a little bit easier to import
  • Importing the entire spreadsheet into a temporary table and then importing with SQL
  • Writing a script and importing the data with it (I've used VBScript, C# and now Ruby)

So far, using a script has been the way that seemed most flexible, but it still feels a little clunky. I have to perform this task enough that I've even contemplated writing a little DSL for it, just to speed things up.

But before I do that, I'm curious, is there a better way?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

笑咖 2024-07-18 09:04:57

如果可以的话,你必须设定界限。 您应该尝试为他们提供一个模板,以便他们使用预期的数据,其中包括文件类型(Excel、csv 等)、列名称、有效值等。您应该允许用户浏览文件并上传它在您的页面/表单上。

文件上传后,您需要进行验证和导入。 您可以使用 ADO.NET、文件流、DTS/SSIS 或 Office Automation 来执行此操作(如果您使用的是 Microsoft 堆栈)。 在验证部分,您应该准确告诉用户他们做错了什么或需要更改。 这可能包括让验证页面在数据网格中包含实际数据,并在确切的行/列上提供带有错误的红色标签。 如果您使用 Office Automation,您可以向他们提供准确的手机号码,但 Office PIA 却令人头疼。

接受验证后,您可以根据需要导入信息。 我更喜欢将其放入临时表中并使用存储过程来加载它,但这只是我的想法。 有些人更喜欢使用对象模型,但如果您有大量数据,这可能会非常慢。

如果您亲自手动加载这些文件并必须进入并操作它们,我建议您找到它们之间的共性并提出一个可遵循的标准。 一旦你有了它,你就可以让用户可以自己做,或者你自己可以更快地做。

是的,这是一项繁重的工作,但从长远来看,当有一个程序在 95% 的情况下都有效时,每个人都会获胜。

如果这是一种无法自动化的情况,那么您可能只需要一个普通的临时表并使用 sql 进行导入。 您必须将数据加载到一个临时表中,执行基本操作,然后将其加载到 SQL 所需的测试临时表中。

我已经做了很多导入和 ETL 工具,确实没有简单的方法来处理它。 唯一的方法是真正提出一个合理的标准并坚持它并围绕它进行编程。

You have to set boundaries, if you can. You should try and provide a template for them to use with the expected data, which includes file type (Excel, csv, etc.), column names, valid values, etc. You should allow the used to browse for the file and upload it on your page/form.

Once the file is uploaded, you need to do validation and importation. You can use ADO.NET, file streams, DTS/SSIS, or Office Automation to do this (if you are using the Microsoft stack). In the validation portion, you should tell the user exactly what they did wrong or need to change. This might include having the validation page have the actual data in a datagrid and providing red labels with errors on the exact row/column. If you use Office Automation, you can give them the exact cell number, but the Office PIA is a pain in the neck.

Once validation is accepted, you can import the information however you like. I prefer putting it into a staging table and using a stored proc to load it, but that's just me. Some prefer to use the object model, but this can be very slow if you have a lot of data.

If you are personally loading these files manually and having to go in and manipulate them, I would suggest finding the communality among them and coming up with a standard to follow. Once you have that, you can make it so the user can do it themselves or you can do it a lot faster yourself.

Yes, this is a lot of work, but in the long wrong, when there is a program that works 95% of the time, everybody wins.

If this is going to be a situation which just can’t be automated, then you will probably just have to have a vanilla staging table and have sql to to the importation. You will have to load the data into one staging table, do the basic manipulation, and then load it into te staging table that your SQL expects.

I’ve done so many imports and ETL tools, and there really is no easy way to handle it. The only way is to really come up with a standard that is reasonable and stick to it and program around that.

埖埖迣鎅 2024-07-18 09:04:57

是的..那太糟糕了。

我会按照剧本走。 我假设您有重复的列,它们必须与另一个表中的单行匹配。 我会进行合理的匹配,如果您遇到脚本无法处理并移动数据的行...然后记录它并让某人手动执行。

yeah.. that just sucks.

I would go with the script. And I assume you have repeating columns that have to match a single row in another table. I would do reasonable matching and if you encounter a row that the script can't deal with and move the data...then log it and make someone do it manually.

魔法少女 2024-07-18 09:04:57

当然,这些小细节会害死你,但总的来说,我已经成功地将数据从 Excel 导出为 CSV,然后使用 rool 或脚本读取它,根据需要对其进行修改,然后插入它。 根据我的环境的复杂性,这可以通过脚本语言的数据库接口来完成,具体包括将 SQL INSERT 语句写入脚本文件。

有一些很好的 CSV 包可用于 PythonRubyPerl

It's the little details that'll kill you on this, of course, but in general, I've had success with exporting the data as CSV from Excel, then reading it using a rool or script, munging it as needed, and inserting it. Depending on the wonderfulness of my environment, that can be done with a data base interface to the scripting language, down to and including writing SQL INSERT statements into a script file.

There are good CSV packages available for Python, Ruby, and Perl.

风轻花落早 2024-07-18 09:04:57

DSL 是最佳选择。

为您的问题创建域模型。 您谈论单元格、列、行、数据库表、拆分字段、组合字段、从单元格到数据库列的映射,这就是您需要的概念。 此外,您可能需要范围(单元格)和工作表。

简单视图仅查看电子表格中的值,而不查看基础公式。 将电子表格导出为制表符分隔的文本可以让您访问该电子表格。 如果您需要访问公式,最好使用 xml 表示形式,即 XML 电子表格或 Office XML 格式。

您也许能够在 Excel 中提出 DSL。 这可以让更聪明的用户完成(部分)映射。

A DSL is the way to go.

Create a domain model for your problem. You talk about cells, columns, rows, database tables, splitting fields, combining fields, mapping from cells to database columns, so that are the concepts you need. In addition you probably want ranges (of cells), and sheets.

A simple view looks only at the values in the spreadsheets, not the underlying formulas. Exporting the spreadsheet as tab-separated text gives you access to that. If you need access to the formulas, you're better of with the xml representation, either the XML-spreadsheet, or the Office XML format.

You might be able to come up with a DSL in Excel. That could allow your smarter users to do (part of) the mapping.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文