Excel file transformation tool? (swap columns, basic string manipulations, etc.)
I need to import tabular data into my database. The data is supplied via spreadsheets (mostly Excel files) from multiple parties. The format of each of these files is similar but not the same, and various transformations will be necessary to massage the data into the final format suitable for import. Furthermore, the input formats are likely to change in the future. I am looking for a tool that can be run and administered by regular users to transform the input files.
Now let me list some of the transformations I am looking to do:
- swap columns:
Input is:
|Name|Category|Price|
|data|data |data |
Output is
|Name|Price|Category|
|data|data |data |
- rename columns
Input is:
|PRODUCTNAME|CAT |PRICE|
|data |data|data |
Output is
|Name|Category|Price|
|data|data |data |
- map columns according to a lookup table, like in the above examples:
replace every occurrence of the string "Car" by "automobile" in the column Category
- basic maths:
multiply the price column by some factor
- basic string manipulations:
Let's say that the format of the Price column is "3 x $45"; I would want to split that into two columns, amount and price
- filtering of rows by value: exclude all rows containing the word "expensive"
- etc.
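To make the list above concrete, here is a minimal sketch of the same transformations in plain Python, operating on rows held as dictionaries. It is only an illustration of what the requested tool would have to do, not a substitute for a GUI. The names `RENAME`, `CATEGORY_LOOKUP`, `OUTPUT_ORDER`, `PRICE_FACTOR` and `transform` are hypothetical, the factor of 2.0 is an arbitrary example value, the lookup is assumed to match whole cell values, and the "expensive" filter is assumed to be case-insensitive.

```python
import re

# Hypothetical per-source configuration; every name and value here is illustrative.
RENAME = {"PRODUCTNAME": "Name", "CAT": "Category", "PRICE": "Price"}
CATEGORY_LOOKUP = {"Car": "automobile"}
OUTPUT_ORDER = ["Name", "Amount", "Price", "Category"]
PRICE_FACTOR = 2.0  # assumed example factor
PRICE_PATTERN = re.compile(r"^\s*(\d+)\s*x\s*\$(\d+(?:\.\d+)?)\s*$")


def transform(rows):
    """Normalize a list of row dicts and return a new list of row dicts."""
    result = []
    for raw in rows:
        # Rename columns; headers missing from RENAME are kept unchanged.
        row = {RENAME.get(key, key): value for key, value in raw.items()}

        # Filter: skip rows where any cell contains the word "expensive".
        if any("expensive" in str(value).lower() for value in row.values()):
            continue

        # Lookup-table mapping on the Category column (whole-value match).
        row["Category"] = CATEGORY_LOOKUP.get(row["Category"], row["Category"])

        # String manipulation: split "3 x $45" into an amount and a unit price.
        match = PRICE_PATTERN.match(row["Price"])
        if match is None:
            raise ValueError(f"unexpected Price format: {row['Price']!r}")
        row["Amount"] = int(match.group(1))

        # Basic maths: multiply the unit price by the factor.
        row["Price"] = round(float(match.group(2)) * PRICE_FACTOR, 2)

        # Swap/reorder columns by rebuilding the dict in the output order.
        result.append({column: row[column] for column in OUTPUT_ORDER})
    return result


if __name__ == "__main__":
    sample = [
        {"PRODUCTNAME": "Golf", "CAT": "Car", "PRICE": "3 x $45"},
        {"PRODUCTNAME": "Expensive yacht", "CAT": "Boat", "PRICE": "1 x $900"},
    ]
    for converted in transform(sample):
        print(converted)
```

Keeping the per-source rules in data (the dictionaries and the column list) rather than in code is what would let a front end edit them when an input format changes.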
I have the following requirements:
- it can run on any of these platforms: Windows, Mac, Linux
- Open Source, Freeware, Shareware or commercial
- the transformations need to be editable via a GUI
- if the tool requires end-user training, that is not an issue
- it can handle on the order of 1000-50000 rows
Basically I am looking for a graphical tool that will help the users normalize the data so it can be imported, without me having to write a bunch of adapters.
What tools do you use to solve this?
Comments (4)
The simplest solution IMHO would be to use Excel itself - you'll get all the Excel built-in functions and macros for free.
Have your transformation code in a macro that gets called via Excel controls (for the GUI aspect) on a spreadsheet. Find a way to insert that spreadsheet and macro in your client's Excel files. That way you don't need to worry about platform compatibility (it's their file, so they must be able to open it) and all the rest. The other requirements are met as well. The only training would be to show them how to enable macros.
The Mule Data Integrator will do all of this from a CSV file. So you can export your spreadsheet to a CSV file and load the CSV file into the MDI. It can even load the data directly into the database, and the user can specify all of the transformations you requested. The MDI will work fine in non-Mule environments. You can find it at mulesoft.com (disclaimer: my company developed the transformation technology that this product is based on).
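As an illustration of the CSV route only (this says nothing about how the Mule Data Integrator works internally), a spreadsheet exported to CSV can be read and rewritten with Python's standard `csv` module. The function name `reorder_csv` and the sample data are hypothetical; the sketch covers just the column swap from the question.

```python
import csv
import io


def reorder_csv(source, target, output_columns):
    """Copy CSV rows from source to target, keeping only output_columns, in that order."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(
        target,
        fieldnames=output_columns,
        extrasaction="ignore",  # silently drop columns not listed in output_columns
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# In-memory stand-ins for an exported file and the converted result.
# With real files, open them with newline="" as the csv documentation recommends.
exported = io.StringIO("Name,Category,Price\nGolf,Car,45\n")
converted = io.StringIO()
reorder_csv(exported, converted, ["Name", "Price", "Category"])
print(converted.getvalue())
```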
You didn't say which database you're importing into, or what tool you use. If you were using SQL Server, then I'd recommend using SQL Server Integration Services (SSIS) to manipulate the spreadsheets during the import process.
I tend to use MS Access as a pipeline between multiple data sources and destinations - but you're looking for something a little more automated. You can use macros and VB script with Access to help through a lot of the basics.
However, you're always going to have data consistency problems with users misinterpreting how to normalize their information. Good luck!