Which way of importing spreadsheet data is better?

Posted on 2024-12-22 12:11:18

I am trying to import data from a spreadsheet into a database using Java. There are two ways I could do this: 1) Read and extract the data from the spreadsheet and organize it into data structures, such as ArrayLists, Vectors, or maps of different objects, so that I can get rid of redundant entries etc., and then write the data structures to the database. 2) Extract the data and put it into the database directly as the cells are read. I think the first way is probably better, but would the second way be faster? Are there any other considerations I should think of?

Thanks.

Comments (2)

冷清清 2024-12-29 12:11:18

You would want to use executeBatch() here, which is similar to approach #1. Basically, you read one batch of rows from the spreadsheet (e.g. 1000 records), send them to the database with a single batch call, and commit once per batch. Then move on to the next batch, and so on. With this approach you use the database efficiently, save network round trips, and avoid hoarding a lot of data in memory, which could lead to out-of-memory errors. You should also reuse the same Connection and PreparedStatement objects.
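As a rough sketch of that batching loop (not a definitive implementation), the following uses only standard JDBC calls. The class name `BatchImporter`, the `people` table, and its two columns are assumptions made up for illustration; the rows are assumed to have already been read from the spreadsheet as string arrays.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchImporter {

    /**
     * Inserts the given rows in batches of batchSize, committing once per batch.
     * Returns the number of rows sent to the database.
     */
    public static int importRows(Connection conn, Iterable<String[]> rows, int batchSize)
            throws SQLException {
        // Hypothetical table and columns; adjust to the real schema.
        String sql = "INSERT INTO people (name, email) VALUES (?, ?)";
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // commit explicitly, once per batch
        int total = 0;
        // One PreparedStatement is reused for every row.
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                pending++;
                if (pending == batchSize) {
                    ps.executeBatch(); // one round trip for the whole batch
                    conn.commit();
                    total += pending;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the last, partially filled batch
                ps.executeBatch();
                conn.commit();
                total += pending;
            }
        } catch (SQLException e) {
            conn.rollback(); // undoes only the uncommitted batch; earlier batches stay committed
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return total;
    }
}
```

Because `rows` is an `Iterable`, the caller can feed it from an iterator over the spreadsheet, so only one batch needs to be held in memory at a time. A call might look like `BatchImporter.importRows(conn, rows, 1000)`.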

Regarding data cleanup, you should definitely sanitize your data before putting it into persistent storage such as a table. You may need to generate reports or use the data in other applications in the future, so having clean, well-structured tables will help you in the long run. For batch applications, performance requirements are usually not as strict as for transactional systems.
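What "sanitize" means depends on the data, but a minimal sketch might trim cell values, drop blank rows, and remove the redundant entries mentioned in the question. The class name `RowCleaner` and the choice of cleaning rules are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RowCleaner {

    /**
     * Trims every cell, treats null cells as empty, drops rows that are entirely
     * blank, and removes duplicate rows while keeping first-seen order.
     */
    public static List<List<String>> clean(List<List<String>> rawRows) {
        // List.equals/hashCode compare contents, so the set detects duplicate rows.
        Set<List<String>> unique = new LinkedHashSet<>();
        for (List<String> raw : rawRows) {
            List<String> cleaned = new ArrayList<>();
            boolean allBlank = true;
            for (String cell : raw) {
                String value = (cell == null) ? "" : cell.trim();
                if (!value.isEmpty()) {
                    allBlank = false;
                }
                cleaned.add(value);
            }
            if (!allBlank) {
                unique.add(cleaned);
            }
        }
        return new ArrayList<>(unique);
    }
}
```

Note that this only catches duplicates among the rows passed in one call. If the import is processed batch by batch, duplicates that span batches would need another safeguard, for example a unique constraint on the target table.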

You should also use a helper library such as Apache POI to read Excel documents. The right data structure depends on your data, but an ArrayList should generally suffice here.

Another point to consider is that most ETL tools typically offer this kind of data loading out of the box. If your situation allows for it, I highly recommend looking at an ETL tool like Kettle to load the data. You may save yourself some time and learn a new tool along the way.

Hope this helps!

留蓝 2024-12-29 12:11:18

You can consider using an ETL (Extract, Transform, Load) tool for the kind of task you are referring to.
