How to handle Spring JDBC batch SQL inserts when a table column is too small

Posted 2024-11-09 14:55:19

I'm trying to batch insert a large volume of data into a MySQL database and need advice on how to handle the situation where a table column is not large enough to hold the incoming data.

Since the data set is very large (over 1 GB), I'm performing a streaming read of the input data into memory. At certain points I "flush" the data and try to insert it into the table using jdbcTemplate.batchUpdate. All of the columns of the destination table are VARCHARs, assumed to be of size 50. This is perfectly adequate for the vast majority of the data. However, data occasionally arrives that is longer than size 15, and the INSERT fails ("Data too long for column").

What is the best approach to tackling this issue? I don't want to blindly make all of the table columns larger, since that seems like a strategy that won't scale well. I'd also prefer not to pre-parse the data, given its size. So should I wait for the failure, perform an appropriate "ALTER TABLE", and re-submit the data? Or is it possible to cache the failed items, determine the required column size, and re-submit only the failed items?

I'm looking for pointers and general advice about the best way to do this.

Thanks.
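
For context, here is a minimal sketch of the flush-and-batch-insert pattern the question describes, using jdbcTemplate.batchUpdate with a BatchPreparedStatementSetter. The table and column names (my_table, col_a, col_b) and the String[] row shape are placeholders, not details from the question:

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;

public class RecordBatchWriter {

    private final JdbcTemplate jdbcTemplate;

    public RecordBatchWriter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    /** Inserts one buffered chunk of rows; the caller clears the buffer afterwards. */
    public void flush(final List<String[]> buffer) {
        jdbcTemplate.batchUpdate(
                "INSERT INTO my_table (col_a, col_b) VALUES (?, ?)",
                new BatchPreparedStatementSetter() {
                    @Override
                    public void setValues(PreparedStatement ps, int i) throws SQLException {
                        String[] row = buffer.get(i);
                        ps.setString(1, row[0]);
                        ps.setString(2, row[1]);
                    }

                    @Override
                    public int getBatchSize() {
                        return buffer.size();
                    }
                });
    }
}
```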

Comments (2)

余生再见 2024-11-16 14:55:19

I believe your best bet is to validate the data before putting it into the insert batch. That way you can either reject objects with strings that are too long or truncate the strings to the required size.
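
A minimal sketch of this validate-or-truncate step, assuming the VARCHAR(50) columns from the question; the helper names and the choice between rejecting and truncating are illustrative:

```java
// Sketch: pre-validate rows against an assumed VARCHAR(50) column width.
private static final int MAX_LEN = 50;

/** Returns true if every value in the row fits the column width. */
static boolean fits(String[] row) {
    for (String value : row) {
        if (value != null && value.length() > MAX_LEN) {
            return false; // reject: caller routes the row to a failure handler
        }
    }
    return true;
}

/** Alternative: silently truncate oversized values instead of rejecting. */
static String truncateToFit(String value) {
    return (value == null || value.length() <= MAX_LEN)
            ? value
            : value.substring(0, MAX_LEN);
}
```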

请远离我 2024-11-16 14:55:19

You indicate that you're using MySQL and varchar columns. It isn't clear to me whether the current column width is 15 or 50, but in either case there is no performance penalty for making the column definitions varchar(255). I assume that would allow many of the failure cases to succeed.
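
If widening the columns is acceptable, the one-time DDL is small; a hedged example issued through the same JdbcTemplate, with placeholder table and column names:

```java
// Widen a column once, up front, rather than reacting to failures.
// MySQL stores a VARCHAR as a length prefix plus the data, so a
// wider definition costs no extra space for short values.
jdbcTemplate.execute("ALTER TABLE my_table MODIFY col_a VARCHAR(255)");
```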

You didn't indicate whether the requirement is that all data must end up in the database unaltered, or if you can truncate or modify the data to allow it to fit. Assuming you can alter the data, here are a couple of additional thoughts:

For handling data that is still too long, I like Olaf's suggestion of validating the data before including it in the batch. I would validate each record as it is read and then decide whether to add it to the insert batch or process it as a failure (store it in a file?).
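
A sketch of that read-validate-route loop, building on the flush(...) and fits(...) sketches above; the batch size, tab-separated input format, and file paths are all assumptions:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Validate each record as it is read: good rows join the insert buffer,
// oversized rows are written to a rejects file for later inspection.
static void load(Path input, Path rejectsFile, RecordBatchWriter writer) throws IOException {
    final int batchSize = 1000; // flush threshold; tune to taste
    List<String[]> buffer = new ArrayList<>();
    try (BufferedReader in = Files.newBufferedReader(input);
         BufferedWriter rejects = Files.newBufferedWriter(rejectsFile)) {
        String line;
        while ((line = in.readLine()) != null) {
            String[] row = line.split("\t", -1); // assumed tab-separated input
            if (fits(row)) {
                buffer.add(row);
                if (buffer.size() >= batchSize) {
                    writer.flush(buffer); // jdbcTemplate.batchUpdate under the hood
                    buffer.clear();
                }
            } else {
                rejects.write(line); // park the failure instead of aborting the batch
                rejects.newLine();
            }
        }
        if (!buffer.isEmpty()) {
            writer.flush(buffer); // final partial batch
        }
    }
}
```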

Also, you don't really indicate how much processing you're doing on the data between reading it and inserting it into the database. If you aren't doing any, then perhaps you should look into the bulk-loading tools available for MySQL (I'm not familiar enough with them to advise). Typically these tools can handle these kinds of cases for you.
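
For what it's worth, MySQL's LOAD DATA LOCAL INFILE is one such bulk-load path. A hedged sketch through the same JdbcTemplate; it assumes local_infile is enabled on the server, allowLoadLocalInfile=true on the Connector/J connection URL, and placeholder file and table names:

```java
// Bulk-load a staged file in one statement instead of row-by-row inserts.
// File, table, and column names below are placeholders.
jdbcTemplate.execute(
        "LOAD DATA LOCAL INFILE '/tmp/staged_chunk.tsv' " +
        "INTO TABLE my_table " +
        "FIELDS TERMINATED BY '\\t' " +
        "(col_a, col_b)");
```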
