处理数据库记录时如何估计适当的批量大小数
假设我有一个处理器,它从一个表转换中获取记录并将它们放入另一个表中。我正在尝试计算可以一次加载到内存中并处理多少条记录。
显然,这取决于很多因素:记录中的数据量、任何 BLOBS?、列数、数据库类型、驱动程序、框架、盒子上有多少可用内存、是否有任何其他内存消耗进程在运行相同的环境等。
表单测试我可以看到它能够一次处理 10000 条记录,但无法(OOM)查询 100000 条记录。
我想我正在寻找一个好的策略来找出适当的批量大小数字。
或者我应该选择一个非常保守且低批量大小的数字,例如 100。并且不用担心任何查询开销。
谢谢。
So let's say I have a processor that takes records from one table transforms and place them into another table. I'm trying to figure how many records I can load into memory and process at once.
Obviously that would depend on a lot of factors: amount of data in the records, any BLOBS?, number of columns, database type, drivers, frameworks, how much memory available on a box, are there any other memory consuming process running in the same environment, etc.
Form tests I can see that it is able to process 10000 records at once but fails (with OOM) to query 100000.
I guess I'm looking for a good strategy to figure out proper batch size number.
Or should I just go with a very conservative and low batch size number like 100. And don't worry about any query overhead.
Thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我会进行不同规模的测试,直到你看不到任何改进。你可能会看到类似的东西。
在这种情况下,我会选择 1K 或 100 偏低。
I would perform tests of different sizes until you don't see any improvement. You might see something like.
In which case I would pick 1K or 100 to be on the low side.
使值可手动配置,并运行一些基准测试。将值设置为合理的数字。我发现过去随着批量记录数量的增加,性能的提高变得越来越少。因此,性能从一次 1 条记录跃升至 10 条是戏剧性的,而从 10 条记录跃升到 100 条记录的情况则不太明显,从 100 条记录跃升到 1000 条记录的情况则更不明显,依此类推。
因此,我会运行基准测试来找出合理的,然后将其保留为可配置的,你永远不知道何时需要出于某种奇怪的原因调整它。
Make the value hand-configurable, and run some benchmarks. Set the value to a sensible number. I've found i the past that performance increase becomes less and less as the number of batch records increase. So performance jump from 1 record at a time to 10 is dramatic, while from 10 to 100 is less so, and from 100 to 1000 is even less so, and so on.
So I would run benchmarks to find out what is reasonable, and then leave it configurable, you never know when you need to adjust it for some odd reason.