How to handle a long-running batch process in PHP?
I have a batch process where I need to update my DB table, around 100,000-500,000 rows, from an uploaded CSV file. Normally it takes 20-30 minutes, sometimes longer.
What is the best way to do this? Any good practices for it? Any suggestions would be appreciated.
Thanks.
5 Answers
It takes 30 minutes to import 500,000 rows from a CSV?
Have you considered letting MySQL do the hard work? There is LOAD DATA INFILE, which supports dealing with CSV files. If the file is not quite in the right shape to be imported straight into the target table, you can either use PHP to transform it beforehand, or LOAD it into a "staging" table and let MySQL handle the necessary transformation, whichever is faster and more convenient.
As an additional option, there seems to be a possibility to run MySQL queries asynchronously through the MySQL Native Driver for PHP (MYSQLND). Maybe you can explore that option as well; it would enable you to keep the UI snappy.
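Roughly, the staging-table variant could look like the sketch below. The file path, table, and column names are made up for illustration, and LOAD DATA LOCAL INFILE has to be allowed by both the client and the server:

    // Enable LOCAL INFILE on the client before connecting (often off by default).
    $db = mysqli_init();
    $db->options(MYSQLI_OPT_LOCAL_INFILE, true);
    $db->real_connect('localhost', 'user', 'pass', 'mydb');

    // Bulk-load the uploaded CSV into a staging table in a single statement.
    $db->query("
        LOAD DATA LOCAL INFILE '/tmp/upload.csv'
        INTO TABLE staging
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\n'
        IGNORE 1 LINES
    ");

    // Do the transformation while copying into the real table, entirely inside MySQL.
    $db->query("INSERT INTO target (col1, col2)
                SELECT col1, TRIM(col2) FROM staging");

    // With mysqlnd you can also pass MYSQLI_ASYNC as the second argument
    // to query() to fire a statement asynchronously.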
If you're doing a lot of inserts, are you doing bulk inserts? i.e. like this:
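One multi-row statement instead of thousands of single-row ones (table, column, and value names are illustrative):

    $db->query("
        INSERT INTO mytable (field1, field2)
        VALUES
            (1, 'a'),
            (2, 'b'),
            (3, 'c')
    ");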
That will dramatically speed up inserts.
Another thing you can do is disable indexing while you make the changes, then let it rebuild the indexes in one go when you're finished.
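For MyISAM tables that looks roughly like this (note that DISABLE KEYS only suspends non-unique indexes; the table name is illustrative):

    // Non-unique index updates are suspended while the data changes...
    $db->query("ALTER TABLE mytable DISABLE KEYS");

    // ... run all the bulk inserts here ...

    // ... and the indexes are rebuilt in one pass at the end.
    $db->query("ALTER TABLE mytable ENABLE KEYS");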
With a bit more detail about what you're doing, you might get more ideas.
PEAR has a package called Benchmark with a Benchmark_Profiler class that can help you find the slowest sections of your code so you can optimize them.
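From memory, usage looks something like the sketch below (the section names are invented; check the PEAR documentation for the exact API):

    require_once 'Benchmark/Profiler.php';

    // true = start automatically and print the report when the script ends
    $profiler = new Benchmark_Profiler(true);

    $profiler->enterSection('parse_csv');
    // ... read and validate the uploaded file ...
    $profiler->leaveSection('parse_csv');

    $profiler->enterSection('db_update');
    // ... write the rows to the database ...
    $profiler->leaveSection('db_update');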
We had a feature like that in a big application. We had the issue of inserting millions of rows from a CSV into a table with 9 indexes. After lots of refactoring we found the ideal way to insert the data was to load it into a [temporary] table with the MySQL LOAD DATA INFILE command, do the transformations there, and copy the result into the actual table with multiple insert queries (INSERT INTO ... SELECT FROM), processing only 50k lines or so with each query (which performed better than issuing a single insert, but YMMV).
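A sketch of the chunked copy step, assuming the staging table has an AUTO_INCREMENT id column (all names are illustrative):

    // Copy from the staging table into the real one in ~50k-row slices.
    $res = $db->query("SELECT MAX(id) FROM staging");
    $row = $res->fetch_row();
    $max = (int) $row[0];

    for ($start = 0; $start < $max; $start += 50000) {
        $end = $start + 50000;
        $db->query("
            INSERT INTO target (col1, col2)
            SELECT col1, col2
            FROM staging
            WHERE id > $start AND id <= $end
        ");
    }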
When the user presses said button, set a flag in a table in the database. Then have your cron job check for this flag. If it's there, start processing; otherwise don't. If applicable, you could use the same table to post some kind of status update (e.g. xx% done), so the user has some feedback about the progress.
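A minimal sketch of that pattern, assuming a jobs table with status and progress columns (all names are invented for illustration):

    // Cron script, run every minute: pick up a pending job, if any.
    $res = $db->query("SELECT id FROM import_jobs WHERE status = 'pending' LIMIT 1");
    if ($job = $res->fetch_assoc()) {
        $id = (int) $job['id'];
        $db->query("UPDATE import_jobs SET status = 'running' WHERE id = $id");

        // ... process the CSV in chunks, updating progress along the way ...
        $db->query("UPDATE import_jobs SET progress = 50 WHERE id = $id");

        $db->query("UPDATE import_jobs SET status = 'done', progress = 100 WHERE id = $id");
    }

The page the user sees can then poll the same row (e.g. via AJAX) to display the progress.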