PHP MySQL: running queries in batches
I am new to working with large amounts of data. I am wondering if there are any best practices when querying a database in batches or if anyone can give any advice.
I have a query that pulls out all of the data, and PHP is used to write the data to an XML file. There can be anywhere between 10 and 500,000 rows of data, so I have written the script to pull the data out in batches of 50: write 50 rows to the file, then get the next 50 rows and append them to the file, etc. Is this OK, or should I be doing something else? Could I increase the batch size, or should I decrease it to make the script run faster?
Any advice would be much appreciated.
4 Answers
Yes, for huge result sets it is recommended to use batches (for performance and memory reasons).

Here is a benchmark and example code for running a query in batches.
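As a rough sketch of the LIMIT/OFFSET batching the question describes (the table name, connection details, and XML step in the commented usage are placeholders, not from the question):

```php
<?php
// Compute [offset, limit] pairs for paging through $total rows
// in chunks of $batchSize. Pure helper, easy to test in isolation.
function batchOffsets(int $total, int $batchSize): array
{
    $batches = [];
    for ($offset = 0; $offset < $total; $offset += $batchSize) {
        $batches[] = [$offset, min($batchSize, $total - $offset)];
    }
    return $batches;
}

// Hypothetical usage with mysqli -- adjust to your own schema:
// $mysqli = new mysqli('localhost', 'user', 'pass', 'db');
// $total  = (int) $mysqli->query('SELECT COUNT(*) FROM products')->fetch_row()[0];
// foreach (batchOffsets($total, 50) as [$offset, $limit]) {
//     $res = $mysqli->query("SELECT * FROM products LIMIT $limit OFFSET $offset");
//     while ($row = $res->fetch_assoc()) {
//         // append $row to the XML file here
//     }
// }
```

One caveat worth knowing: with large tables, `LIMIT ... OFFSET` gets slower as the offset grows, because MySQL still has to scan past the skipped rows.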
The best way to do this depends on a couple of different things. Most importantly is when and why you are creating this XML file.
If you are creating the XML file on demand, and a user is waiting for the file then you'll need to do some fine tuning and testing for performance.
If it's something that's created on a regular basis, maybe a nightly or hourly task, and then the XML file is requested after it's built (something like an RSS feed builder) then if what you have works I would recommend not messing with it.
As far as performance goes, there are a few different things that can help. Put some simple timers into your script and play with the number of records per batch to see if there are any performance differences.
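A minimal timing sketch for that, using `microtime(true)` (the per-batch workload here is just a stand-in):

```php
<?php
// Time a unit of work; microtime(true) returns seconds since the
// epoch as a float with microsecond precision.
function timeBatch(callable $work): float
{
    $start = microtime(true);
    $work();
    return microtime(true) - $start;
}

// Example: time a dummy workload standing in for one batch
// ("fetch 50 rows and write them to the XML file").
$elapsed = timeBatch(function () {
    usleep(10000); // ~10 ms placeholder
});
printf("batch took %.4f s\n", $elapsed);
```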
If the issue is user feedback, you may consider using AJAX to kick off each batch and report progress to the user. If you give the user feedback, they'll usually be happy to wait longer than if they were just waiting on the page to refresh as a whole.
Also, check your SQL query to make sure there's no hidden performance penalties there. http://dev.mysql.com/doc/refman/5.0/en/explain.html EXPLAIN can show you how MySQL goes about processing your queries.
At an extreme, I'd imagine the best performance could be accomplished through parallel processing. I haven't worked with it in PHP, but here's the primary reference http://www.php.net/manual/en/refs.fileprocess.process.php
Depending on your hosting environment you could find the total number of records and split it among sub processes. Each building their own XML fragments. Then you could combine the fragments. So process 1 may handle records 0 to 99, process 2 100 to 199, etc.
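A sketch of the splitting step (the range helper is real PHP; the forking part is a hypothetical outline, since `pcntl_fork()` is only available in CLI PHP on POSIX systems, and `buildXmlFragment()` is a made-up name):

```php
<?php
// Split $total records into at most $workers contiguous ranges,
// each expressed as [firstIndex, lastIndex] (inclusive).
function splitRanges(int $total, int $workers): array
{
    if ($total <= 0 || $workers <= 0) {
        return [];
    }
    $ranges = [];
    $size = (int) ceil($total / $workers);
    for ($start = 0; $start < $total; $start += $size) {
        $ranges[] = [$start, min($start + $size, $total) - 1];
    }
    return $ranges;
}

// Hypothetical parallel usage:
// foreach (splitRanges($totalRecords, 4) as $i => [$from, $to]) {
//     $pid = pcntl_fork();
//     if ($pid === 0) {                      // child process
//         buildXmlFragment($i, $from, $to);  // writes fragment-$i.xml
//         exit(0);
//     }
// }
// while (pcntl_wait($status) > 0) {}  // parent waits, then concatenates fragments
```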
The best way to go about this is to schedule it as a CRON job, which I think is the best solution for batch processing in PHP. Check this link for more info: Batch Processing in PHP. Hope this helps.
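For example, a crontab entry along these lines would rebuild the XML nightly (the script and log paths are placeholders; install it with `crontab -e`):

```shell
# Run the export every night at 02:00; adjust paths to your environment.
0 2 * * * /usr/bin/php /var/www/scripts/build_feed.php >> /var/log/build_feed.log 2>&1
```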
You would be surprised: one simple SELECT of everything, with no LIMIT, can be the fastest, because it only queries the database once; everything else is processed locally.
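If you go the single-query route, memory is the main risk with 500,000 rows, so stream rows straight to the XML output instead of buffering them; with mysqli you would also use an unbuffered result (`MYSQLI_USE_RESULT`). A sketch using XMLWriter, where the `$rows` iterable stands in for iterating that unbuffered result set:

```php
<?php
// Stream rows to XML one at a time, never holding the whole
// document (or result set) in memory at once.
function rowsToXml(iterable $rows): string
{
    $w = new XMLWriter();
    $w->openMemory();              // openUri('out.xml') would stream to disk instead
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('rows');
    foreach ($rows as $row) {
        $w->startElement('row');
        foreach ($row as $field => $value) {
            $w->writeElement($field, (string) $value);
        }
        $w->endElement();
        // When writing to a file via openUri(), call $w->flush() here
        // periodically to keep memory flat.
    }
    $w->endElement();
    $w->endDocument();
    return $w->outputMemory();
}
```

Usage against a real result would look like `rowsToXml(new IteratorIterator($mysqli->query($sql, MYSQLI_USE_RESULT)))`, though the exact wiring depends on your driver setup.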