php cron 作业可以运行多长时间/我做得对吗？

发布于 2024-12-07 12:51:45 字数 695 浏览 0 评论 0原文

我创建了一个 php/mysql scraper，运行良好，但不知道如何最有效地将其作为 cron 作业运行。

有 300 个网站，每个网站有 20 - 200 个页面被抓取。抓取所有站点需要 4 - 7 小时（取决于网络延迟和其他因素）。刮刀需要每天完整运行一次。

我应该将其作为 1 个 cron 作业运行，运行整个 4 - 7 小时，还是每小时运行 7 次，或者每 10 分钟运行一次直到完成？

该脚本设置为从 cron 运行，如下所示：

while($starttime+600 > time()){
   do_scrape();
}

它将运行 do_scrape() 函数，该函数一次抓取 10 个 url，直到（在本例中）600 秒过去。 do_scrape 可能需要 5 - 60 秒才能运行。

我在这里问是因为我在网上找不到任何关于如何运行它的信息，并且我对每天运行它持谨慎态度，因为 php 并不是真正设计为作为单个脚本运行 7 小时的。

我用 vanilla PHP/mysql 编写了它，它运行在只安装了lighttpd/mysql/php5 的精简版 debian VPS 上。我已经以 6000 秒（100 分钟）的超时运行它，没有任何问题（服务器没有崩溃）。

任何有关如何完成这项任务的建议都将受到赞赏。我应该注意什么等等？或者我执行这一切都是错误的？

谢谢！

原文

I have created a php/mysql scraper, which is running fine, and have no idea how to most-efficiently run it as a cron job.

There are 300 sites, each with between 20 - 200 pages being scraped. It takes between 4 - 7 hours to scrape all the sites (depending on network latency and other factors). The scraper needs to do a complete run once daily.

Should I run this as 1 cron job which runs for the entire 4 - 7 hours, or run it every hour 7 times, or run it every 10 minutes until complete?

The script is set up to run from the cron like this:

while($starttime+600 > time()){
   do_scrape();
}

Which will run the do_scrape() function, which scrapes 10 urls at a time, until (in this case) 600 seconds has passed. The do_scrape can take between 5 - 60 seconds to run.

I am asking here as I cant find any information on the web about how to run this, and am kind of wary about getting this running daily, as php isnt really designed to be run as a single script for 7 hours.

I wrote it in vanilla PHP/mysql, and it is running on cut down debian VPS with only lighttpd/mysql/php5 installed. I have run it with a timeout of 6000 seconds (100 minutes) without any issue (the server didnt fall over).

Any advice on how to go about this task is appreciated. What should I be watching out for etc..? or am i going about executing this all wrong?

Thanks!

分享到QQ

分享到微博