Cron jobs at a specific time of day - what are the limits?

Posted on 2024-11-05 20:52:29

I'm after a little advice around using Cron jobs with PHP. My scenario is this:

I have a website with a large membership. Users have one or several URLs associated with their account. At midnight (or some other set time) I'd like to call a script which will query the websites for each user and update the database with the information it finds. Think of it as a sort of screen-scraper service.

My question is about the load on the server. I'll be testing this new feature on a shared server, but ultimately I will be moving to a dedicated server.

So if the roughly 5,000 members have 2 URLs each, that's 10,000 websites it would query. What do people think is the best way to do this? Have a cron job that runs the first 500 members, then 10 minutes later runs the next 500, and so on?

Or is there some magic I've not heard of that might help?

Thanks for any tips!

Comments (3)

枯叶蝶 2024-11-12 20:52:30

As already suggested, you could run the URL script in one go, sequentially. That's the simplest approach.

If that's not fast enough, you could easily modify your cron script so that it can be invoked to run on odd or even IDs. Run the script twice starting at midnight, once for odds and once for evens; as long as you don't exhaust any resources on the machine, it should finish roughly twice as fast.

To implement this, I would have the script accept two integer arguments that define the modulus and the remainder. E.g. for odd/even you pass "2 0" and "2 1", which would result in something like SELECT * FROM myTable WHERE id % 2 = 0 and SELECT * FROM myTable WHERE id % 2 = 1 being executed against the SQL database. With this approach it is very easy to configure any number of jobs to run in parallel.
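The modulus/remainder idea above can be sketched as follows. This is only an illustration under stated assumptions: it is written in Python rather than the PHP the question mentions, the table name `member_urls`, its columns, and the helper names are hypothetical, and an in-memory SQLite table stands in for the real database.

```python
import sqlite3
import sys


def select_shard(conn, modulus, remainder):
    """Return the (id, url) rows whose id belongs to this job's shard."""
    if modulus < 1 or not 0 <= remainder < modulus:
        raise ValueError("need modulus >= 1 and 0 <= remainder < modulus")
    cur = conn.execute(
        "SELECT id, url FROM member_urls WHERE id % ? = ? ORDER BY id",
        (modulus, remainder),
    )
    return cur.fetchall()


def demo_connection():
    """Build a throwaway in-memory table (hypothetical schema) for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE member_urls (id INTEGER PRIMARY KEY, url TEXT)")
    conn.executemany(
        "INSERT INTO member_urls (id, url) VALUES (?, ?)",
        [(i, f"http://example.com/{i}") for i in range(1, 11)],
    )
    return conn


# Only runs when called with exactly two arguments, e.g. "2 0" or "2 1".
if __name__ == "__main__" and len(sys.argv) == 3:
    modulus, remainder = int(sys.argv[1]), int(sys.argv[2])
    for row_id, url in select_shard(demo_connection(), modulus, remainder):
        print(row_id, url)
```

Two cron entries scheduled for the same minute, one passing `2 0` and the other `2 1`, would then cover every row exactly once. Moving to three or four parallel jobs only changes the arguments, not the script.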

gearmand is very powerful and I have used it on a number of projects, but it has a steeper learning curve. I think the simple solution I suggested should get you by.

大姐,你呐 2024-11-12 20:52:29

cron is a great tool to use for basic concepts like this. However, it scales poorly, as you've surmised! Look into job processing tools, like the open-source (and multi-language) Gearman:

http://gearman.org/

This should be a more robust system for the task at hand.
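To give a feel for what a job-processing setup buys you, here is a minimal sketch of the general pattern: jobs are handed to a fixed pool of workers and failures are collected per job. It uses Python's standard-library thread pool as a stand-in and does not show Gearman's API; the function name `process_jobs` and the `handler` callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_jobs(urls, handler, workers=8):
    """Run handler(url) on a fixed pool of workers.

    Returns two dicts: {url: result} for successes, {url: error text} for failures.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each submitted future back to the URL it is working on.
        futures = {pool.submit(handler, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failing job must not sink the rest
                errors[url] = str(exc)
    return results, errors
```

A dedicated job server adds things this sketch lacks, such as workers on several machines and jobs that survive a restart, which is where the extra setup effort goes.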

金橙橙 2024-11-12 20:52:29

I would schedule one script daily and let it query the 10,000 websites one after another: a single script that loops over all the websites, sends a request to each, and processes the results one by one. For numbers like these there's no need to make it any more difficult, imho.
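A minimal sketch of that single sequential loop, in Python rather than PHP for illustration. The function names and the `save` callback are hypothetical, and the 10-second timeout is an assumption, added so that one unresponsive site cannot stall the whole nightly run.

```python
import urllib.request


def fetch(url, timeout=10):
    """Download one page; the timeout keeps a slow site from stalling the run."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def run_all(urls, fetch_page=fetch, save=print):
    """Visit every URL in order; record failures and keep going."""
    failed = []
    for url in urls:
        try:
            body = fetch_page(url)
        except Exception as exc:  # broad on purpose: one bad site must not stop the job
            failed.append((url, str(exc)))
            continue
        save(url, body)  # stand-in for the database update
    return failed
```

The list that `run_all` returns can be logged or retried on the next run. At one request at a time, total run time is roughly the number of URLs multiplied by the average response time, so it is worth timing a small sample before deciding whether parallel jobs are needed.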
