Slow cronjobs on CentOS 5

Posted 2024-12-12 06:56:39

I have 1 cronjob that runs every 60 minutes, but for some reason it has recently been running slowly.

Env: CentOS 5 + Apache 2 + MySQL 5.5 + PHP 5.3.3 / RAID 10 / 10k RPM HDDs / 16 GB RAM / 4 Xeon processors

Here's what the cronjob does:

  1. Parse the last 60 minutes of data:

    a) one process parses user agents and saves the data to the database

    b) one process parses impressions/clicks on the website and saves them to the database

  2. From the data in step 1:

    a) build a small report and email it to the administrator/business

    b) save the report into a daily table (available in the admin section)

When I run ps auxf | grep process_stats_hourly.php (a command I found on Stack Overflow), I now see 8 processes running the same file.

Technically I should only have 1, not 8.

Is there any tool in CentOS, or anything else I can do, to make sure my cronjob runs every hour and does not overlap with the next run?

Thanks

7 Answers

瀞厅☆埖开 2024-12-19 06:56:39

Your hardware seems to be good enough to process this.

1) Check if you already have hanging processes. Using ps auxf (see tcurvelo's answer), check whether one or more processes are consuming too many resources. Maybe you don't have enough resources left to run your cronjob.

2) Check your network connections:
If your database and your cronjob are on different servers, you should check the response time between the two machines. Maybe you have network issues that make the cronjob wait for the network to send packets back.

You can use: Netcat, Iperf, mtr or ttcp

3) Server configuration
Is your server configured correctly? Are your OS and MySQL set up correctly? I would recommend reading these articles:

http://www3.wiredgorilla.com/content/view/220/53/

http://www.vr.org/knowledgebase/1002/Optimize-and-disable-default-CentOS-services.html

http://dev.mysql.com/doc/refman/5.1/en/starting-server.html

http://www.linux-mag.com/id/7473/

4) Check your database:
Make sure your database has the correct indexes and make sure your queries are optimized. Read the documentation on the EXPLAIN command (a hypothetical example follows the links below).

If a query over a few hundred thousand records takes a long time to execute, it will hold up the rest of your cronjob; if that query sits inside a loop, it is even worse.

Read these articles:

http://dev.mysql.com/doc/refman/5.0/en/optimization.html

http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/

http://blog.fedecarg.com/2008/06/12/10-great-articles-for-optimizing-mysql-queries/
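
As a hypothetical illustration of that check, here is how you might run EXPLAIN on a suspect query from PHP. The connection details, the impressions table, and the query itself are placeholders, not taken from your actual code:

<?php

// Hypothetical example: run EXPLAIN on a suspect query and inspect the plan.
$db = new mysqli('localhost', 'stats_user', 'secret', 'stats');

$result = $db->query(
    "EXPLAIN SELECT user_agent, COUNT(*) AS hits
     FROM impressions
     WHERE created_at >= NOW() - INTERVAL 60 MINUTE
     GROUP BY user_agent"
);

while ($row = $result->fetch_assoc()) {
    // A 'type' of ALL or an empty 'key' usually means a full table scan:
    // the query is probably missing an index on the filtered column.
    print_r($row);
}

If the plan shows a full table scan over the hourly data, adding an index on the filtered column is usually the first thing to try.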

5) Trace and optimize your PHP code
Make sure your PHP code runs as fast as possible.

Read these articles:

http://phplens.com/lens/php-book/optimizing-debugging-php.php

http://code.google.com/speed/articles/optimizing-php.html

http://ilia.ws/archives/12-PHP-Optimization-Tricks.html

A good way to validate your cronjob is to trace the script:
around each step of the process, add some debug output, including how much memory is in use and how long the last step took to execute. For example:

<?php

echo "\n-------------- DEBUG --------------\n";
echo "memory (start): " . memory_get_usage(TRUE) . "\n";

$start = microtime(TRUE);
// ... some process ...
$end = microtime(TRUE);

echo "\n-------------- DEBUG --------------\n";
echo "memory after some process: " . memory_get_usage(TRUE) . "\n";
echo "executed time: " . ($end - $start) . " seconds\n";

By doing that, you can easily see how much memory each step uses and how long it takes to execute.

6) External servers / web service calls
Does your cronjob call external servers or web services? If so, make sure these calls return as fast as possible. If you request data from a third-party server and that server takes a few seconds to answer, it will affect the speed of your cronjob, especially if those calls are inside loops.
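
One common safeguard, sketched below with a placeholder URL and timeout values, is to cap how long each external call may take so a slow third party cannot stall the whole job:

<?php

// Hypothetical example: bound the time spent on an external call.
$ch = curl_init('http://api.example.com/data');   // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);      // give up connecting after 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 15);            // give up on the whole request after 15 seconds
$response = curl_exec($ch);

if ($response === false) {
    // log the failure and carry on instead of blocking the rest of the job
    error_log('External call failed: ' . curl_error($ch));
}
curl_close($ch);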

Try that and let me know what you find.

旧伤还要旧人安 2024-12-19 06:56:39

The output of ps also shows when each process was started (see the STARTED column).

$ ps auxf
USER    PID  %CPU %MEM     VSZ    RSS   TTY  STAT  STARTED    TIME   COMMAND
root      2   0.0  0.0       0      0   ?    S     18:55      0:00   [kthreadd]
                                                   ^^^^^^^
(...)

Or you can customize the output:

$ ps axfo start,command
STARTED   COMMAND
18:55     [kthreadd]
(...)

Thus, you can tell whether they are overlapping.

牵强ㄟ 2024-12-19 06:56:39

You should use a lockfile mechanism within your process_stats_hourly.php script. It doesn't have to be anything overly complex: you could have PHP write the PID of the started process to a file like /var/mydir/process_stats_hourly.txt. Then, if processing the stats takes longer than an hour and cron kicks off another instance of the process_stats_hourly.php script, that instance can check whether the lockfile already exists and, if it does, refuse to run (a rough sketch follows below).

However you are left with the problem of how to "re-queue" the hourly script if it did find the lock file and couldn't start.
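
A minimal sketch of that idea. The path is the example from this answer, and the posix_kill() stale-lock check assumes the POSIX extension is available; otherwise that check has to be done another way:

<?php

$lockFile = '/var/mydir/process_stats_hourly.txt';

if (file_exists($lockFile)) {
    $oldPid = (int) trim(file_get_contents($lockFile));
    // If the previous run is still alive, skip this run entirely.
    if ($oldPid > 0 && posix_kill($oldPid, 0)) {
        exit(0);
    }
    // Otherwise the previous run died without cleaning up; remove the stale lockfile.
    unlink($lockFile);
}

file_put_contents($lockFile, getmypid());

// ... hourly stats processing ...

unlink($lockFile);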

浪菊怪哟 2024-12-19 06:56:39

You might run strace -p 1234, where 1234 is the process ID of one of the processes that has been running too long. Perhaps you'll see why it is so slow, or even where it is blocked.

听你说爱我 2024-12-19 06:56:39

Is there any tool in CentOS, or anything else I can do, to make sure my cronjob runs every hour and does not overlap with the next run?

Yes. CentOS' standard util-linux package provides a command-line convenience for filesystem locking. As Digital Precision suggested, a lockfile is an easy way to synchronize processes.

Try invoking your cronjob as follows:

flock -n /var/tmp/stats.lock process_stats_hourly.php || logger -p cron.err 'Unable to lock stats.lock'

You'll need to edit paths and adjust for $PATH as appropriate. That invocation will attempt to lock stats.lock, spawning your stats script if successful, otherwise giving up and logging the failure.

Alternatively your script could call PHP's flock() itself to achieve the same effect, but the flock(1) utility is already there for you.
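
A rough sketch of that flock() alternative, with the lock path being just an example:

<?php

$fp = fopen('/var/tmp/stats.lock', 'c');            // create the lockfile if it doesn't exist
if ($fp === false || !flock($fp, LOCK_EX | LOCK_NB)) {
    // another instance already holds the lock; bail out quietly
    exit(1);
}

// ... run the hourly stats processing here ...

flock($fp, LOCK_UN);
fclose($fp);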

抽个烟儿 2024-12-19 06:56:39

How often is that logfile rotated?

A log-parsing job suddenly taking longer than usual sounds like the log isn't being rotated and is now too big for the parser to handle efficiently.

Try resetting the logfile and see if the job runs faster. If that solves the problem, I recommend logrotate as a means of preventing the problem in the future.

轮廓§ 2024-12-19 06:56:39

You could add a step to the cronjob to check the output of your above command:

ps auxf | grep process_stats_hourly.php

Keep looping until the command returns nothing, indicating that the process isn't running, then allow the remaining code to execute.
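
A rough sketch of that check in PHP. The [p] in the grep pattern stops grep from matching its own command line, and filtering out getmypid() keeps the script from counting its own process; treat this as an illustration of the idea rather than a robust lock:

<?php

while (true) {
    $out = array();
    exec("ps auxf | grep '[p]rocess_stats_hourly.php' | grep -v " . getmypid(), $out);
    if (empty($out)) {
        break;              // no other instance found, safe to continue
    }
    sleep(30);              // wait and check again
}

// ... remaining stats processing ...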
