Running a PHP application on Windows - daemon or cron?

Posted on 2024-11-02 15:04:29

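I need some implementation advice. I have a MySQL database that will be written to remotely with tasks to process locally, and I need my application, which is written in PHP, to execute these tasks immediately as they come in.

But of course my PHP app needs to be told when to run. I thought about using cron jobs, but my app is on a Windows machine. Secondly, I need to check constantly, every few seconds, and cron can only run once a minute.

I thought of writing a PHP daemon, but I am still working out how it would work and whether it is even a good idea!

I would appreciate any advice on the best way to do this.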

錯遇了你 2024-11-09 15:04:29

pyCron is a good CRON alternative for Windows:

pyCron

Since this task is quite simple I would just set up pyCron to run the following script every minute:

set_time_limit(60); // one minute, same as CRON ;)
ignore_user_abort(false); // you might wanna set this to true

while (true)
{
    $jobs = getPendingJobs();

    if ((is_array($jobs) === true) && (count($jobs) > 0))
    {
        foreach ($jobs as $job)
        {
            if (executeJob($job) === true)
            {
                markCompleted($job);
            }
        }
    }

    sleep(1); // avoid eating unnecessary CPU cycles
}

This way, if the computer goes down, you'll have a worst case delay of 60 seconds.
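A possible variation on the loop above, as a minimal sketch: instead of letting the time limit kill the script in the middle of a job, the loop can watch a deadline and return on its own. The function `runUntil` and its parameters are hypothetical names, and the three callables stand in for `getPendingJobs`, `executeJob` and `markCompleted`.

```php
<?php
// Hypothetical helper: poll for jobs until a deadline, then return cleanly.
// $fetch returns an array of pending jobs, $execute returns true on success,
// $complete marks a job as done. All three are placeholders for your own code.
function runUntil(float $deadline, callable $fetch, callable $execute, callable $complete, int $pauseMicros = 1000000): int
{
    $done = 0;

    while (microtime(true) < $deadline) {
        foreach ($fetch() as $job) {
            if ($execute($job) === true) {
                $complete($job);
                $done++;
            }
        }

        usleep($pauseMicros); // avoid eating unnecessary CPU cycles
    }

    return $done; // number of jobs completed in this run
}
```

Called as `runUntil(microtime(true) + 55, 'getPendingJobs', 'executeJob', 'markCompleted')`, it would stop a few seconds before the scheduler starts the next run. A job that is already executing when the deadline passes still runs to completion, so the margin should be chosen with the longest expected job in mind.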

You might also want to look into semaphores or some other locking strategy to avoid race conditions between overlapping runs, such as an APC variable or a check for a lock file. Using APC, for example:

set_time_limit(60); // one minute, same as CRON ;)
ignore_user_abort(false); // you might wanna set this to true

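// apc_add() is atomic and returns false if the key already exists, so testing
// and taking the lock in one call avoids a race between two overlapping runs
if (apc_add('lock', true, 60) === true) // lock with a TTL of 60 secs, same as set_time_limit
{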

    while (true)
    {
        $jobs = getPendingJobs();

        if ((is_array($jobs) === true) && (count($jobs) > 0))
        {
            foreach ($jobs as $job)
            {
                if (executeJob($job) === true)
                {
                    markCompleted($job);
                }
            }
        }

        sleep(1); // avoid eating unnecessary CPU cycles
    }
}
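The lock file strategy mentioned above could look roughly like this. It is a sketch under assumptions, not part of the original answer: `acquireLock` and `releaseLock` are hypothetical helper names, and the approach relies on PHP's `flock()` with `LOCK_NB` so that a second run gives up immediately instead of waiting.

```php
<?php
// Hypothetical helper: try to take an exclusive, non-blocking lock on a file.
// Returns the open handle on success (keep it until the work is done) or
// null when another run already holds the lock.
function acquireLock(string $path)
{
    $handle = fopen($path, 'c'); // create if missing, never truncate
    if ($handle === false) {
        return null;
    }

    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }

    return $handle;
}

// Hypothetical helper: release the lock taken by acquireLock().
function releaseLock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}
```

A script would call `acquireLock()` first, exit if it gets `null`, and call `releaseLock()` after the polling loop. Because the operating system drops the lock when the process ends, a crashed run should not leave a stale lock behind, which is the main reason to prefer this over merely checking whether a file exists.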

If you're sticking with the PHP daemon, do yourself a favor and drop that idea; use Gearman instead.

EDIT: I asked a related question once that might interest you: Anatomy of a Distributed System in PHP.

垂暮老矣 2024-11-09 15:04:29

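I'm going to suggest something out of the ordinary: you said that you need to run the task at the point the data is written to MySQL. That implies MySQL "knows" that something should be executed.
It sounds like a perfect scenario for MySQL's UDF sys_exec.

Basically, it would be nice if MySQL could invoke an external program once something happened to it.
If you use the mentioned UDF, you can execute a PHP script from within, let's say, an INSERT or UPDATE trigger.
On the other hand, you can make it more resource-friendly and create a MySQL Event (assuming you're using an appropriate version) that uses sys_exec to invoke a PHP script that performs certain updates at predefined intervals. That reduces the need for Cron or any similar program that executes something at predefined intervals.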

孤独患者 2024-11-09 15:04:29

I would definitely not advise using cronjobs for this.

Cronjobs are a good thing, and very useful and easy for many purposes, but as you describe your needs, I think they can produce more complications than they do good. Here are some things to consider:

What happens if jobs overlap, for example when one takes longer than a minute to execute? Are there any shared resources, deadlocks or temp files? The most common method is to use a lock file and stop execution if it is already held when the program starts. But the program then also has to look for further jobs right before it completes. This can also get complicated on Windows machines because, AFAIK, they don't support write locks out of the box.
Cronjobs are a pain to maintain. If you want to monitor them, you have to implement additional logic, like a check for when the program last ran. This can get difficult if your program should run only on demand. The best way would be some sort of "job completed" field in the database, or deleting rows that have been processed.
On most Unix-based systems cronjobs are pretty stable now, but there are a lot of situations where you can break your cronjob system. Most of them are based on human error. For example, a sysadmin not exiting the crontab editor properly in edit mode can cause all cronjobs to be deleted. A lot of companies also have no proper monitoring system, for the reasons stated above, and only notice once their services experience problems. At that point, often nobody has written down or put under version control which cronjobs should run, and wild guessing and reconstruction work begins.
Cronjob maintenance can be further complicated when external tools are used and the environment is not a native Unix system. Sysadmins have to gain knowledge of more programs, each of which can have its own errors.

I honestly think a small script that you start from the console and leave open is just fine.

<?php
while(true) {
 $job = fetch_from_db();
 if(!$job) { 
    sleep(10);
 } else {
    $job->process();
 }
}

You can also touch a file (update its modification timestamp) in every loop, and write a Nagios script that checks whether that timestamp has gone out of date, so you know whether your job is still running.
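The heartbeat idea can be sketched in a few lines. The helper names `beat` and `isStale` and the file path are hypothetical; a real Nagios check would wrap `isStale()` and translate its result into the exit codes Nagios expects.

```php
<?php
// Hypothetical helpers for a file-based heartbeat; the names are illustrative.

// Called by the worker once per loop iteration.
function beat(string $path): bool
{
    return touch($path); // creates the file if needed and updates its mtime
}

// Called by the monitoring check: true when the heartbeat file is missing
// or older than $maxAgeSeconds.
function isStale(string $path, int $maxAgeSeconds, ?int $now = null): bool
{
    clearstatcache(true, $path); // file status results are cached by PHP
    if (!is_file($path)) {
        return true;
    }

    $now = $now ?? time();

    return ($now - filemtime($path)) > $maxAgeSeconds;
}
```

The maximum age should be comfortably larger than the worker's sleep interval plus the longest expected job, otherwise a busy worker would be reported as dead.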

If you want it to start up with the system, I recommend a daemon.

PS: In the company where I work there is a lot of background activity for our website (crawling, update processes, calculations etc.), and the cronjobs were a real mess when I started there. They were spread over different servers responsible for different tasks. Databases were accessed wildly across the internet. A ton of NFS filesystems, Samba shares etc. were in place to share resources. The place was full of single points of failure and bottlenecks, and something constantly broke. There were so many technologies involved that it was very difficult to maintain, and when something didn't work it took hours to track down the problem and another hour to work out what that part was even supposed to do.

Now we have one unified update program that is responsible for literally everything. It runs on several servers, and they have a config file that defines the jobs to run. Everything gets dispatched from one parent process doing an infinite loop. It's easy to monitor, customize and synchronize, and everything runs smoothly. It is redundant, it is synchronized and the granularity is fine. So it runs in parallel, and we can scale up to as many servers as we like.

I really suggest sitting down for long enough to think about everything as a whole and get a picture of the complete system. Then invest the time and effort to implement a solution that will serve well in future and doesn't spread tons of different programs throughout your system.

pps:

I read a lot about the minimum interval of 1/5 minutes for cronjobs/tasks. You can easily work around that with an arbitrary script that takes over that interval:

// run every 5 minutes = 300 secs
// desired interval: 30 secs
$parentInterval = 300;
$interval = 30; // be aware that the parent interval needs to be a multiple of the desired interval
$runs = intdiv($parentInterval, $interval);
for ($i = 0; $i < $runs; $i++) {
    $start = time();
    system('php myscript.php'); // run the script through the PHP CLI
    // compensate for the time the script needed to run; max() keeps the argument non-negative.
    // be aware that you still need some logic for cases where the script takes longer
    // than your interval (technique and problem described above)
    sleep(max(0, $interval - (time() - $start)));
}

我是有多爱你 2024-11-09 15:04:29

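This looks like a job for a job server ;) Have a look at Gearman. The additional benefit of this approach is that it is triggered by the remote side when, and only when, there is something to do, instead of polling. Especially at intervals smaller than (let's say) 5 minutes, polling is no longer very efficient, depending on the task the job executes.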

長街聽風 2024-11-09 15:04:29

The quick and dirty way is to create a loop that continuously checks if there is new work.

Pseudo-code

ini_set("max_execution_time", "0"); // 0 disables the execution time limit
$keeplooping = true;
while($keeplooping){

   if(check_for_work()){
      process_work();
   }
   else{
     sleep(5);
   }

   // some way to change $keeplooping to false
   // you don't want to just kill the process, because it might still be doing something
}
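One way to fill in the "some way to change $keeplooping to false" part, as a sketch only: have the loop look for a stop file before each iteration. The function name `runWorker`, the stop-file convention and the callables are all assumptions for illustration; the callables stand in for `check_for_work()` and `process_work()` above.

```php
<?php
// Hypothetical sketch: keep looping until a stop file appears. Creating the
// file (by hand or from another script) asks the worker to finish its current
// job and exit, instead of killing it mid-task.
function runWorker(string $stopFile, callable $checkForWork, callable $processWork, int $idleSeconds = 5): int
{
    $processed = 0;

    while (true) {
        clearstatcache(true, $stopFile); // file status results are cached by PHP
        if (file_exists($stopFile)) {
            break; // shutdown requested between two jobs
        }

        if ($checkForWork()) {
            $processWork();
            $processed++;
        } else {
            sleep($idleSeconds);
        }
    }

    return $processed; // number of work items handled before stopping
}
```

The stop file is only checked between jobs, so a shutdown request can take up to one job duration plus the idle sleep to be honored. The file has to be removed before the worker is started again.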
