Designing an OOP application in PHP. How should I do it?

Posted on 2024-10-10 03:57:34

I'm new to OOP in PHP, and I'm trying to work out how to structure this kind of application. It will scrape about 100 different websites.

I have a main class, "Scrap", that holds the methods common to all the websites. Inside the folder "Scripts" I have the classes that handle the particular aspects of each website I'm scraping. Another folder, "Lib", holds external libraries.

Let me explain visually.

This is my file layout:

- Scrap.php
+ Scripts
               - Google.php
               - Yahoo.php
               - Stackoverflow.php
+ Lib
     + libScrap
               - LIB_parse.php
     + phpQuery
               - phpQuery.php
               - other files and folders...

Scrap.php contains the following:

<?php

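// Includes (resolved relative to this file; the folder is "Lib", not "/lib" at the filesystem root)
require(__DIR__ . '/Lib/libScrap/LIB_parse.php');
require(__DIR__ . '/Lib/phpQuery/phpQuery.php');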

// Testing Scrap
$testing = new Scrap;
$testing->teste = $testing->getPage('http://www.yahoo.com','','off');
echo $testing->teste; 


class Scrap {

    public function __construct() {
        // do things!
    }

    /*
    * This method grabs the entire page(HTML) on given URL
    * Ex: $htmlgrab->teste = $htmlgrab->getPage('http://testing.com/ofertas/','','off');
    * Returns, the HTML of given URL
    */
    public function getPage($site, $proxy, $proxystatus) {
        $ch = curl_init();
        // Return the response body as a string instead of printing it,
        // so no output buffering is needed.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        if ($proxystatus == 'on') {
            curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
            curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
            curl_setopt($ch, CURLOPT_PROXY, $proxy);
        }
        curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
        curl_setopt($ch, CURLOPT_URL, $site);
        $html = curl_exec($ch); // the HTML on success, FALSE on failure
        curl_close($ch);        // clean up before returning; code after a return never runs
        return $html;
    }

    /*
    * 
    * 
    */
    public function getLinks() {
        // do things!
    }

    /*
    * This method grabs the page title.
    * Ex: <title>This is the page title</title>
    * Returns, "This is the page title"
    */
    public function getTitle() {
        // do things!
    }

}
?>

And inside the folder "Scripts" I will have files like this one:

<?php
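// require_once, because every script file pulls in Scrap.php and the class may only be declared once
require_once(__DIR__ . '/../Scrap.php');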

class Yahoo extends Scrap {

    public function doSomething() {
        // do things!
    }

}
?>

End note: I need to instantiate all the classes in the "Scripts" folder to scrape the websites. My question is what the best way is to instantiate about 100 classes.

I'd appreciate any clues on how to design this.

Best regards,

Sorry for my bad English.

Comments (4)

中性美 2024-10-17 03:57:34

Assuming each class has its own source file, you could consider autoloading. In my own projects I use the spl_autoload_register() function for this instead of __autoload().
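
A minimal sketch of what such an autoloader could look like, assuming one class per file and a file name that matches the class name (for example Yahoo.php for class Yahoo). The helper name registerScraperAutoloader() and the directory passed to it are assumptions made for illustration, not part of the question's code.

```php
<?php
// Hypothetical helper: registers an autoloader that maps a class name
// such as "Yahoo" to "<dir>/Yahoo.php".
function registerScraperAutoloader(string $dir): void
{
    spl_autoload_register(function (string $class) use ($dir): void {
        // Only accept plain class names, so a name can never escape $dir.
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $class)) {
            return;
        }
        $file = $dir . DIRECTORY_SEPARATOR . $class . '.php';
        if (is_file($file)) {
            require_once $file;
        }
    });
}

// Assumed layout: the "Scripts" folder sits next to this file.
registerScraperAutoloader(__DIR__ . '/Scripts');
```

With this in place, writing new Yahoo() loads Scripts/Yahoo.php on first use, so the individual script files no longer have to be included by hand.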

去了角落 2024-10-17 03:57:34

If you want to include all of the files in the script folder, why wouldn't a simple loop suffice? I frequently do the same thing in projects containing many scripts.

$arr = glob ('scripts/*.php');
foreach ($arr as $script)
    include_once ($script);

UPDATE

As for initializing each object, the best bet would probably be to create an object at the end of each class file, like this:

<?php
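// require_once, because every included script pulls in Scrap.php and the class may only be declared once
require_once(__DIR__ . '/../Scrap.php');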

class Yahoo extends Scrap {

    public function doSomething() {
        // do things!
    }
}

$yahooObj = new Yahoo(); //This is the addition

?>

This way, after you call include_once('Yahoo.php') you will also have the $yahooObj object.

Ya dig?
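
A possible variation on this idea, which is not what the answer itself proposes: instead of leaving one global variable per file, each script file could end with a statement such as return new Yahoo(); and the loop could collect whatever include_once returns. The helper name loadScrapers() is hypothetical.

```php
<?php
/**
 * Includes every PHP file in $dir and collects the object each file returns.
 * Assumes each script file ends with a statement such as: return new Yahoo();
 */
function loadScrapers(string $dir): array
{
    $scrapers = [];
    foreach (glob($dir . '/*.php') ?: [] as $file) {
        $result = include_once $file;
        // include_once yields true for an already-loaded file
        // and 1 for a file that has no return statement.
        if (is_object($result)) {
            $scrapers[pathinfo($file, PATHINFO_FILENAME)] = $result;
        }
    }
    return $scrapers;
}

// Assumed layout: the "Scripts" folder sits next to this file.
$scrapers = loadScrapers(__DIR__ . '/Scripts');
```

The result is an array keyed by file name (for example $scrapers['Yahoo']), which keeps the 100 objects in one place rather than in 100 separately named variables.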

陪你搞怪i 2024-10-17 03:57:34

The patterns you are likely looking for are Strategy or Command.

As for setting up the scrapers, you have various options. You can hardcode the paths to the scrapers in the main Scrap class, load them from a config file, use autoloading or a class map, use a Factory, or combine these. That's really up to you; it's mostly a matter of deciding what works well for your application.
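
As a rough illustration of the Strategy idea combined with a class map and a small factory, the sketch below gives every scraper the same interface and builds instances from a name-to-class array. The names Scraper, YahooScraper, GoogleScraper, ScraperFactory and extractTitle() are hypothetical and exist only for this example; the extraction rules are placeholders.

```php
<?php
// Strategy interface: every site-specific scraper exposes the same method.
interface Scraper
{
    /** Extracts a title from raw HTML (illustrative task only). */
    public function extractTitle(string $html): string;
}

class YahooScraper implements Scraper
{
    public function extractTitle(string $html): string
    {
        return preg_match('~<title>(.*?)</title>~is', $html, $m) ? trim($m[1]) : '';
    }
}

class GoogleScraper implements Scraper
{
    public function extractTitle(string $html): string
    {
        // Another site may need different rules; here the first <h1> is used.
        return preg_match('~<h1[^>]*>(.*?)</h1>~is', $html, $m) ? trim($m[1]) : '';
    }
}

class ScraperFactory
{
    /** @var array<string, string> class map: site key => class name */
    private $map;

    public function __construct(array $map)
    {
        $this->map = $map;
    }

    public function create(string $site): Scraper
    {
        if (!isset($this->map[$site])) {
            throw new InvalidArgumentException("No scraper registered for '$site'");
        }
        $class = $this->map[$site];
        return new $class();
    }

    /** @return Scraper[] one instance per registered site, keyed by site */
    public function createAll(): array
    {
        $all = [];
        foreach (array_keys($this->map) as $site) {
            $all[$site] = $this->create($site);
        }
        return $all;
    }
}

// The class map could just as well be loaded from a config file.
$factory = new ScraperFactory([
    'yahoo'  => 'YahooScraper',
    'google' => 'GoogleScraper',
]);
```

Calling $factory->createAll() then yields one object per site, and adding a new website means adding one class and one line to the map.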

If you are already using autoloading, make sure your scrapers can be found. If you want to add an additional autoloader, do so. If you prefer a class map for the added security and speed, use a class map, and so on. Discussing all the pros and cons is beyond the scope of this question. If you are interested, have a look at this blog post about autoloading benchmarks (it's for ZF2 but generally applicable).

Since none of the scrapers are likely to depend on each other, I suggest looking into something like Gearman to run them asynchronously in separate processes instead of sequentially in the same script. Your main script would then only create the necessary workers with the appropriate settings and let them run in background processes. There are some examples in the manual, and here is another one from the same site as the benchmarking article.

鱼忆七猫命九 2024-10-17 03:57:34

I would suggest naming your Scrap classes like this:

Scrap_Yahoo
Scrap_Google
...

Then you do what Dutchie432 suggested:

$scraps = array();
foreach (glob('scripts/*.php') as $script) {
  $scrap = 'Scrap_' . pathinfo($script, PATHINFO_FILENAME);
  require_once($script);
  $scraps[] = new $scrap();
} 

Then you can do whatever you want with this array/factory of scraps:

foreach ($scraps as $scrap) {
  $scrap->scrap();
}

Then you should define an abstract method scrap() in your Scrap class, and don't forget to make the class abstract:

// file: Scrap.php
abstract class Scrap {
  // Each site-specific subclass must implement this method.
  abstract public function scrap();
}

Every class in the scripts/* directory will extend class Scrap and define just this one method, scrap().

You can go even further and implement the Template Method design pattern.
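
A minimal sketch of how Template Method might look here, under the assumption that every scrape consists of "fetch the page, then parse it". The step names url(), fetch() and parse() and the returned array shape are illustrative choices, not something the answer specifies.

```php
<?php
abstract class Scrap
{
    // Template method: the fixed sequence of steps. It is final,
    // so subclasses can change the steps but not their order.
    final public function scrap(): array
    {
        $html = $this->fetch($this->url());
        return $this->parse($html);
    }

    // Steps that differ per website.
    abstract protected function url(): string;
    abstract protected function parse(string $html): array;

    // Shared step; a subclass may override it (e.g. to add a proxy).
    protected function fetch(string $url): string
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $html = curl_exec($ch);
        // curl_exec returns false on failure; normalize that to an empty string.
        return $html === false ? '' : $html;
    }
}

class Scrap_Yahoo extends Scrap
{
    protected function url(): string
    {
        return 'http://www.yahoo.com';
    }

    protected function parse(string $html): array
    {
        preg_match('~<title>(.*?)</title>~is', $html, $m);
        return ['title' => isset($m[1]) ? trim($m[1]) : ''];
    }
}
```

The loop over $scraps stays the same: each object is still driven through $scrap->scrap(), while the site-specific classes only fill in the URL and the parsing rules.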
