How do I take full advantage of multi-core CPUs in a PHP/MySQL application?

Published 2024-08-21 19:13:33

Comments (5)

若有似无的小暗淡 2024-08-28 19:13:33

Introduction

PHP has multi-threading support (via the pThreads extension), which you can take advantage of in many ways. This multi-threading ability has been demonstrated in different examples:

A quick search will turn up additional resources.

Categories

1: MySQL queries

MySQL is fully multi-threaded and will make use of multiple CPUs, provided that the operating system supports them. It will also make the most of system resources if properly configured for performance.

A typical setting in my.ini that affects thread performance is:

thread_cache_size = 8

thread_cache_size can be increased to improve performance if you have a lot of new connections. Normally this does not provide a notable improvement if you have a good thread implementation, but if your server sees hundreds of connections per second, you should set thread_cache_size high enough that most new connections use cached threads.

If you are using Solaris then you can use

thread_concurrency = 8 

thread_concurrency lets applications give the thread system a hint about the desired number of threads to run at the same time.

This variable is deprecated as of MySQL 5.6.1 and was removed in MySQL 5.7. You should remove it from MySQL configuration files whenever you see it, unless the server runs Solaris 8 or earlier.

InnoDB:

You don't have such limitations if you are using InnoDB as the storage engine, because it fully supports thread concurrency.

innodb_thread_concurrency //  Recommended 2 * CPUs + number of disks
 

You can also look at innodb_read_io_threads and innodb_write_io_threads, where the default is 4; depending on the hardware, they can be increased to as high as 64.

Others:

Other configurations to look at include key_buffer_size, table_open_cache, sort_buffer_size, etc., which can all result in better performance.
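Taken together, those settings might be sketched in my.cnf like this; the values are illustrative starting points (not from the original answer) and should be tuned against your own workload:

```ini
[mysqld]
# Reuse threads for new connections (helps at hundreds of connections/sec)
thread_cache_size       = 8

# InnoDB background I/O threads (default 4, can go as high as 64)
innodb_read_io_threads  = 8
innodb_write_io_threads = 8

# Other buffers mentioned above
key_buffer_size         = 256M   # MyISAM index buffer
table_open_cache        = 2048
sort_buffer_size        = 2M     # allocated per connection, keep modest
```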

PHP:

In pure PHP you can create a MySQL Worker where each query is executed in a separate PHP thread:

$sql = new SQLWorker($host, $user, $pass, $db);
$sql->start();

$sql->stack($q1 = new SQLQuery("One long Query")); 
$sql->stack($q2 = new SQLQuery("Another long Query"));

$q1->wait(); 
$q2->wait(); 
             
// Do Something Useful

Here is a Full Working Example of SQLWorker

2: HTML content parsing

I suspect that a great deal of computation time is spent on this task.

If you already know the problem, it becomes easier to solve via event loops, job queues, or threads.

Working on one document at a time can be a very, very slow, painful process. @ka once hacked his way out using ajax to make multiple requests; some creative minds would just fork the process using pcntl_fork, but if you are using Windows you cannot take advantage of pcntl.

With pThreads supporting both Windows and Unix systems, you don't have that limitation. It's as easy as: need to parse 100 documents? Spawn 100 threads. Simple.

HTML Scanning

// Scan my System
$dir = new RecursiveDirectoryIterator($dir, RecursiveDirectoryIterator::SKIP_DOTS);
$dir = new RecursiveIteratorIterator($dir);

// Allowed Extension
$ext = array(
        "html",
        "htm"
);

// Threads Array
$ts = array();

// Simple Storage
$s = new Sink();

// Start Timer
$time = microtime(true);

$count = 0;
// Parse All HTML
foreach($dir as $html) {
    if ($html->isFile() && in_array($html->getExtension(), $ext)) {
        $count ++;
        $ts[] = new LinkParser("$html", $s);
    }
}

// Wait for all Threads to finish
foreach($ts as $t) {
    $t->join();
}

// Put The Output
printf("Total Files:\t\t%s \n", number_format($count, 0));
printf("Total Links:\t\t%s \n", number_format($t = count($s), 0));
printf("Finished:\t\t%0.4f sec \n", $tm = microtime(true) - $time);
printf("AvgSpeed:\t\t%0.4f sec per link\n", $tm / $t);
printf("File P/S:\t\t%d file per sec\n", $count / $tm);
printf("Link P/S:\t\t%d links per sec\n", $t / $tm);

Output

Total Files:            8,714
Total Links:            105,109
Finished:               108.3460 sec
AvgSpeed:               0.0010 sec per link
File P/S:               80 file per sec
Link P/S:               907 links per sec

Classes Used

Sink

class Sink extends Stackable {
    public function run() {
    }
}

LinkParser

class LinkParser extends Thread {

    public function __construct($file, $sink) {
        $this->file = $file;
        $this->sink = $sink;
        $this->start();
    }

    public function run() {
        $dom = new DOMDocument();
        @$dom->loadHTML(file_get_contents($this->file));
        foreach($dom->getElementsByTagName('a') as $links) {
            $this->sink[] = $links->getAttribute('href');
        }
    }
}

Experiment

Try parsing the 8,714 files containing 105,109 links without threads and see how long it takes.

Better Architecture

Spawning too many threads is not a clever thing to do in production. A better approach would be to use pooling: have a pool of defined Workers, then stack them with Tasks.
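Since a runnable pThreads pool needs the extension installed, here is the same pooling shape sketched with Python's standard library: a fixed pool of workers drains a queue of tasks instead of one thread per task. The `parse_links` helper is a made-up stand-in for the LinkParser above:

```python
from concurrent.futures import ThreadPoolExecutor

def parse_links(path):
    # Stand-in for LinkParser::run(): pretend each file yields two links.
    return [f"{path}#a", f"{path}#b"]

files = [f"doc{i}.html" for i in range(100)]

# A fixed pool of 8 workers instead of 100 threads.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = pool.map(parse_links, files)
    links = [link for sub in results for link in sub]

print(len(links))  # 200 links collected by at most 8 concurrent workers
```

The same shape applies with pThreads' Pool class: the pool size caps concurrency while the task list can be arbitrarily long.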

Performance Improvement

Fine, the example above can still be improved. Instead of waiting for the system to scan all files in a single thread, you can use multiple threads to scan the system for files, then stack the data to Workers for processing.

3: Search index updating

This has been pretty much answered by the first answer, but there are many ways to improve performance. Have you ever considered an event-based approach?

Introducing Events

@rdlowrey Quote 1:

Well think of it like this. Imagine you need to serve 10,000 simultaneously connected clients in your web application. Traditional thread-per-request or process-per-request servers aren't an option because no matter how lightweight your threads are you still can't hold 10,000 of them open at a time.

@rdlowrey Quote 2:

On the other hand, if you keep all the sockets in a single process and listen for those sockets to become readable or writable you can put your entire server inside a single event loop and operate on each socket only when there's something to read/write.

Why not experiment with an event-driven, non-blocking I/O approach to your problem? PHP has libevent to supercharge your application.
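The single-process event loop the quotes describe can be sketched with Python's selectors module (chosen only so the sketch runs without a PHP extension; in PHP you would reach for the libevent/event bindings). A socketpair stands in for a client connection, and the loop touches a socket only when it is readable:

```python
import selectors
import socket

sel = selectors.DefaultSelector()

# A pair of connected sockets stands in for a client connection.
client, server = socket.socketpair()
client.setblocking(False)
server.setblocking(False)

received = []
sel.register(server, selectors.EVENT_READ)

client.send(b"hello")

# One pass of the event loop: act only on sockets that are readable.
for key, events in sel.select(timeout=1):
    received.append(key.fileobj.recv(1024))

client.close(); server.close(); sel.close()
print(received)  # [b'hello']
```

With thousands of registered sockets, this one process services them all, which is the point of the quotes above.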

I know this question is all about multi-threading, but if you have some time you can look at this Nuclear Reactor written in PHP by @igorw.

Finally

Consideration

I think you should consider using a Cache and a Job Queue for some of your tasks. You can easily show a message saying:

Document uploaded for processing ..... 5% - Done   

Then do all the time-wasting tasks in the background. Please look at Making a large processing job smaller for a similar case study.

Profiling

Profiling tools? There is no single profiling tool for a web application; everything from Xdebug to YSlow is useful. E.g. Xdebug is not useful when it comes to threads, because they are not supported.

I don't have a favorite.

や莫失莫忘 2024-08-28 19:13:33

PHP is not really oriented towards multi-threading: as you already noticed, each page is served by one PHP process, which does one thing at a time, including just "waiting" while an SQL query is executed on the database server.

Unfortunately, there is not much you can do about that: it's the way PHP works.

Still, here are a couple of thoughts:

  • First of all, you'll probably have more than one user at a time on your server, which means you'll serve several pages at the same time, which in turn means you'll have several PHP processes and SQL queries running at the same time... which means several cores of your server will be used.
    • Each PHP process runs on one core, in response to one user's request, but Apache runs several child processes in parallel (one per request, up to a couple of dozen or several hundred, depending on your configuration).
    • The MySQL server is multi-threaded, which means it can use several distinct cores to answer several concurrent requests, even if each single request cannot be served by more than one core.

So, in fact, your server's 8 cores will end up being used ;-)

And if you think your pages take too long to generate, a possible solution is to separate your calculations into two groups:

  • On one hand, the things that have to be done to generate the page: for those, there is not much you can do.
  • On the other hand, the things that have to run sometimes, but not necessarily immediately:
    • For instance, think of some statistics calculations: you want them to be fairly up to date, but if they lag a couple of minutes behind, that's generally OK.
    • Same for e-mail sending: several minutes will pass before your users receive/read their mail anyway, so there is no need to send it immediately.

For the kind of situation in my second point, since you don't need those things done immediately... well, just don't do them immediately ;-)

A solution I often use is some kind of queuing mechanism:

  • The web application stores things in a "todo-list"
  • And that "todo-list" is de-queued by batches that run frequently via a cronjob

And for some other tasks, you just want them run every X minutes; here too, a cronjob is the perfect tool.
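A minimal sketch of that todo-list pattern, using SQLite so it runs anywhere; the table and task names are invented for the example. The web request only inserts, and the cron-driven batch de-queues later:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE todo (id INTEGER PRIMARY KEY, task TEXT, done INTEGER DEFAULT 0)")

# Web request: just enqueue and return to the user immediately.
db.execute("INSERT INTO todo (task) VALUES (?)", ("send_mail:42",))
db.execute("INSERT INTO todo (task) VALUES (?)", ("recompute_stats",))
db.commit()

# Cron batch (e.g. */5 * * * *): de-queue and process pending tasks.
processed = []
pending = db.execute("SELECT id, task FROM todo WHERE done = 0 ORDER BY id").fetchall()
for task_id, task in pending:
    processed.append(task)  # the slow work happens here, off the request path
    db.execute("UPDATE todo SET done = 1 WHERE id = ?", (task_id,))
db.commit()

print(processed)  # ['send_mail:42', 'recompute_stats']
```

In the real application the table would live in MySQL and the batch would be a PHP CLI script run from crontab; the shape is the same.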

裂开嘴轻声笑有多痛 2024-08-28 19:13:33

Scaling out Web Servers is not going to make MySQL budge one inch when it comes to accessing Multicore CPUs. Why? First consider the two main Storage Engines of MySQL

MyISAM

This storage engine does not access multiple cores. It never has and never will. It does full table locking for each INSERT, UPDATE, and DELETE. Sending queries from multiple web servers to do anything with MyISAM just gets bottlenecked.

InnoDB

Prior to MySQL 5.1.38, this storage engine accessed only one CPU. You had to do strange things like running MySQL multiple times on one machine to coerce the cores into handling different instances of MySQL, then load-balance the web servers' DB connections among those instances. That's old school (especially if you are using versions before MySQL 5.1.38).

Starting with MySQL 5.1.38, you can install the new InnoDB Plugin. It has features that you have to tune to get InnoDB to access multiple CPUs. I have written about this on DBA StackExchange (Sep 20, 2011).

Those new features are fully available in MySQL 5.5/5.6 and Percona Server as well.

CAVEAT

If your custom CMS uses FULLTEXT indexing/searching, you should upgrade to MySQL 5.6 because InnoDB now supports FULLTEXT indexing/searching.

Installing MySQL 5.6 is not going to automatically make the CPUs get going. You will have to tune it because, left unconfigured, it is possible for older versions of MySQL to outrun and outgun newer versions.
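The kind of tuning that answer points to looks roughly like this in my.cnf; the values are illustrative (MySQL 5.5+), not the answer's exact settings:

```ini
[mysqld]
innodb_buffer_pool_size      = 4G
innodb_buffer_pool_instances = 8    # split the pool to reduce mutex contention
innodb_thread_concurrency    = 0    # 0 = unlimited; or cap at 2 * CPUs + disks
innodb_read_io_threads       = 8
innodb_write_io_threads      = 8
```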

江心雾 2024-08-28 19:13:33

This might not be the answer you are looking for, but the solution you seek involves threading. Threading is necessary for multicore programming, and threading is not implemented in PHP.

But, in a sense, you could fake threading in PHP by relying on the operating system's multitasking abilities. I suggest taking a quick look at Multi-threading strategies in PHP to develop a strategy for achieving what you need.
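One classic way to fake threading via the OS is to hand work to separate processes, sketched here with Python subprocesses purely so the example is runnable without a web stack (in PHP the equivalents would be proc_open() or pcntl_fork()): spawn workers, let the OS schedule them across cores, then collect their output:

```python
import subprocess
import sys

# Spawn three worker processes; the OS schedules them across cores.
cmd = [sys.executable, "-c", "print(sum(range(1000)))"]
workers = [subprocess.Popen(cmd, stdout=subprocess.PIPE) for _ in range(3)]

# The parent is free to do other work here, then collect the results.
results = [int(p.communicate()[0]) for p in workers]
print(results)  # [499500, 499500, 499500]
```

The workers run in parallel with the parent, which is exactly the multitasking the answer relies on.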

Dead link:
Multi-threading strategies in PHP

粉红×色少女 2024-08-28 19:13:33

Just letting you know, for when you think "poor PHP does not have multithreading":

Well... Python doesn't have real multithreading either. Nor does NodeJS have multi-threading support. Java has some sort of multithreading, but even there, some code halts the whole machine, AFAIK.

But: unless you do heavy computation on one single thing, it's irrelevant. Many requests hit your page, and all your cores will be used nonetheless, as each request spawns its own process with its own single thread.
