感谢您查看这个问题。
背景
我有几台机器可以在很短的时间内连续生成多个(最多 300 个)PHP 控制台脚本。这些脚本运行速度很快(不到一秒),然后退出。所有这些脚本都需要对大型 trie 结构进行只读访问,加载到该结构中将非常昂贵每次运行每个脚本时的内存。服务器运行Linux。
我的解决方案
创建一个 C 守护进程,将 trie 结构保存在内存中并接收来自 PHP 客户端的请求。它将接收来自每个 PHP 客户端的请求,对内存结构执行查找并返回答案,从而使 PHP 脚本免于执行这项工作。请求和响应都是短字符串(不超过 20 个字符)
我的问题
我对 C 守护进程和进程间通信非常陌生。经过大量研究,我将选择范围缩小到消息队列和 Unix 域套接字。消息队列似乎足够了,因为我认为(我可能是错的)它们将守护进程的所有请求排队以串行回答它们。不过,Unix 域套接字似乎更容易使用。但是,我有各种问题无法找到答案:
- PHP 脚本如何发送和接收消息或使用 UNIX 套接字与守护进程进行通信?相反,C 守护进程如何跟踪它必须向哪个 PHP 进程发送回复?
- 我见过的大多数守护进程示例都使用无限 while 循环,其中包含睡眠条件。我的守护进程需要为随时可能出现的许多连接提供服务,并且响应延迟至关重要。如果 PHP 脚本在休眠时发送请求,守护进程将如何反应?我读过有关 poll 和 epoll 的内容,这是等待收到消息的正确方法吗?
- 每个 PHP 进程总是会发送一个请求,然后等待接收响应。我需要确保如果守护进程关闭/不可用,PHP 进程将在设定的最长时间内等待响应,如果没有收到响应,则将继续而不是挂起。这可以做到吗?
数据结构的实际查找非常快,我不需要任何复杂的多线程或类似的解决方案,因为我相信以 FIFO 方式处理请求就足够了。我还需要保持简单愚蠢,因为这是一项关键任务服务,而且我对此类程序相当陌生。 (我知道,但我真的没有办法解决这个问题,而且学习体验会很棒)
我真的很感激代码片段能够帮助我解决我遇到的具体问题。也欢迎提供指南和指针的链接,这些链接和指针将进一步加深我对低级 IPC 的黑暗世界的理解。
感谢您的帮助!
更新
现在比我问这个问题时知道的更多,我只是想向任何感兴趣的人指出 Thrift 框架和 ZeroMQ 在抽象出困难的套接字级编程方面做得非常出色。 Thrift 甚至免费为您提供服务器的脚手架!
事实上,您可以考虑使用已经为您解决问题的良好异步服务器来编写应用程序服务器代码,而不是进行构建网络服务器的所有艰苦工作。当然,使用异步 IO 的服务器非常适合不需要密集 CPU 处理(或者事件循环阻塞)的网络应用程序。
python 示例:Twisted、gevent。我更喜欢gevent,并且不包括tornado,因为它专注于HTTP 服务器端。
Ruby 示例: EventMachine
当然,Node.js 基本上是当今异步服务器的默认选择。
如果您想更深入地了解,请阅读 C10k Problem 和 Unix 网络编程。
and thanks for taking a look at the question.
The background
I have several machines that continuously spawn multiple (up to 300) PHP console scripts in a very short time frame. These scripts run quickly (less than a second) and then exit. All of these scripts need read only access to a large trie structure which would be very expensive to load into memory each time each one of the scripts runs. The server runs Linux.
My solution
Create a C daemon that keeps the trie structure in memory and receives requests from the PHP clients. It would receive a request from every one of the PHP clients, perform the lookup on the memory structure and respond with the answer, saving the PHP scripts from doing that work. Both requests and responses are short strings (no longer than 20 characters)
My problem
I am very new to C daemons and inter process communication. After much research, I have narrowed the choices down to Message Queues and Unix domain sockets. Message Queues seem adequate because I think (I may be wrong) that they queue up all of the requests for the daemon to answer them serially. Unix domain sockets seem to be easier to use, though. However, I have various questions I have not been able to find answers to:
- How can a PHP script send and receive messages or use a UNIX socket to communicate with the daemon? Conversely how does the C daemon keep track of which PHP process it has to send a reply to?
- Most examples of daemons I have seen use an infinite while loop with a sleep condition inside. My daemon needs to service many connections that can come at any time, and response latency is critical. How would the daemon react if the PHP script sends a request while it is sleeping? I have read about poll and epoll, would this be the correct way to wait for a received message?
- Each PHP process will always send one request, and then will wait to receive a response. I need to make sure that if the daemon is down / unavailable, the PHP process will wait for a response for a set maximum time, and if no answer is received will continue regardless instead of hanging. Can this be done?
The actual lookup of the data structure is very fast, I don't need any complex multi-threading or similar solution, as I believe handling the requests in a FIFO manner will be enough. I also need to keep it simple stupid, as this is a mission critical service, and I am fairly new to this type of program. (I know, but I really have no way around this, and the learning experience will be great)
I would really appreciate code snippets that shine some light into the specific questions that I have. Links to guides and pointers that will further my understanding into this murky world of low level IPC are also welcome.
Thanks for your help!
Update
Knowing much more now than I did at the time of asking this question, I just wanted to point out to anybody interested that both the Thrift framework and ZeroMQ do a fantastic job of abstracting away the hard, socket-level programming. Thrift even gives you the scaffolding for the server for free!
In fact, instead of going to all the hard work of building a network server, consider just writing you applications server code using a good asynchronous server that has already solved the problem for you. Of course servers that use asynchronous IO are great for network applications that don't require intensive CPU processing (or else the event loop blocks).
Examples for python: Twisted, gevent. I prefer gevent, and I don't include tornado because it is focused on the HTTP server side.
Examples for Ruby: EventMachine
Of course, Node.js is basically the default choice for an async server nowadays.
If you want to go deeper, read the C10k Problem, and Unix Network Programing.
发布评论
评论(7)
我怀疑 Thrift 就是你想要的。您必须编写一些粘合代码来执行 PHP <-thrift-> 操作。 C++ <-> C,但这可能比自己动手更强大。
I suspect Thrift is what you want. You'd have to write a little glue code to do PHP <-thrift-> C++ <-> C, but that would probably be more robust than rolling your own.
下面是一个工作示例,其中 php 脚本向 C 守护进程发送请求,然后等待响应。它在数据报模式下使用 Unix 域套接字,因此快速且简单。
客户端.php
服务器.c
Here is a working example where the php script sends a request to a C daemon and then waits for the response. It uses Unix domain sockets in datagram mode so it is fast and simple.
client.php
server.c
您还可以使用 PHP 的共享内存函数将数据结构加载到共享内存中 http:// /www.php.net/manual/en/book.shmop.php。
哦,从文档中看不出来,但协调变量是 shmop_open 中的 $key 。每个需要访问共享内存的进程都应该有相同的$key。因此,一个进程使用 $key 创建共享内存。如果其他进程使用相同的 $key,则可以访问该共享内存。我相信你可以为$key选择任何你喜欢的东西。
You could also load the data structure into shared memory using PHP's shared memory functions http://www.php.net/manual/en/book.shmop.php.
Oh, it's not obvious from the documentation but the coordinating variable is $key in shmop_open. Every process needing access to the shared memory should have the same $key. So, one process creates the shared memory with $key. The other processes then can access that shared memory if they use the same $key. I believe you can choose whatever you like for $key.
nanomsg 是用纯 C 编码的,所以我想它比用 C++ 编码的 Thrift 和 ZeroMQ 更适合您的需求。
它有许多语言(包括 PHP)的包装器。
这是使用
NN_PAIR
协议的工作示例:(您也可以使用NN_REQREP
)client.php
server.c
nanomsg is coded in plain C so I guess it is better suited for your needs than Thrift and ZeroMQ that are coded in C++.
It has wrappers for many languages including PHP.
Here is a working example using the
NN_PAIR
protocol: (you can useNN_REQREP
too)client.php
server.c
“问题”(也许不是?)是 SysV MQ 上肯定有很多消费者/生产者。虽然如果您不一定对生产者:消费者到资源模型有 m:n 需求,那么您正在做的事情是完全可能的,但您在这里有一个请求/响应模型。
使用 SysV MQ 时,您可能会遇到一些奇怪的问题。
首先,您确定 INET 套接字对您来说不够快吗?使用 unix 域套接字的快速 PHP 示例位于 http://us.php.net/socket- create-pair (当然,就像代码示例一样,使用 socket_create() 作为 PHP 端点)。
The "problem" (maybe not?) is that there can certainly be many consumers/producers on the SysV MQs. Though perfectly possible for what you're doing if you don't necessarily have an m:n need on the producer:consumer to resources model, you have a request/response model here.
You can get some strange hangups with SysV MQ as it is.
First, are you sure that INET sockets aren't fast enough for you? A quick PHP example using unix domain sockets is at http://us.php.net/socket-create-pair (just as code example of course, use socket_create() for the PHP endpoint).
尽管我从未尝试过 memcached 以及适当的 PHP 扩展 应该消除大部分繁重的工作。
澄清:我隐含地假设,如果您这样做,您将使用扁平键将特里树的各个叶子放入内存缓存中,从而放弃特里树。当然,这种方法的可行性和可取性取决于许多因素,首先是数据源。
Although I have never tried it, memcached along with an appropriate PHP extension ought to eliminate most of the grunt work.
Clarification: I was implicitly assuming that if you did this, you would put the individual leaves of the trie into the memcache using flattened keys, ditching the trie. The feasibility and desirability of this approach, of course, depends on many factors, first and foremost being the data source.
使用 Pipes 可以轻松完成脚本之间的 IPC。这使得实现变得非常简单。
IPC between script can be done much easily using Pipes. Which makes a much simple implementation.