Persistent/keepalive HTTP with the PHP cURL library?
I'm using a simple PHP library to add documents to a SOLR index, via HTTP.
There are 3 servers involved, currently:
- The PHP box running the indexing job
- A database box holding the data being indexed
- The Solr box.
At 80 documents/sec (out of 1 million docs), I'm noticing an unusually high interrupt rate on the network interfaces on the PHP and solr boxes (2000/sec; what's more, the graphs are nearly identical -- when the interrupt rate on the PHP box spikes, it also spikes on the Solr box), but much less so on the database box (300/sec). I imagine this is simply because I open and reuse a single connection to the database server, but every single Solr request is currently opening a new HTTP connection via cURL, thanks to the way the Solr client library is written.
So, my question is:
- Can cURL be made to open a keepalive session?
- What does it take to reuse a connection? -- is it as simple as reusing the cURL handle resource?
- Do I need to set any special cURL options? (e.g. force HTTP 1.1?)
- Are there any gotchas with cURL keepalive connections? This script runs for hours at a time; will I be able to use a single connection, or will I need to periodically reconnect?
cURL PHP documentation (curl_setopt) says:
So:
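Two curl_setopt options bear directly on connection reuse: CURLOPT_FRESH_CONNECT forces a new connection instead of a cached one, and CURLOPT_FORBID_REUSE closes the connection after the transfer instead of pooling it. Both default to false, so a reused handle keeps its connection open without extra options. The sketch below sets them explicitly only to document the intent, and reads CURLINFO_NUM_CONNECTS after each transfer: it reports how many new connections the transfer had to open, so 0 means an existing connection was reused. The helper names make_reusable_handle() and fetch_with_stats() are hypothetical.

```php
<?php
// Hypothetical helper: one handle, configured so libcurl may keep the
// connection open and reuse it for later requests on the same handle.
function make_reusable_handle()
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        // Both already default to false; set here only to make the intent explicit.
        CURLOPT_FRESH_CONNECT  => false, // do not force a brand-new connection
        CURLOPT_FORBID_REUSE   => false, // keep the connection pooled after the transfer
    ]);
    return $ch;
}

// Hypothetical helper: fetch $url with an existing handle and return the body
// plus the number of new connections libcurl opened for this transfer
// (0 means an already-open connection was reused).
function fetch_with_stats($ch, string $url): array
{
    curl_setopt($ch, CURLOPT_URL, $url);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException(curl_error($ch));
    }
    return [$body, curl_getinfo($ch, CURLINFO_NUM_CONNECTS)];
}
```

In the indexing loop, create the handle once before the loop and pass it to every call. If the reported count stays at 1 for every request, something (for example the server's keep-alive limit, or a Connection: close response) is preventing reuse.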
cURL keeps connections alive by default, but only if you reuse the handle:
- create a context with curl_init(), without any parameters;
- use the CURLOPT_URL option to pass the URL to that context for each request;
- execute the request with curl_exec();
- don't call curl_close() after each request, because closing the handle closes the connection; close it once, after the last request.
Very basic example:
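A minimal sketch of that pattern, assuming a hypothetical ReusedCurl wrapper class: the handle is created once, each request only changes CURLOPT_URL, and the handle is released only when the whole job is finished.

```php
<?php
// Hypothetical wrapper: keeps one cURL handle alive for the whole job so
// libcurl can reuse the underlying keep-alive connection between requests.
final class ReusedCurl
{
    private $ch;

    public function __construct()
    {
        // Created once, without a URL; the URL is set per request in get().
        $this->ch = curl_init();
        curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, true);
    }

    public function get(string $url): string
    {
        if ($this->ch === null) {
            throw new LogicException('handle already closed');
        }
        curl_setopt($this->ch, CURLOPT_URL, $url);
        $body = curl_exec($this->ch);
        if ($body === false) {
            throw new RuntimeException(curl_error($this->ch));
        }
        return $body;
    }

    public function close(): void
    {
        if ($this->ch === null) {
            return;
        }
        if (PHP_VERSION_ID < 80000) {
            // Before PHP 8, curl_close() frees the handle and its cached connection.
            curl_close($this->ch);
        }
        // From PHP 8, curl_close() is a no-op; dropping the last reference frees the handle.
        $this->ch = null;
    }
}
```

For the indexing job, construct one instance before the loop, call get() for each request (Solr updates would need a POST variant, which this sketch does not include), and call close() after the loop.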
- On the server you are accessing, keep-alive must be enabled and the maximum number of keep-alive requests should be reasonable. For Apache, see the KeepAlive, MaxKeepAliveRequests and KeepAliveTimeout directives in the Apache docs.
- You have to reuse the same cURL context (handle) for every request.
- When configuring the cURL context, enable keep-alive, with a timeout, in the request headers:
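A sketch of that configuration, assuming a hypothetical keepalive_options() helper. The Connection and Keep-Alive headers go through CURLOPT_HTTPHEADER; the 300-second value is an assumption, and the Keep-Alive request header is only advisory, since the server's own settings decide how long an idle connection stays open. Pinning CURLOPT_HTTP_VERSION to CURL_HTTP_VERSION_1_1 is optional; it guarantees a protocol in which connections are persistent unless one side sends Connection: close.

```php
<?php
// Hypothetical helper: options for a handle that explicitly requests keep-alive.
// The default timeout is an assumption; the server may close idle connections sooner.
function keepalive_options(int $timeoutSeconds = 300): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        // HTTP/1.1 connections are persistent unless either side sends "Connection: close".
        CURLOPT_HTTP_VERSION   => CURL_HTTP_VERSION_1_1,
        CURLOPT_HTTPHEADER     => [
            'Connection: keep-alive',
            'Keep-Alive: ' . $timeoutSeconds, // advisory only
        ],
    ];
}

// Configure the single, long-lived handle once; set CURLOPT_URL per request.
$ch = curl_init();
curl_setopt_array($ch, keepalive_options());
```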
If you don't care about the response to each request, you can send the requests asynchronously, but you run the risk of overloading your SOLR index. I doubt it, though: SOLR is pretty damn quick.
Asynchronous PHP calls?
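One way to send several requests concurrently from PHP without extra extensions is the curl_multi API. This is a swapped-in technique, not necessarily what the linked question recommends, and the fetch_concurrently() helper is hypothetical. Passing documents in bounded batches limits how much work Solr receives at once.

```php
<?php
// Hypothetical helper: run one batch of requests concurrently with curl_multi
// and return the response bodies keyed like the input array.
function fetch_concurrently(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for socket activity instead of spinning
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    // The easy handles are freed when $handles goes out of scope.
    return $results;
}
```

The sketch creates a new multi handle per call for simplicity; keeping one multi handle alive across batches would also let its cached connections be reused.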