PHP curl multi on a password-protected site
I'm currently using the following (old) code to log into a site...
public function login() {
    $url1 = 'https://...'; /* Initial page load to collect initial session cookie data */
    $url2 = 'https://...'; /* The page to POST login data to */
    $url3 = 'https://...'; /* The page redirected to, used to test for success */
    $un = 'user';
    $pw = 'pass';
    $post_data = array(
        'authmethod' => 'on',
        'username' => $un,
        'password' => $pw,
        'hrpwd' => $pw
    );
    $curlOpt1 = array(
        CURLOPT_URL => $url1,
        CURLOPT_HTTPGET => TRUE, /* Force GET; this handle is reused after the POST below */
        CURLOPT_COOKIEJAR => self::COOKIEFILE,
        CURLOPT_COOKIEFILE => self::COOKIEFILE,
        CURLOPT_FOLLOWLOCATION => TRUE,
        CURLOPT_HEADER => FALSE,
        CURLOPT_RETURNTRANSFER => TRUE,
        CURLOPT_SSL_VERIFYPEER => FALSE /* Disables certificate verification; insecure outside testing */
    );
    $curlOpt2 = array(
        CURLOPT_URL => $url2,
        CURLOPT_COOKIEJAR => self::COOKIEFILE,
        CURLOPT_COOKIEFILE => self::COOKIEFILE,
        CURLOPT_FOLLOWLOCATION => TRUE,
        CURLOPT_POST => TRUE,
        CURLOPT_POSTFIELDS => http_build_query($post_data)
    );
    $this->ch = curl_init();
    if ( !$this->ch ) {
        /* No handle exists at this point, so curl_error() cannot be called */
        throw new Exception( 'Unable to init curl.' );
    }
    /* Load the login page once to get the session ID cookies */
    curl_setopt_array( $this->ch, $curlOpt1 );
    /* Compare with === false: an empty response body is also falsy */
    if ( curl_exec( $this->ch ) === false ) {
        throw new Exception( 'Unable to retrieve initial auth cookie.' );
    }
    /* POST the login data to the login page */
    curl_setopt_array( $this->ch, $curlOpt2 );
    if ( curl_exec( $this->ch ) === false ) {
        throw new Exception( 'Unable to post login data.' );
    }
    /* Verify the login by checking the redirected url. */
    $header = curl_getinfo( $this->ch );
    $retUrl = $header['url'];
    if ( $retUrl == $url3 ) {
        /* Reload the login page to get the auth cookies */
        curl_setopt_array( $this->ch, $curlOpt1 );
        if ( curl_exec( $this->ch ) !== false ) {
            return true;
        } else {
            throw new Exception( 'Unable to retrieve final auth cookie.' );
        }
    } else {
        throw new Exception( 'Login validation failure.' );
    }
}
I then use...
public function getHtml($url) {
    $html = FALSE;
    try {
        curl_setopt($this->ch, CURLOPT_URL, $url);
        $page = curl_exec($this->ch);
    } catch (Exception $e) {
        ...
    }
    /* Remove all tabs and newlines from the HTML */
    $rmv = array("\n","\t");
    $html = str_replace($rmv, '', $page);
    return $html;
}
...for each page request. My question is, how can I convert this to use curl_multi_exec to make several hundred look-ups quicker? I can't find any examples of curl_multi WITH login. Do I simply replace all curl_exec calls with curl_multi_exec?
Also, if you see any other glaringly obvious mistakes, comments are welcome.
To be clear, I would like to log in with a single user/pass then reuse those credentials for multiple page requests.
Comments (1)
It's been a while, but I wanted to post my final solution. I found an awesome multi-curl library, rolling-curl, that helped. Basically, after collecting the login cookie (as shown in my original question), I feed it, along with the other options, into the rolling-curl instance for each multi request, then execute the batch. Works like a charm.
Note that this solution requires a callback to handle the return of each request. RollingCurl's documentation describes this well so I won't reiterate it here.
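For anyone who wants to see the idea without the library, here is a minimal sketch of the same pattern using PHP's built-in curl_multi functions. It is not RollingCurl's API. The names authOptions, fetchAll, $window and $onDone are hypothetical, made up for illustration. It assumes login() has already run and that the login handle has been closed or destroyed, since libcurl only writes the CURLOPT_COOKIEJAR file at that point. Each parallel handle only reads the cookie file (CURLOPT_COOKIEFILE, no CURLOPT_COOKIEJAR), so concurrent transfers do not overwrite it.

```php
<?php
// Hypothetical helper: options applied to every authenticated request.
// $cookieFile is the cookie jar already written by the login step.
function authOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_PRIVATE        => $url,        // remember the requested URL on the handle
        CURLOPT_COOKIEFILE     => $cookieFile, // send the saved session cookies (read-only)
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Fetch $urls with at most $window transfers in flight at once.
// $onDone is called as ($url, $body, $error) when each transfer finishes;
// $body is null on failure and $error is null on success.
function fetchAll(array $urls, string $cookieFile, callable $onDone, int $window = 10): void
{
    $mh     = curl_multi_init();
    $queue  = array_values($urls);
    $active = 0; // handles currently attached to the multi handle

    // Take the next URL off the queue and attach a handle for it.
    $add = function () use (&$queue, &$active, $mh, $cookieFile) {
        $ch = curl_init();
        curl_setopt_array($ch, authOptions(array_shift($queue), $cookieFile));
        curl_multi_add_handle($mh, $ch);
        $active++;
    };

    // Fill the initial window.
    while ($active < $window && $queue) {
        $add();
    }

    while ($active > 0) {
        curl_multi_exec($mh, $running);

        // Hand every finished transfer to the callback, then refill the window.
        while ($info = curl_multi_info_read($mh)) {
            $ch  = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $ok  = ($info['result'] === CURLE_OK);
            $onDone(
                $url,
                $ok ? curl_multi_getcontent($ch) : null,
                $ok ? null : curl_error($ch)
            );
            curl_multi_remove_handle($mh, $ch);
            $active--;
            if ($queue) {
                $add();
            }
        }

        // Wait for socket activity instead of spinning.
        if ($active > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000); // select can fail on some systems; back off briefly
        }
    }

    curl_multi_close($mh);
}
```

After a successful login, a call such as fetchAll($urls, $cookieFile, function ($url, $body, $error) { /* store or parse $body */ }) would replace the loop of getHtml() calls, with $cookieFile being the same path as self::COOKIEFILE. The window size of 10 is an arbitrary default; the right value depends on what the target server tolerates.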