Cannot display a downloaded web page with the correct encoding using PHP
I have to fetch the content of a Persian page and show part of it to some users. The problem is that after I filter the page content, I cannot display it with the proper encoding. The page is located at sena.ir, and here is a screenshot of the part of the original page I want to show:
alt text http://img502.imageshack.us/img502/983/original.gif
And here is what I get:
alt text http://www.freeimagehosting.net/uploads/812cebe6b3.gif
Here is the function I use to fetch the page content:
function getPage($url, $referer = "", $timeout = 30, $header = "") {
    // Fall back to 30 seconds when an empty timeout is passed in.
    if (empty($timeout)) {
        $timeout = 30;
    }
    $curl = curl_init();
    // Only send a Referer header when a full URL was given.
    if (strstr($referer, "://")) {
        curl_setopt($curl, CURLOPT_REFERER, $referer);
    }
    $headers = array();
    $headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
    $headers[] = 'Connection: Keep-Alive';
    $headers[] = 'Content-type: application/x-www-form-urlencoded;charset=utf-8 '; // I tried iso-..... as well, with no luck
    $user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)';
    $compression = "gzip";
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_HEADER, 0);
    curl_setopt($curl, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($curl, CURLOPT_POST, 0);
    curl_setopt($curl, CURLOPT_ENCODING, $compression);
    // Use the $timeout argument (the original hard-coded 300 and ignored it).
    curl_setopt($curl, CURLOPT_TIMEOUT, (int) $timeout);
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($curl, CURLOPT_URL, $url);
    $html = curl_exec($curl);
    curl_close($curl);
    return $html;
}

$content = getPage("http://sena.ir/");
// Cut out the first table that matches this exact opening tag.
$p1 = strpos($content, '<TABLE cellSpacing="3" cellPadding="3" width="100%" border="0">');
$p2 = strpos($content, "</TABLE>", $p1);
$content = substr($content, $p1, $p2 - $p1);
echo $content;
Comments (1)
The data was not the problem.
The output was the problem.
Because the proxy-like function strips the HTML headers and encoding declarations, you have to add them back yourself before you output the filtered data.
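One way to restore the missing encoding declaration is to wrap the extracted fragment in a minimal document that states its charset. The following is only a sketch: the helper name `wrapFragment` is hypothetical, and it assumes the fetched page is UTF-8 encoded. If the source page declares a different charset, pass that instead.

```php
<?php
// Hypothetical helper (not from the original post): wraps an extracted HTML
// fragment in a minimal document that declares its character encoding.
// The UTF-8 default is an assumption about the source page.
function wrapFragment(string $fragment, string $charset = 'UTF-8'): string
{
    // Escape the charset so it cannot break out of the attribute value.
    $safeCharset = htmlspecialchars($charset, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset={$safeCharset}\">\n"
        . "</head>\n<body>\n"
        . $fragment
        . "\n</body>\n</html>\n";
}

// Usage, before any other output is sent:
//   header('Content-Type: text/html; charset=UTF-8');
//   echo wrapFragment($content);
```

The `header()` call sets the charset in the HTTP response, and the `<meta>` tag repeats it inside the document, so the browser sees the declaration even if the fragment is saved and reopened from disk.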