Cannot display a downloaded web page with the correct encoding using PHP

Posted on 2024-08-12 18:15:15

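I have to fetch the content of a Persian page and show part of that page to some users. The problem is that after I filter the page content, I cannot display it with the proper encoding. The web page is located at sena.ir, and here is a screenshot of the part of the original page I want to show: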

Screenshot: http://img502.imageshack.us/img502/983/original.gif

And here is what I got:

Screenshot: http://www.freeimagehosting.net/uploads/812cebe6b3.gif

Here is the function I use to get the page content:

function getPage($url, $referer = "", $timeout = "", $header = ""){
    // An empty string is still "set", so test for the empty default explicitly.
    if ($timeout === "") {
        $timeout = 30;
    }
    $curl = curl_init();
    if (strstr($referer, "://")) {
        curl_setopt($curl, CURLOPT_REFERER, $referer);
    }

    $headers = array();
    $headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
    $headers[] = 'Connection: Keep-Alive';
    // This request header describes the request body, not the response,
    // so it does not change the encoding of the downloaded page.
    $headers[] = 'Content-type: application/x-www-form-urlencoded;charset=utf-8'; // I tried iso-..... as well, with no luck
    if (is_array($header)) {
        $headers = array_merge($headers, $header); // optional extra request headers
    }
    $user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)';
    $compression = "gzip";

    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_HEADER, 0);
    curl_setopt($curl, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($curl, CURLOPT_POST, 0);
    curl_setopt($curl, CURLOPT_ENCODING, $compression); // transfer compression, not the character set
    curl_setopt($curl, CURLOPT_TIMEOUT, $timeout);
    // Certificate checks are disabled here; that is unsafe outside of testing.
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);

    curl_setopt($curl, CURLOPT_URL, $url);
    $html = curl_exec($curl); // returns false on failure
    curl_close($curl);
    return $html;
}

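$content = getPage("http://sena.ir/");
$startTag = '<TABLE cellSpacing="3" cellPadding="3" width="100%" border="0">';
$endTag = "</TABLE>";
$p1 = strpos($content, $startTag);
$p2 = ($p1 === false) ? false : strpos($content, $endTag, $p1);
if ($p1 !== false && $p2 !== false) {
    // Include the closing tag so the extracted fragment is well-formed.
    echo substr($content, $p1, $p2 - $p1 + strlen($endTag));
}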
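
Before deciding how to output the fragment, it can help to check which character set the server actually declares for the page. The following is a minimal sketch, not part of the original code: `charsetFromContentType` and `toUtf8` are hypothetical helper names, the fallback to UTF-8 is an assumption, and the conversion relies on PHP's `iconv` extension being available.

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type
// header value such as "text/html; charset=windows-1256".
// Falls back to $default (an assumption) when no charset is declared.
function charsetFromContentType(?string $contentType, string $default = 'UTF-8'): string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Hypothetical helper: convert fetched HTML to UTF-8 when the declared
// charset is something else. If iconv cannot convert the input, the
// original string is returned unchanged.
function toUtf8(string $html, string $declaredCharset): string
{
    if (strcasecmp($declaredCharset, 'UTF-8') === 0) {
        return $html; // already UTF-8, nothing to do
    }
    $converted = @iconv($declaredCharset, 'UTF-8', $html);
    return $converted === false ? $html : $converted;
}
```

Inside a function like `getPage`, the declared type can be read with `curl_getinfo($curl, CURLINFO_CONTENT_TYPE)` after `curl_exec` and before `curl_close`, then passed to `charsetFromContentType`.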

Comments (1)

风轻花落早 2024-08-19 18:15:15

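The data was not the problem.
The output was the problem.
Because the proxy-like function strips the page's HTML head and its encoding declarations, you have to output these lines before the filtered data: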

<html lang="fa"> 
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 
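
As a rough illustration of this fix, the extracted fragment can be wrapped in a minimal page that declares its language and charset before it is echoed. This is a sketch under stated assumptions: `wrapFragment` is a hypothetical helper name, the fragment is assumed to already be UTF-8, and the closing tags are added only to keep the page well-formed.

```php
<?php
// Hypothetical helper: wrap an extracted HTML fragment in a minimal page
// that declares the language and character set, as the answer suggests.
function wrapFragment(string $fragment, string $lang = 'fa', string $charset = 'utf-8'): string
{
    // Escape the attribute values so they cannot break the markup.
    $lang = htmlspecialchars($lang, ENT_QUOTES, 'UTF-8');
    $charset = htmlspecialchars($charset, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html lang=\"{$lang}\">\n"
        . "<head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset={$charset}\">\n"
        . "</head>\n"
        . "<body>\n"
        . $fragment . "\n"
        . "</body>\n"
        . "</html>\n";
}

// Possible usage with the $content fragment from the question:
// header('Content-Type: text/html; charset=utf-8'); // must be sent before any output
// echo wrapFragment($content);
```

Sending the matching HTTP `Content-Type` header as well keeps the header and the meta declaration consistent.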
