php 俄语语言问题

发布于 2024-09-26 00:12:48 字数 936 浏览 9 评论 0原文

我使用curl 获取俄语语言的utf-8 页面。如果我回显文本,它会显示良好。然后我使用这样的代码

$dom = new domDocument; 

        /*** load the html into the object ***/ 
        @$dom->loadHTML($html); 

        /*** discard white space ***/ 
        $dom->preserveWhiteSpace = false; 

        /*** the table by its tag name ***/ 
        $tables = $dom->getElementsByTagName('table'); 

        /*** get all rows from the table ***/ 
        $rows = $tables->item(0)->getElementsByTagName('tr'); 

        /*** loop over the table rows ***/ 
        for ($i = 0; $i <= 5; $i++)
        { 
            /*** get each column by tag name ***/ 
            $cols = $rows->item($i)->getElementsByTagName('td'); 

            echo $cols->item(2)->nodeValue; 

            echo '<hr />'; 
        } 

$html 包含俄语文本。在该行之后 echo $cols->item(2)->nodeValue; 显示错误文本,而不是俄语。我尝试 iconv 但不起作用。有什么想法吗?

i get page in utf-8 with russian language using curl. if i echo text it show good. then i use such code

$dom = new domDocument; 

        /*** load the html into the object ***/ 
        @$dom->loadHTML($html); 

        /*** discard white space ***/ 
        $dom->preserveWhiteSpace = false; 

        /*** the table by its tag name ***/ 
        $tables = $dom->getElementsByTagName('table'); 

        /*** get all rows from the table ***/ 
        $rows = $tables->item(0)->getElementsByTagName('tr'); 

        /*** loop over the table rows ***/ 
        for ($i = 0; $i <= 5; $i++)
        { 
            /*** get each column by tag name ***/ 
            $cols = $rows->item($i)->getElementsByTagName('td'); 

            echo $cols->item(2)->nodeValue; 

            echo '<hr />'; 
        } 

$html contains russian text. after it line echo $cols->item(2)->nodeValue; display error text, not russian. i try iconv but not work. any ideas?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

别挽留 2024-10-03 00:12:48

我建议在加载 UTF-8 页面之前使用 mb_convert_encoding

    $dom = new DomDocument();   
    $html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");
    @$dom->loadHTML($html);

或者你可以尝试这个

    $dom = new DomDocument('1.0', 'UTF-8');
    @$dom->loadHTML($html);
    $dom->preserveWhiteSpace = false;
    ..........
    echo html_entity_decode($cols->item(2)->nodeValue,ENT_QUOTES,"UTF-8");
    .......... 

I suggest use mb_convert_encoding before load UTF-8 page.

    $dom = new DomDocument();   
    $html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");
    @$dom->loadHTML($html);

OR else you could try this

    $dom = new DomDocument('1.0', 'UTF-8');
    @$dom->loadHTML($html);
    $dom->preserveWhiteSpace = false;
    ..........
    echo html_entity_decode($cols->item(2)->nodeValue,ENT_QUOTES,"UTF-8");
    .......... 
深海夜未眠 2024-10-03 00:12:48

DOM 无法识别 HTML 的编码。
你可以尝试这样的事情:

$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);

// taken from http://php.net/manual/en/domdocument.loadhtml.php#95251

The DOM cannot recognize the HTML's encoding.
You can try something like:

$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);

// taken from http://php.net/manual/en/domdocument.loadhtml.php#95251
兔姬 2024-10-03 00:12:48

mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");

同样的事情也适用于 PHPQuery。

PS我使用 phpQuery::newDocument($html);

而不是 $dom->loadHTML($html);

mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");

The same thing worked for PHPQuery.

P.S. I use phpQuery::newDocument($html);

instead of $dom->loadHTML($html);

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文