Getting a page title in PHP

Posted 2024-08-29 05:38:22

When I want to get the title of a remote website, I use this script:

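function get_remotetitle($urlpage) {
    // fopen() returns false on failure; @ only hides the warning.
    $file = @fopen($urlpage, "r");
    if ($file === false) {
        return 'Title N/A';
    }
    // Read up to 16 KB; a single fread() on a network stream may return less.
    $text = stream_get_contents($file, 16384);
    fclose($file);
    if (preg_match('/<title>(.*?)<\/title>/is', $text, $found)) {
        $title = $found[1];
    } else {
        $title = 'Title N/A';
    }
    return $title;
}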

But when I parse a website title that contains accented characters, I get "�". If I look at the same data in phpMyAdmin, the accents are shown correctly. What's happening?

Comments (5)

贩梦商人 2024-09-05 05:38:23

I solved it. I added htmlentities($text), and now the accents and other special characters display correctly.
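
A minimal sketch of why this can work, assuming the fetched title is ISO-8859-1 (the question does not state the charset). htmlentities() rewrites accented characters as pure-ASCII entities such as &eacute;, which render the same whatever charset the displaying page declares. Recent PHP versions assume UTF-8 input by default, so the source charset is passed explicitly here. The helper name title_to_entities is hypothetical.

```php
<?php
// Hypothetical helper: encode a title so it displays regardless of the page charset.
// Assumption: $sourceCharset is the charset of the remote page (not stated in the question).
function title_to_entities(string $title, string $sourceCharset = 'ISO-8859-1'): string
{
    // ENT_QUOTES also escapes single and double quotes.
    return htmlentities($title, ENT_QUOTES, $sourceCharset);
}

// "Caf\xE9" is "Café" encoded as ISO-8859-1 (é is the single byte 0xE9).
echo title_to_entities("Caf\xE9"), "\n"; // prints Caf&eacute;
```

Entities sidestep the mismatch rather than fix it; converting the text to the page's own charset is the more general approach.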

葵雨 2024-09-05 05:38:23

Try this:

echo iconv('UTF-8', 'ASCII//TRANSLIT', $title);
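
A hedged sketch of what this call does: //TRANSLIT asks iconv to replace characters that ASCII cannot represent with look-alikes, so accents are dropped rather than displayed. It assumes $title is already valid UTF-8, which the question does not confirm; if the bytes are really ISO-8859-1, iconv() typically fails and returns false. The exact replacement depends on the iconv implementation and the locale. The helper name ascii_title is hypothetical.

```php
<?php
// Hypothetical helper around the iconv() call suggested in this answer.
// Assumption: $title is valid UTF-8.
function ascii_title(string $title): string
{
    // @ hides the notice iconv() raises on invalid input; false signals failure.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $title);
    // Fall back to the untouched title when the conversion fails.
    return $ascii === false ? $title : $ascii;
}

// "Caf\xC3\xA9" is "Café" in UTF-8. The output varies by platform (for example Cafe or Caf?).
echo ascii_title("Caf\xC3\xA9"), "\n";
```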

飞烟轻若梦 2024-09-05 05:38:22

This is most likely a character encoding issue. You are probably getting the characters correctly, but the page that displays them declares a different character encoding, so they don't display right.
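
One way to rule this out is to make the displaying page declare its encoding explicitly. The sketch below assumes the page is meant to be UTF-8 and that the title is already UTF-8; the question states neither. The helper name render_title_page is hypothetical.

```php
<?php
// Hypothetical helper: build a page that declares UTF-8 in its markup.
// Assumption: $title is already UTF-8.
function render_title_page(string $title): string
{
    // htmlspecialchars() escapes markup characters but leaves accents as they are.
    $safe = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html><head><meta charset=\"UTF-8\"><title>Demo</title></head>"
         . "<body><p>{$safe}</p></body></html>\n";
}

// Declare the same charset in the HTTP header, which takes precedence over the <meta> tag.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo render_title_page("Caf\xC3\xA9");
```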

不气馁 2024-09-05 05:38:22

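Check out PHP Simple HTML DOM Parser.

Use it something like this:

include 'simple_html_dom.php'; // the library file; adjust the path to your copy
$html = file_get_html('http://www.google.com/');
$ret = $html->find('title', 0); // the first <title> element
echo $ret->plaintext;           // the title text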
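
If adding a third-party library is not an option, PHP's bundled DOM extension can do a similar job. This is a swapped-in alternative, not the Simple HTML DOM API, and the helper name title_via_dom is hypothetical. DOMDocument returns text as UTF-8 and relies on a charset declaration in the markup to decode the input, so results for pages without one may differ.

```php
<?php
// Hypothetical helper: extract the <title> text with the built-in DOM extension.
function title_via_dom(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName('title');
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

echo title_via_dom('<html><head><title> Hello </title></head><body></body></html>'), "\n"; // prints Hello
```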

抱猫软卧 2024-09-05 05:38:22

The trouble is that the text has a different encoding from what you're using on the page you're displaying it on.

What you want to do is find out what encoding the data is in (for instance, by looking at what encoding the page you take the text from is using) and convert it to the encoding you're using yourself.

For the actual conversion, you can use iconv (for the general case), utf8_decode (UTF-8 -> ISO-8859-1), utf8_encode (ISO-8859-1 -> UTF-8) or mb_convert_encoding.
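
A minimal sketch of the conversion step, assuming the remote page is ISO-8859-1 and the displaying page is UTF-8 (the question states neither). It prefers mb_convert_encoding() and falls back to iconv(); the helper name to_utf8 is hypothetical. utf8_encode() and utf8_decode() are deprecated as of PHP 8.2, so they are left out.

```php
<?php
// Hypothetical helper: convert $text from $sourceCharset to UTF-8.
function to_utf8(string $text, string $sourceCharset): string
{
    if (function_exists('mb_convert_encoding')) {
        // Argument order: string, target encoding, source encoding.
        return mb_convert_encoding($text, 'UTF-8', $sourceCharset);
    }
    // iconv() takes the source encoding first and returns false on failure.
    $converted = iconv($sourceCharset, 'UTF-8', $text);
    return $converted === false ? $text : $converted;
}

// "Caf\xE9" is "Café" in ISO-8859-1; the result is the two-byte UTF-8 form of é.
var_dump(to_utf8("Caf\xE9", 'ISO-8859-1') === "Caf\xC3\xA9"); // bool(true)
```

In the question's function, this would be applied to the extracted title before returning it, with the source charset taken from the remote page.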

To help you find out what the encoding of the source page is, you could, for instance, put the website through the W3C Validator, which automatically detects the encoding.

If you want an automatic way to determine the encoding, you'll have to look at the HTML itself. The ways you can determine the selected charset can be found in the HTML 4 specification.
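
The lookup order described in the HTML 4 specification (the HTTP Content-Type header first, then a <meta> declaration) can be sketched as follows. The regular expressions are a simplification, not a full HTML parser, and the name detect_charset is hypothetical.

```php
<?php
// Hypothetical helper: guess a page's charset from its HTTP header or its markup.
// $contentType is the value of the Content-Type response header, if known.
function detect_charset(string $html, ?string $contentType = null): ?string
{
    // 1. The HTTP header has the highest priority.
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    // 2. <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?\s*([\w.:-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return null; // No declaration found; the caller has to pick a default.
}

$page = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
      . '<title>Demo</title></head></html>';
echo detect_charset($page), "\n";                             // prints ISO-8859-1
echo detect_charset($page, 'text/html; charset=utf-8'), "\n"; // prints UTF-8
```

The detected name can then be passed as the source charset to iconv() or mb_convert_encoding().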

In addition, it's worth having a look at The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) for a bit more information on encoding.
