CodeIgniter: Is there a class/library to get the meta tags from a web page?

Posted 2024-08-21 22:18:00

I am using CodeIgniter, though I guess it doesn't matter which PHP framework I am using.

But before I write my own class: is there an existing one that lets a user get the page title and meta tags (keywords, description) of any site, if it has any?

Any sort of PHP class that does that would be great.

Thanks all

Comments (6)

想念有你 2024-08-28 22:18:00

You should have a look at the PHP Simple HTML DOM class. It works like this:

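require_once 'simple_html_dom.php';        // PHP Simple HTML DOM
$html = file_get_html('http://www.codeigniter.com/');
if ($html === false) {                     // returned when the page is empty or too large
    die('Could not load the page');
}

$title = $html->find('title', 0);          // first <title> element, or null
echo $title ? $title->innertext : '(no title)';

echo "<pre>";
foreach ($html->find('meta') as $element) {
    // Open Graph tags use "property" instead of "name"
    $key = $element->name ?: $element->property;
    echo htmlspecialchars($key . ' : ' . $element->content) . "\n"; // prints every META tag
}
echo "</pre>";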
遇见了你 2024-08-28 22:18:00

Use PHP's cURL extension. It can download other pages from the web and return them as strings, which you can then parse with regular expressions to find the page's title and meta tags.
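A minimal sketch of the cURL-plus-regex approach, assuming PHP 7+ with the cURL extension enabled. The function names `fetch_html()` and `parse_meta_with_regex()` are hypothetical. The regex only handles `<meta>` tags where `name` comes before `content`, so treat it as a quick hack rather than a real HTML parser.

```php
<?php
// Sketch: download a page with the cURL extension, then extract the <title>
// and <meta name="..." content="..."> pairs with regular expressions.

function fetch_html(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body as a string instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $html; // the handle is released when $ch goes out of scope
}

function parse_meta_with_regex(string $html): array
{
    $result = ['title' => null, 'meta' => []];

    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        $result['title'] = trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }

    // Assumption: name="..." appears before content="..." (single or double quotes)
    $pattern = '/<meta\s+[^>]*?name\s*=\s*(["\'])(.*?)\1[^>]*?content\s*=\s*(["\'])(.*?)\3/is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        // Key by the lower-cased name, e.g. "Description" -> "description"
        $result['meta'][strtolower($match[2])] = html_entity_decode($match[4], ENT_QUOTES, 'UTF-8');
    }
    return $result;
}
```

Usage would be `parse_meta_with_regex(fetch_html('http://www.codeigniter.com/'))`. Keeping the download and the parsing in separate functions makes the parser easy to test against a fixed HTML string.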

熊抱啵儿 2024-08-28 22:18:00

You can get all the meta tags from a remote page with get_meta_tags - https://www.php.net/get_meta_tags

This page has a class to get the page title and description; it also uses get_meta_tags - http://www.emirplicanic.com/php/get-remote-page-title-with-php.php

You should be able to combine bits from both to get everything you need.
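One way to combine the two, sketched under the assumption of PHP 7+: `get_meta_tags()` returns the `<meta name="..." content="...">` pairs keyed by the lower-cased name, but it does not return the `<title>`, so the title is read separately with a small regex. The function name `get_page_info()` is hypothetical.

```php
<?php
// Sketch: combine PHP's built-in get_meta_tags() (meta name/content pairs)
// with a regex for the <title>, which get_meta_tags() does not return.

function get_page_info(string $url): array
{
    // Array keyed by the lower-cased "name" attribute, or false on failure
    $meta = get_meta_tags($url);
    if ($meta === false) {
        $meta = [];
    }

    // Second request just for the <title>; fine for a sketch, wasteful in production
    $title = null;
    $html = file_get_contents($url);
    if ($html !== false && preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        $title = trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }

    return [
        'title'       => $title,
        'description' => $meta['description'] ?? null,
        'keywords'    => $meta['keywords'] ?? null,
        'meta'        => $meta, // every name/content pair found before </head>
    ];
}
```

Because both `get_meta_tags()` and `file_get_contents()` accept local paths as well as URLs, the function can be tried on a saved HTML file before pointing it at a remote site. Note that `get_meta_tags()` ignores Open Graph tags that use `property` instead of `name`.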

守护在此方 2024-08-28 22:18:00

With DOM/XPath:

libxml_use_internal_errors(true);          // don't emit warnings for malformed HTML
$c = file_get_contents("http://url/here");
if ($c === false) {                        // loadHTML() cannot parse a failed download
    die('Could not fetch the page');
}
$d = new DOMDocument();
$d->loadHTML($c);
$xp = new DOMXPath($d);
foreach ($xp->query("//meta[@name='keywords']") as $el) {
    echo $el->getAttribute("content"), "\n";
}
foreach ($xp->query("//meta[@name='description']") as $el) {
    echo $el->getAttribute("content"), "\n";
}
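The same DOM/XPath idea, sketched as a function that takes an HTML string (assuming PHP 7+ with the DOM extension). It also reads the `<title>` and matches `name="Description"` case-insensitively, since `@name='description'` only matches the exact lower-case spelling. The function name `extract_meta_xpath()` is hypothetical.

```php
<?php
// Sketch: DOMDocument + DOMXPath as a reusable function that parses an HTML
// string (fetch it first with file_get_contents() or cURL).

function extract_meta_xpath(string $html): array
{
    $result = ['title' => null, 'description' => null, 'keywords' => null];
    if ($html === '') {
        return $result; // loadHTML() rejects empty input
    }

    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);

    $title = $xp->query('//title')->item(0);
    if ($title !== null) {
        $result['title'] = trim($title->textContent);
    }

    // XPath 1.0 has no lower-case(), so translate() is used to match
    // name="Description", name="DESCRIPTION", etc.
    $lowerName = "translate(@name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')";
    foreach (['description', 'keywords'] as $key) {
        $node = $xp->query("//meta[$lowerName = '$key']")->item(0);
        if ($node instanceof DOMElement) {
            $result[$key] = $node->getAttribute('content');
        }
    }
    return $result;
}
```

For a remote page this would be called as `extract_meta_xpath(file_get_contents($url))`, after checking that the download did not return `false`.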
美人骨 2024-08-28 22:18:00

Please see this. It is a generic class that gets a page's meta tags and does a lot more. See if you can add it as a CodeIgniter library. Thanks
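A rough sketch of what wrapping such a class as a CodeIgniter library could look like, assuming CodeIgniter 2/3 conventions (a class saved in `application/libraries/` and loaded with `$this->load->library()`). The class name `Page_meta` and its `fetch()`/`parse()` methods are hypothetical and are not the class this answer refers to. CodeIgniter's own files usually start with a `BASEPATH` guard; it is left out here so the class also runs outside the framework.

```php
<?php
// Hypothetical file: application/libraries/Page_meta.php

class Page_meta
{
    // Fetch a page and parse it; returns null for non-http(s) URLs or failed downloads.
    public function fetch(string $url): ?array
    {
        if (!preg_match('#^https?://#i', $url)) {
            return null; // refuse file://, php://, etc.
        }
        $html = @file_get_contents($url);
        return ($html === false || $html === '') ? null : $this->parse($html);
    }

    // Parse an HTML string into its <title> plus every meta name/property => content pair.
    public function parse(string $html): array
    {
        $result = ['title' => null, 'meta' => []];
        if ($html === '') {
            return $result;
        }

        $previous = libxml_use_internal_errors(true);
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $title = $doc->getElementsByTagName('title')->item(0);
        if ($title !== null) {
            $result['title'] = trim($title->textContent);
        }
        foreach ($doc->getElementsByTagName('meta') as $meta) {
            // Standard tags use name="...", Open Graph tags use property="og:..."
            $key = $meta->getAttribute('name') ?: $meta->getAttribute('property');
            if ($key !== '') {
                $result['meta'][strtolower($key)] = $meta->getAttribute('content');
            }
        }
        return $result;
    }
}
```

In a controller, `$this->load->library('page_meta');` followed by `$info = $this->page_meta->fetch($url);` would then return the title and a `meta` array keyed by the lower-cased `name` or `property`.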

原来是傀儡 2024-08-28 22:18:00

Try this:

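    libxml_use_internal_errors(true);  // ignore warnings from malformed HTML
    $urlDecoded = $this->input->post('url');
    // Only allow absolute http(s) URLs: file_get_contents() would also read local files
    if (!filter_var($urlDecoded, FILTER_VALIDATE_URL) || !preg_match('#^https?://#i', $urlDecoded)) {
        show_error('Invalid URL', 400);
    }
    $c = file_get_contents($urlDecoded);
    if ($c === false || $c === '') {
        show_error('Could not fetch the page');
    }
    $d = new DOMDocument();
    $d->loadHTML($c);
    
    $metaTags = [
        'title' => '',
        'description' => '',
        'image' => '',
        'canonical' => '',
        'url' => '',
        'author' => '',
        'availability' => '',
        'keywords' => '',
        'og:description' => '',
        'og:determiner' => '',
        'og:image' => '',
        'og:image:height' => '',
        'og:image:secure_url' => '',
        'og:image:type' => '',
        'og:image:width' => '',
        'og:locale' => '',
        'og:locale:alternate' => '',
        'og:site_name' => '',
        'og:title' => '',
        'og:type' => '',
        'og:url' => '',
        'price' => '',
        'priceCurrency' => '',
        'source' => '',
    ];

    // Plain <title> element; an og:title tag below overrides it
    $titleNode = $d->getElementsByTagName('title')->item(0);
    if ($titleNode !== null) {
        $metaTags['title'] = trim($titleNode->textContent);
    }

    foreach ($d->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        $name = strtolower($meta->getAttribute('name'));
        $content = $meta->getAttribute('content');
        if (strpos($property, 'og:') === 0) {
            $metaTags[$property] = $content;

            // Copy the Open Graph content (not the property name) into the generic keys
            if ($property === 'og:title') $metaTags['title'] = $content;
            if ($property === 'og:description') $metaTags['description'] = $content;
            if ($property === 'og:image') $metaTags['image'] = $content;
        } elseif (in_array($name, ['description', 'keywords', 'author'], true) && $metaTags[$name] === '') {
            // Standard <meta name="..."> tags, unless an og: tag already filled them
            $metaTags[$name] = $content;
        }
    }

    // Prefer the page's own <link rel="canonical">, else fall back to the requested URL
    foreach ($d->getElementsByTagName('link') as $link) {
        if (strtolower($link->getAttribute('rel')) === 'canonical') {
            $metaTags['canonical'] = $link->getAttribute('href');
        }
    }
    if ($metaTags['canonical'] === '') {
        $metaTags['canonical'] = $urlDecoded;
    }
    $metaTags['url'] = $urlDecoded;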