使用 php 正则表达式从 html 文档实时 url 获取任何用户输入的 html 标签

发布于 2024-11-17 05:35:28 字数 486 浏览 0 评论 0原文

我想获取 HTML 页面上可用的任何元、标题、脚本、链接标签,这是我编写的程序(不正确,但会给专家提供想法)。

<?php
function get_tag($tag_name, $url)
{
    $content = file_get_contents($url);

    // this is not correct : regular expression please //
    preg_match_all($tag_name, $content, $matches);

    return $matches;
}

print_r(get_tag('title', 'http://stackoverflow.com'));

?>

输出应该是这样的:

Array
(
    [0] => title
    [1] => Stack Overflow
)

谢谢!!

I want to fetch any meta, title, script, link tag that is available on HTML page, this is the program i write (not correct but will give idea for experts).

<?php
function get_tag($tag_name, $url)
{
    $content = file_get_contents($url);

    // this is not correct : regular expression please //
    preg_match_all($tag_name, $content, $matches);

    return $matches;
}

print_r(get_tag('title', 'http://stackoverflow.com'));

?>

Output should come something like this :

Array
(
    [0] => title
    [1] => Stack Overflow
)

Thanks!!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

情何以堪。 2024-11-24 05:35:28
function get_tags($tag, $url) {
//allow for improperly formatted html
libxml_use_internal_errors(true);
// Instantiate DOMDocument Class to parse html DOM
$xml = new DOMDocument();

// Load the file into the DOMDocument object
$xml->loadHTMLFile($url);

// Empty array to hold all links to return
$tags = array();

//Loop through all tags of the given type and store details in the array
foreach($xml->getElementsByTagName($tag) as $tag_found) {
      if ($tag_found->tagName == "meta")
      {
        $tags[] = array("meta_name" => $tag_found->getAttribute("name"), "meta_value" => $tag_found->getAttribute("content"));
      }
      else {
    $tags[] = array('tag' => $tag_found->tagName, 'text' => $tag_found->nodeValue);
     }
}

//Return the links
return $tags;
}

这个答案实际上会给你标签的名称作为你的第一个数组值而不是“数组”,并且也会停止警告。

function get_tags($tag, $url) {
//allow for improperly formatted html
libxml_use_internal_errors(true);
// Instantiate DOMDocument Class to parse html DOM
$xml = new DOMDocument();

// Load the file into the DOMDocument object
$xml->loadHTMLFile($url);

// Empty array to hold all links to return
$tags = array();

//Loop through all tags of the given type and store details in the array
foreach($xml->getElementsByTagName($tag) as $tag_found) {
      if ($tag_found->tagName == "meta")
      {
        $tags[] = array("meta_name" => $tag_found->getAttribute("name"), "meta_value" => $tag_found->getAttribute("content"));
      }
      else {
    $tags[] = array('tag' => $tag_found->tagName, 'text' => $tag_found->nodeValue);
     }
}

//Return the links
return $tags;
}

This answer will actually give you the name of the tag as your first array value rather than "array" and will also stop the warning.

丶视觉 2024-11-24 05:35:28

在使用正则表达式解析 HTML 之前,您需要阅读 这个问题

尝试使用 DOMDocument,如下所示:

<?

function get_tags($tags, $url) {

    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();

    // Load the url's contents into the DOM
    $xml->loadHTMLFile($url);

    // Empty array to hold all links to return
    $tags_found = array();

    //Loop through each <$tags> tag in the dom and add it to the $tags_found array
    foreach($xml->getElementsByTagName($tags) as $tag) {
        $tags_found[] = array('tag' => $tags, 'text' => $tag->nodeValue);
    }

    //Return the links
    return $tags_found;
}

print_r(get_tags('title', 'http://stackoverflow.com'));

?>

Before using regex for parsing HTML, you want to read the first answer from this question.

Try with DOMDocument, like this:

<?

function get_tags($tags, $url) {

    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();

    // Load the url's contents into the DOM
    $xml->loadHTMLFile($url);

    // Empty array to hold all links to return
    $tags_found = array();

    //Loop through each <$tags> tag in the dom and add it to the $tags_found array
    foreach($xml->getElementsByTagName($tags) as $tag) {
        $tags_found[] = array('tag' => $tags, 'text' => $tag->nodeValue);
    }

    //Return the links
    return $tags_found;
}

print_r(get_tags('title', 'http://stackoverflow.com'));

?>
海夕 2024-11-24 05:35:28

由于这些标签不能嵌套,因此不需要解析。

#<(meta|title|script|link)(?: .*?)?(?:/>|>(.*?)<(?:/\1)>)#is

如果您在函数中使用它,则必须编写 $tag_name 而不是“meta|title|script|link”。

Since these tags cannot be nested, parsing is not necessary.

#<(meta|title|script|link)(?: .*?)?(?:/>|>(.*?)<(?:/\1)>)#is

If you are using this with your function, you will have to write $tag_name instead "meta|title|script|link".

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文