How can I use file_get_contents and preg_match to screen-scrape a page like this?

Posted 2024-12-28 07:07:09

I have a page with many HTML lines like this:

<ul><li><a href='a_silly_link_that_changes_each_line.php'>the_content_i_need</a></li></ul>

As you can see, there's a link in that line, which unfortunately changes on each line.

So I need a way to scrape the content in that line, without letting the link get in the way.

I've also tried to scrape like this: .php'>(*.)</a></li></ul> but that's no good, as it returns a lot of unwanted content.

Also, because there are many lines on the page that I need to take the content from, could I just loop through them somehow?

I'm using preg_match and file_get_contents but am open to other suggestions. :)

山人契 2025-01-04 07:07:09

From: PHP Parse HTML code

Use something like:

    $str = '<ul><li><a href="test.html">linky</a></li></ul>';
    $DOM = new DOMDocument;
    $DOM->loadHTML($str);
    $items = $DOM->getElementsByTagName('ul');
    for ($i = 0; $i < $items->length; $i++) {
        $ul = $items->item($i);
        $li = $ul->firstChild;
        // Only handle <ul> elements whose first child is <li><a>...</a></li>
        if ($li && $li->nodeName == 'li' && $li->firstChild && $li->firstChild->nodeName == 'a') {
            // do something with $li->firstChild->nodeValue
        }
    }

In this case, $li->firstChild->nodeValue will be linky.

That should do it :)
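
Since the question asks about looping over many such lines, here is a minimal sketch of the same DOM approach using DOMXPath to collect the text of every matching link in one pass. The sample markup, the `$html` and `$contents` names, and the `//ul/li/a` query are assumptions made for illustration; a real page may need a narrower query.

```php
<?php
// Hypothetical sample input standing in for the fetched page.
$html = "<ul><li><a href='one.php'>first</a></li></ul>\n"
      . "<ul><li><a href='two.php'>second</a></li></ul>";

$dom = new DOMDocument();
// Real pages are often not valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Select every <a> that sits directly inside an <li> inside a <ul>.
$xpath = new DOMXPath($dom);
$contents = [];
foreach ($xpath->query('//ul/li/a') as $a) {
    // textContent is the link text; the href value never enters the result.
    $contents[] = trim($a->textContent);
}

print_r($contents);
```

Because the query addresses elements by structure, the changing href values do not matter. For a real page, `$html` would presumably come from file_get_contents().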

椒妓 2025-01-04 07:07:09

Try using:

$matches = array();
preg_match_all("~\\.php'>(.*?)</a></li></ul>~", file_get_contents($filename), $matches, PREG_SET_ORDER);

This will match all of the links in your file. *? means "match zero or more characters, but as few as possible" (a lazy, non-greedy quantifier), so you won't get any unwanted content.
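
To show how the results can be looped over, here is a small sketch of the regex approach run against inline sample data. The sample markup and the `$page` and `$contents` names are assumptions for illustration, and the pattern is a variant that skips the href with `[^']*` instead of anchoring on `.php`; it only works while the markup keeps exactly this shape.

```php
<?php
// Hypothetical sample input; in practice this would come from file_get_contents($filename).
$page = "<ul><li><a href='one.php'>first</a></li></ul>\n"
      . "<ul><li><a href='two.php'>second</a></li></ul>";

$matches = [];
// [^']* skips whatever the href contains; (.*?) lazily captures the link text.
$count = preg_match_all(
    "~<a href='[^']*'>(.*?)</a></li></ul>~",
    $page,
    $matches,
    PREG_SET_ORDER
);

$contents = [];
foreach ($matches as $match) {
    // With PREG_SET_ORDER, $match[0] is the full match and $match[1] the first capture group.
    $contents[] = $match[1];
}

print_r($contents);
```

With PREG_SET_ORDER each element of `$matches` is one complete match, which makes a plain foreach loop the natural way to walk the results.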
