How do I get part of a page's HTML DOM using PHP?

Posted on 2024-12-10 13:40:38

I'm grabbing data from a published Google spreadsheet, and all I want is the information inside the content div (<div id="content">...</div>).

I know that the content starts with <div id="content"> and ends with </div><div id="footer">.

What's the best / most efficient way to grab the part of the DOM that is inside there? I was thinking of a regular expression (see my example below), but it is not working and I'm not sure it is that efficient...

header('Content-type: text/plain');

$foo = file_get_contents('https://docs.google.com/spreadsheet/pub?key=0Ahuij-1M3dgvdG8waTB0UWJDT3NsUEdqNVJTWXJNaFE&single=true&gid=0&output=html&ndplr=1');

$start = '<div id="content">';
$end = '<div id="footer">';

$foo = preg_replace("#$start(.*?)$end#",'$1',$foo);

echo $foo;
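
Two likely reasons the pattern above does not behave as intended: without the s modifier, "." does not match newlines, so a match that spans several lines fails; and preg_replace() only swaps the matched span for $1, leaving the rest of the page in the output. The following is a minimal sketch of a corrected version, not a definitive fix. The $html sample is a hypothetical stand-in for the fetched page (the real markup may differ), and the end marker is assumed to be </div><div id="footer"> with nothing between the two tags, as described in the question.

```php
<?php
// Hypothetical stand-in for the fetched page; the real markup may differ.
$html = "<html><body><div id=\"header\">Title</div>\n"
      . "<div id=\"content\">\n<table><tr><td>A1</td></tr></table>\n</div>"
      . "<div id=\"footer\">Footer</div></body></html>";

$start = '<div id="content">';
$end   = '</div><div id="footer">';

// preg_quote() escapes regex metacharacters in the markers (and the "#" delimiter).
// The "s" modifier lets "." match newlines, so the match can span multiple lines.
$pattern = '#' . preg_quote($start, '#') . '(.*?)' . preg_quote($end, '#') . '#s';

// preg_match() captures only the wanted part; preg_replace() would keep
// everything outside the match (header, footer) in the result.
$content = preg_match($pattern, $html, $m) ? trim($m[1]) : null;

echo $content, "\n";
```

This only works while the start and end markers appear exactly as written; any change in whitespace or attributes between the tags would break the match.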

UPDATE

I guess my other question is whether it is simpler and easier to use a regex with start and end points rather than parsing a DOM that might contain errors and then extracting the piece I need. Regex seems like the way to go, but I would love to hear your opinions.
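
For comparison with the regex route, a parser that tolerates malformed markup is one alternative. The following is a minimal sketch, assuming PHP's DOM extension is enabled. The $html sample is a hypothetical stand-in for the fetched page and deliberately leaves a <td> unclosed to mimic markup with errors. The exact whitespace of the serialized output can vary, so no expected output is shown.

```php
<?php
// Hypothetical stand-in for the fetched page, with a deliberately unclosed <td>.
$html = '<html><body><div id="header">Title</div>'
      . '<div id="content"><table><tr><td>A1<td>B1</tr></table></div>'
      . '<div id="footer">Footer</div></body></html>';

// Collect parse errors internally instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Locate the content div by its id attribute.
$xpath = new DOMXPath($doc);
$node  = $xpath->query('//div[@id="content"]')->item(0);

$inner = '';
if ($node !== null) {
    // saveHTML($child) serializes a single node; concatenating the children
    // gives the inner HTML without the wrapping <div id="content">.
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}

echo $inner, "\n";
```

The trade-off in this sketch is that the parser does not depend on what follows the content div, at the cost of re-serializing the markup rather than returning the original bytes.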
