DOM manipulation

Posted on 2024-09-02 21:46:29

I'm trying to use the DOM in PHP to do a pretty specific job, and I've had no luck so far. The objective is to take a string of HTML from a WordPress blog post (from the DB; this is a WordPress plugin) and, within that HTML, replace <div id="do_not_edit">old content</div> with <div id="do_not_edit">new content</div>, preserving everything above and below that div in its structure.

Then I want to save the HTML back into the DB. It should be simple, really. I have read that a regex wouldn't be the right way to go here, so I've turned to the DOM instead.

The problem is I just can't get it to work; I can't extract the div or anything.

Help me!!

UPDATE

The HTML coming out of the WordPress table looks like:

Congratulations on finding us here on the world wide web, we are on a  mission to create a website that will show off your culinary skills  better than any other website does.

<div id="do_not_edit">blah blah</div>
We want this website to be fun and  easy to use, we strive for simple elegance and incredible functionality.We aim to provide a 'complete package'. By this we want to create a  website where people can meet, share ideas and help each other out.

After several different (incorrect) attempts, all I've got is:

$content = ($wpdb->get_var( "SELECT `post_content` FROM $wpdb->posts WHERE ID = {$article[post_id]}" ));        

$doc = new DOMDocument();
$doc->validateOnParse = true; 
$doc->loadHTMLFile($content);
$element = $doc->getElementById('do_not_edit');
echo $element;

Comments (2)

ぃ双果 2024-09-09 21:46:29

If you are sure that the HTML from WordPress contains only one div, the following should work:

$doc = new DOMDocument();
$doc->validateOnParse = false; 
$doc->loadHTML($content);
$divs = $doc->getElementsByTagName('div');
echo $divs->item(0)->textContent;

If not, try:

$doc = new DOMDocument();
$doc->validateOnParse = false;
$doc->loadHTML($content);
$divs = $doc->getElementsByTagName('div');

// Walk every div and stop at the one whose id matches.
for ($i = 0; $i < $divs->length; $i++)
{
  $node = $divs->item($i);
  if ($node->getAttribute('id') == 'do_not_edit')
  {
    // Remove all existing children, not just the first one.
    while ($node->firstChild)
    {
      $node->removeChild($node->firstChild);
    }
    $node->appendChild(new DOMText("This is some new content"));
    break;
  }
}

// saveHTML() with no argument serializes the whole document.
$html = $doc->saveHTML();
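
Since the goal is to write the post content back to the database, it may help to return only the fragment and not the whole document that saveHTML() produces. The following is a minimal sketch under some assumptions: the function name replace_element_text is hypothetical, the id is assumed to contain no double quotes, and encoding handling for non-ASCII content is left out.

```php
<?php
// Hypothetical helper, not part of the original answer: replaces the text of
// the element with the given id and returns the fragment without the
// <html>/<body> wrapper.
function replace_element_text(string $fragment, string $id, string $newText): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so it is parsed as the content of <body>.
    $doc->loadHTML('<html><body>' . $fragment . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: $id contains no double quotes.
    $node = $xpath->query('//*[@id="' . $id . '"]')->item(0);
    if ($node === null) {
        return $fragment; // nothing to replace, hand back the input untouched
    }

    // Drop every existing child, then add the new text node.
    while ($node->firstChild) {
        $node->removeChild($node->firstChild);
    }
    $node->appendChild($doc->createTextNode($newText));

    // Serialize only the children of <body>, one node at a time.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$content = 'Intro text. <div id="do_not_edit">blah blah</div> Closing text.';
$updated = replace_element_text($content, 'do_not_edit', 'new content');
echo $updated, "\n";
```

The returned string could then be written back with whatever update call the plugin already uses; that part is not shown here.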

辞取 2024-09-09 21:46:29

Your HTML is not a complete HTML document, which is what DOMDocument expects. One option would be to wrap your HTML so it's a complete document:

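// Quote the array key: inside {$...} an unquoted key is read as a constant.
$content = ($wpdb->get_var( "SELECT `post_content` FROM $wpdb->posts WHERE ID = {$article['post_id']}" ));

// Wrap the fragment so the parser sees a complete document.
$content = '<html><head><title></title></head><body>'.$content.'</body></html>';

$doc = new DOMDocument();
$doc->validateOnParse = false;
$doc->loadHTML($content);
$element = $doc->getElementById('do_not_edit');
// A DOMElement cannot be echoed directly; serialize it first.
echo $doc->saveHTML($element);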

It's a bit hacky, but might easily solve the problem.
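
Two calls in the question's snippet are also worth a second look: DOMDocument::loadHTMLFile() takes a file path, not a string of markup, and a DOMElement cannot be passed to echo as-is. The sketch below uses a shortened stand-in for the database value (the $content string here is illustrative, not the real post) and shows loadHTML() on a string followed by two ways to get printable output from the element.

```php
<?php
// Stand-in for the post_content string fetched from the database.
$content = "Congratulations on finding us.\n\n"
    . '<div id="do_not_edit">blah blah</div>' . "\n"
    . 'We want this website to be fun.';

$doc = new DOMDocument();
// Collect parser warnings internally instead of emitting them.
$previous = libxml_use_internal_errors(true);
// loadHTML() parses a string; loadHTMLFile() would treat it as a path.
$doc->loadHTML('<html><body>' . $content . '</body></html>');
libxml_clear_errors();
libxml_use_internal_errors($previous);

$element = $doc->getElementById('do_not_edit');

// Serialize the element to markup, or read only its text.
$markup = $element !== null ? $doc->saveHTML($element) : '';
$text   = $element !== null ? $element->textContent : '';

echo $markup, "\n";
echo $text, "\n";
```

Checking for null before using the element keeps the script from failing when a post does not contain the div.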
