DOM manipulation

Posted 2024-09-02 21:46:29


I'm trying to use the DOM in PHP for a fairly specific job, and I've had no luck so far. The goal is to take a string of HTML from a WordPress blog post (read from the database; this is a WordPress plugin) and, within that HTML, replace <div id="do_not_edit">old content</div> with <div id="do_not_edit">new content</div>, keeping everything above and below that div exactly as it is.

Then I save the HTML back into the database. It should be simple, really. I've read that a regex isn't the right tool here, so I've turned to the DOM instead.

The problem is that I just can't get it to work; I can't even extract the div.

Help me!!

UPDATE

The HTML coming out of the WordPress table looks like this:

Congratulations on finding us here on the world wide web, we are on a mission to create a website that will show off your culinary skills better than any other website does.

<div id="do_not_edit">blah blah</div>
We want this website to be fun and easy to use, we strive for simple elegance and incredible functionality. We aim to provide a 'complete package'. By this we want to create a website where people can meet, share ideas and help each other out.

After several different (incorrect) attempts, all I've got is this:

$content = ($wpdb->get_var( "SELECT `post_content` FROM $wpdb->posts WHERE ID = {$article[post_id]}" ));        

$doc = new DOMDocument();
$doc->validateOnParse = true; 
$doc->loadHTMLFile($content);
$element = $doc->getElementById('do_not_edit');
echo $element;


ぃ双果 2024-09-09 21:46:29

If you are sure that the HTML from WordPress contains only one div, the following should work:

$doc = new DOMDocument();
$doc->validateOnParse = false; 
$doc->loadHTML($content);
$divs = $doc->getElementsByTagName('div');
echo $divs->item(0)->textContent;
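The snippet above reads whichever <div> comes first. If the post might contain other divs, the element can be looked up by its id instead. Below is a minimal sketch under that assumption; readDivText() is a hypothetical helper name, and the LIBXML_NOERROR | LIBXML_NOWARNING flags only silence libxml's parser warnings. It also uses loadHTML(), which parses an HTML string, whereas the question's loadHTMLFile() treats its argument as a file path.

```php
<?php
// Hypothetical helper: return the text inside the element with the given id,
// or null if the fragment contains no such element.
function readDivText(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // loadHTML() parses the markup held in the string itself;
    // loadHTMLFile() would try to open $html as a file path instead.
    // The flags suppress libxml warnings (e.g. about tags it does not know).
    $doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);

    $element = $doc->getElementById($id);
    return $element === null ? null : $element->textContent;
}

// Usage with a shortened version of the post from the question
$post = "Congratulations on finding us here on the world wide web.\n\n"
    . '<div id="do_not_edit">blah blah</div>' . "\n"
    . 'We want this website to be fun and easy to use.';
echo readDivText($post, 'do_not_edit'), "\n";
```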

If not, try:

$doc = new DOMDocument();
$doc->validateOnParse = false; 
$doc->loadHTML($content);
$divs = $doc->getElementsByTagName('div');

for($i=0; $i<$divs->length; $i++)
{
  $id = $divs->item($i)->attributes->getNamedItem('id');
  if($id && $id->value == 'do_not_edit')
  {
    //your code here...
    $node = $divs->item($i);
    $newText = new DOMText("This is some new content");

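    // Remove every existing child first: appending and then removing only firstChild
    // deletes the new text if the div was empty, and leaves extra children behind
    while ($node->firstChild) {
      $node->removeChild($node->firstChild);
    }
    $node->appendChild($newText);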
    break;
  }
}

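// saveHTML() with no argument returns a whole document (DOCTYPE, <html>, <body>), not just the original fragment
$html = $doc->saveHTML();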
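Instead of looping over every <div> and checking its id attribute, DOMXPath can select the target element directly. This sketch swaps in an XPath query for the loop above; replaceDivTextXPath() is a hypothetical name, and it assumes the id contains no double quotes, because the value is pasted into the query string.

```php
<?php
// Hypothetical helper: replace everything inside <div id="$id"> with plain text.
// Returns false (and leaves the document untouched) if no such div exists.
// Assumes $id contains no double quotes, since it is inserted into the XPath string.
function replaceDivTextXPath(DOMDocument $doc, string $id, string $newText): bool
{
    $xpath = new DOMXPath($doc);
    $matches = $xpath->query('//div[@id="' . $id . '"]');
    if ($matches === false || $matches->length === 0) {
        return false;
    }

    $div = $matches->item(0);
    // Drop all existing children (text nodes, inline tags, comments, ...)
    while ($div->firstChild !== null) {
        $div->removeChild($div->firstChild);
    }
    // A text node is stored as text, so characters like < and & are escaped on output
    $div->appendChild($doc->createTextNode($newText));
    return true;
}

// Usage: the old content contains a child element, which the loop removes as well
$doc = new DOMDocument();
$doc->loadHTML('<div id="do_not_edit">blah <b>blah</b></div>', LIBXML_NOERROR | LIBXML_NOWARNING);
replaceDivTextXPath($doc, 'do_not_edit', 'This is some new content');
echo $doc->saveHTML($doc->getElementById('do_not_edit')), "\n";
```

Because the new content goes in through createTextNode(), it is always treated as text; inserting markup rather than plain text would require building element nodes instead.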


辞取 2024-09-09 21:46:29

Your HTML is not a complete HTML document, which is what DOMDocument expects. One option would be to wrap your HTML so it's a complete document:

// Quote the array key and let $wpdb->prepare() insert the ID safely
$content = $wpdb->get_var( $wpdb->prepare( "SELECT `post_content` FROM {$wpdb->posts} WHERE ID = %d", $article['post_id'] ) );

$content = '<html><head><title></title></head><body>'.$content.'</body></html>';

$doc = new DOMDocument();
$doc->validateOnParse = false; 
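$doc->loadHTML($content); // loadHTML() parses a string; loadHTMLFile() expects a file path
$element = $doc->getElementById('do_not_edit');
echo $doc->saveHTML($element); // a DOMElement cannot be echoed directly, so serialise it first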

It's a bit hacky, but might easily solve the problem.
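To write the result back to post_content, the wrapping trick above can be combined with serialising only the children of <body>, since saveHTML() without an argument returns the whole document, DOCTYPE and wrapper elements included. This is a sketch, assuming the post content is UTF-8 (hence the charset meta tag; without one, loadHTML() treats the input as ISO-8859-1); replaceDivInFragment() is a hypothetical helper. Wrapping the text in <body> may also stop libxml from putting text at the very start of the post inside an implied <p>.

```php
<?php
// Hypothetical helper: replace the contents of <div id="$id"> in an HTML fragment
// and return the fragment again, without any DOCTYPE/<html>/<body> wrapper.
function replaceDivInFragment(string $fragment, string $id, string $newText): string
{
    // Wrap the fragment in a full document; the meta tag (an assumption that the
    // content is UTF-8) tells libxml which encoding to use.
    $wrapped = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $fragment . '</body></html>';

    $doc = new DOMDocument();
    $doc->loadHTML($wrapped, LIBXML_NOERROR | LIBXML_NOWARNING);

    $div = $doc->getElementById($id);
    if ($div !== null) {
        // Replace all existing children with a single text node
        while ($div->firstChild !== null) {
            $div->removeChild($div->firstChild);
        }
        $div->appendChild($doc->createTextNode($newText));
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $fragment; // nothing parsed; hand back the input unchanged
    }

    // Serialise each child of <body> on its own instead of the whole document
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}
```

In the plugin, the returned string could then be stored with, for example, $wpdb->update( $wpdb->posts, array( 'post_content' => $html ), array( 'ID' => $post_id ) ), where $post_id stands for the ID of the post being edited.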
