Fixing unclosed HTML tags

Posted on 2024-12-21 08:47:38

I am working on a blog layout and need to create an excerpt of each post (say, the latest 15) to show on the homepage. The content is already formatted as HTML by the Textile library. If I use substr to take the first 500 characters of a post, the main problem is how to close the tags that the cut leaves unclosed.

e.g.

<div>.......................</div>
<div>...........
     <p>............</p>
     <p>...........| 500 chars
     </p>
</div>

What I get is two unclosed tags, <p> and <div>. The <p> won't cause much trouble, but the <div> messes up the whole page layout. Any suggestions on how to track the opening tags and close them manually, or some other approach?
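
For reference, the "track the opening tags and close them manually" idea can be sketched with a stack. This is only a rough illustration, not a replacement for a real parser: close_open_tags() is a hypothetical helper name, the regex assumes attribute values contain no ">" character, and any trailing text that starts with "<" and never reaches a ">" is treated as a half-written tag and dropped.

```php
<?php
// Hypothetical helper (not from the original post): closes tags left open by a cut.
function close_open_tags(string $html): string
{
    // Void elements never take a closing tag.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];

    // Drop a tag that the cut left half-written, e.g. "<di" or "<a href=".
    $html = preg_replace('/<[^>]*$/', '', $html);

    // Collect every complete tag in document order.
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>#', $html, $tags, PREG_SET_ORDER);

    $open = []; // stack of currently open element names
    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if (in_array($name, $void, true) || $tag[3] === '/') {
            continue; // void or self-closing: nothing to track
        }
        if ($tag[1] === '/') {
            // Closing tag: pop back to the most recent matching opener, if any.
            $pos = array_search($name, array_reverse($open, true), true);
            if ($pos !== false) {
                $open = array_slice($open, 0, $pos);
            }
        } else {
            $open[] = $name;
        }
    }

    // Close whatever is still open, innermost first.
    foreach (array_reverse($open) as $name) {
        $html .= "</$name>";
    }
    return $html;
}
```

Applied to the truncated example above, this would append </p> and then </div>. It does not fix mis-nested markup, which is where a parser-based approach is the safer choice.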

远山浅 2024-12-28 08:47:38

As ajreal said, DOMDocument is a solution.

Example:

$str = "
<html>
 <head>
  <title>test</title>
 </head>
 <body>
  <p>error</i>
 </body>
</html>
";

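$doc = new DOMDocument();
// Malformed HTML makes loadHTML() emit warnings; @ suppresses them
@$doc->loadHTML($str);
// saveHTML() serializes the repaired tree back to a string
echo $doc->saveHTML();

Advantage: DOMDocument is natively included in PHP, unlike PHP Tidy.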
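
As a variation on the example above, here is a minimal sketch that repairs a fragment rather than a full page and uses libxml_use_internal_errors() instead of the @ operator. The helper name repair_fragment() is hypothetical, and the sketch assumes the input is UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer): repairs an HTML fragment.
function repair_fragment(string $fragment): string
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment in a full document; the meta tag declares the (assumed) UTF-8 charset.
    $doc->loadHTML(
        '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $fragment . '</body></html>'
    );

    // Discard the collected errors and restore the previous error handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize only the children of <body>, i.e. the repaired fragment.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}

echo repair_fragment('<div><p>text');
```

The parser closes the open <p> and <div> on its own, so no manual tag tracking is needed.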

月依秋水 2024-12-28 08:47:38

There are several methods you can use:

Use a proper HTML parser, such as DOMDocument
Use PHP Tidy to repair the unclosed tags
Some would suggest HTML Purifier
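
For the PHP Tidy option, a minimal sketch might look like the following. It assumes the tidy extension is installed, which is not the case on every PHP build, so the hypothetical helper repair_with_tidy() returns null when it is missing. The exact whitespace of Tidy's output depends on its configuration.

```php
<?php
// Hypothetical helper; returns null when the tidy extension is not available.
function repair_with_tidy(string $fragment): ?string
{
    if (!function_exists('tidy_repair_string')) {
        return null;
    }
    $config = [
        'show-body-only' => true, // return only the body content, not a whole document
        'wrap'           => 0,    // do not re-wrap long lines
    ];
    // Assumes UTF-8 input.
    $fixed = tidy_repair_string($fragment, $config, 'utf8');
    return $fixed === false ? null : $fixed;
}

$fixed = repair_with_tidy('<div><p>text');
echo $fixed ?? 'tidy extension not available';
```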

浅浅 2024-12-28 08:47:38

You can use DOMDocument to do it, but be careful of string encoding issues. Also, loadHTML() works on a complete HTML document, so you have to wrap the fragment in one and then extract the parts you want. Here's an example:

function make_excerpt ($rawHtml, $length = 500) {
  // append an ellipsis and "More" link
  $content = substr($rawHtml, 0, $length)
    . '… <a href="/link-to-somewhere">More ></a>';

  // Detect the string encoding; fall back to UTF-8 (an assumption) if detection fails
  $encoding = mb_detect_encoding($content) ?: 'UTF-8';

  // pass it to the DOMDocument constructor
  $doc = new DOMDocument('', $encoding);

  // Must include the content-type/charset meta tag with $encoding
  // Bad HTML will trigger warnings, suppress those
  @$doc->loadHTML('<html><head>'
    . '<meta http-equiv="content-type" content="text/html; charset='
    . $encoding . '"></head><body>' . trim($content) . '</body></html>');

  // extract the components we want
  $nodes = $doc->getElementsByTagName('body')->item(0)->childNodes;
  $html = '';
  $len = $nodes->length;
  for ($i = 0; $i < $len; $i++) {
    $html .= $doc->saveHTML($nodes->item($i));
  }
  return $html;
}

$html = "<p>.......................</p>
  <p>...........
    <p>............</p>
    <p>...........| 500 chars";

// output fixed html
echo make_excerpt($html, 500);

Outputs:

<p>.......................</p>
  <p>...........
    </p>
<p>............</p>
    <p>...........| 500 chars… <a href="/link-to-somewhere">More ></a></p>

If you are using WordPress, wrap the substr() call in wpautop(): wpautop(substr(...)). You may also want to check the length of the $rawHtml passed to the function and skip appending the "More" link when the post is not longer than $length.
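
The length check mentioned above could be sketched as a separate truncation step. The helper name truncate_for_excerpt() and its $moreLink parameter are hypothetical. The sketch also swaps substr() for mb_substr() so that a multibyte character is not split (assuming UTF-8 input), and drops a tag that the cut leaves half-written.

```php
<?php
// Hypothetical helper: truncates a post and appends the link only when something was cut.
function truncate_for_excerpt(
    string $rawHtml,
    int $length = 500,
    string $moreLink = '… <a href="/link-to-somewhere">More &gt;</a>'
): string {
    // Short posts are returned untouched: nothing was cut, so no link is needed.
    if (mb_strlen($rawHtml, 'UTF-8') <= $length) {
        return $rawHtml;
    }
    // Cut on character boundaries rather than byte boundaries.
    $cut = mb_substr($rawHtml, 0, $length, 'UTF-8');
    // Drop a tag that the cut left half-written, such as "<di".
    $cut = preg_replace('/<[^>]*$/', '', $cut);
    return $cut . $moreLink;
}
```

The result still contains unclosed tags, so it would then be passed through the DOMDocument step in make_excerpt() in place of the plain substr() call.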