Fixing unclosed HTML tags

Posted on 2024-12-21 08:47:38

I am working on a blog layout and need to create an excerpt of each post (say, the 15 latest) to show on the homepage. The content is already formatted as HTML by the Textile library. If I use substr to take the first 500 characters of a post, the main problem I face is how to close the tags that are left unclosed.

e.g.

<div>.......................</div>
<div>...........
     <p>............</p>
     <p>...........| 500 chars
     </p>
</div>

What I get is two unclosed tags, <p> and <div>. The <p> won't cause much trouble, but the <div> messes up the whole page layout. Any suggestions on how to track the opening tags and close them manually, or some other approach?

Comments (4)

远山浅 2024-12-28 08:47:38

As ajreal said, DOMDocument is a solution.

Example:

$str = "
<html>
 <head>
  <title>test</title>
 </head>
 <body>
  <p>error</i>
 </body>
</html>
";

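// Collect libxml parse warnings internally instead of silencing them with @
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($str);   // the parser repairs the mismatched </i>
libxml_clear_errors();
echo $doc->saveHTML();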

Advantage: DOMDocument is part of a standard PHP build, whereas Tidy is an optional extension.

月依秋水 2024-12-28 08:47:38

There are lots of methods that can be used:

  1. Use a proper HTML parser, like DOMDocument
  2. Use PHP Tidy to repair the un-closed tag
  3. Some would suggest HTML Purifier
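
As a rough sketch of option 2, assuming the optional tidy extension is installed: tidy_repair_string() and the show-body-only and wrap options come from Tidy itself, while the function name repair_with_tidy is only illustrative.

```php
<?php
// Illustrative helper (the name is hypothetical): close unclosed tags with Tidy.
// Requires the optional "tidy" PHP extension.
function repair_with_tidy(string $fragment): string
{
    $config = [
        'show-body-only' => true, // return only the body contents, no <html>/<head> wrapper
        'wrap'           => 0,    // do not re-wrap long lines
    ];

    // Assumption: the input is UTF-8 encoded.
    $fixed = tidy_repair_string($fragment, $config, 'utf8');

    // tidy_repair_string() returns false on failure; fall back to the input.
    return $fixed === false ? $fragment : $fixed;
}
```

Calling repair_with_tidy() on a truncated excerpt should return the same markup with the open tags closed. Tidy may also normalise whitespace and indentation, so the output is not guaranteed to match the input byte for byte.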

浅浅 2024-12-28 08:47:38

You can use DOMDocument to do it, but be careful of string encoding issues. Also, you'll have to use a complete HTML document, then extract the components you want. Here's an example:

function make_excerpt ($rawHtml, $length = 500) {
  // Take the first $length bytes (substr() counts bytes, not characters)
  // and append an ellipsis and "More" link
  $content = substr($rawHtml, 0, $length)
    . '… <a href="/link-to-somewhere">More ></a>';

  // Detect the string encoding; fall back to UTF-8 if detection fails
  $encoding = mb_detect_encoding($content) ?: 'UTF-8';

  // pass it to the DOMDocument constructor
  $doc = new DOMDocument('1.0', $encoding);

  // Must include the content-type/charset meta tag with $encoding
  // Bad HTML will trigger warnings, suppress those
  @$doc->loadHTML('<html><head>'
    . '<meta http-equiv="content-type" content="text/html; charset='
    . $encoding . '"></head><body>' . trim($content) . '</body></html>');

  // extract the components we want
  $nodes = $doc->getElementsByTagName('body')->item(0)->childNodes;
  $html = '';
  $len = $nodes->length;
  for ($i = 0; $i < $len; $i++) {
    $html .= $doc->saveHTML($nodes->item($i));
  }
  return $html;
}

$html = "<p>.......................</p>
  <p>...........
    <p>............</p>
    <p>...........| 500 chars";

// output fixed html
echo make_excerpt($html, 500);

Outputs:

<p>.......................</p>
  <p>...........
    </p>
<p>............</p>
    <p>...........| 500 chars… <a href="/link-to-somewhere">More ></a></p>

If you are using WordPress, wrap the substr() call in wpautop(), i.e. wpautop(substr(...)). You may also want to check the length of the $rawHtml passed to the function and skip appending the "More" link when the post is not longer than $length.
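
The length check could look roughly like the sketch below. The function name make_excerpt_checked and the $moreUrl parameter are hypothetical, and the code assumes UTF-8 input instead of detecting the encoding; it is a variant for illustration, not a replacement for the function above.

```php
<?php
// Hypothetical variant of make_excerpt(): only truncates and appends the
// "More" link when the post is actually longer than $length.
function make_excerpt_checked(
    string $rawHtml,
    int $length = 500,
    string $moreUrl = '/link-to-somewhere'
): string {
    $content = $rawHtml;
    if (strlen($rawHtml) > $length) {
        // substr() counts bytes, matching the original answer's behaviour
        $content = substr($rawHtml, 0, $length)
            . '... <a href="' . htmlspecialchars($moreUrl) . '">More</a>';
    }

    // Collect parser warnings about the broken markup instead of printing them
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Assumption: input is UTF-8, declared via the charset meta tag
    $doc->loadHTML('<html><head>'
        . '<meta http-equiv="content-type" content="text/html; charset=utf-8">'
        . '</head><body>' . trim($content) . '</body></html>');
    libxml_clear_errors();

    // Serialize only the children of <body>, dropping the wrapper document
    $html = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $node) {
        $html .= $doc->saveHTML($node);
    }
    return $html;
}
```

A post shorter than $length comes back repaired but without the link, while a longer post is cut, gets the link, and has its open tags closed around it.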

谁把谁当真 2024-12-28 08:47:38

I found a solution that uses DOMDocument but does not add extra tags to your string; it just fixes the malformed HTML. See the answer here: https://stackoverflow.com/a/79081559/492132

Original GitHub gist (not mine): https://gist.github.com/hubgit/1322324
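
For readers who do not want to follow the links, one common way to get this effect is sketched below. It is not necessarily what the linked answer or gist does. The function name close_open_tags is hypothetical, and the sketch assumes the fragment is ASCII or entity-encoded, since no charset is declared to the parser.

```php
<?php
// Hypothetical helper: repair a truncated HTML fragment without getting a
// doctype or <html>/<body> wrapper back.
function close_open_tags(string $fragment): string
{
    // Collect parser warnings about the broken markup instead of printing them
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();

    // A temporary wrapper <div> gives the parser a single root element;
    // the two flags stop libxml from adding <html>/<body> and a doctype.
    // Assumption: $fragment is ASCII or entity-encoded (no charset declared).
    $doc->loadHTML(
        '<div>' . $fragment . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();

    // Serialize the children of the wrapper, leaving the wrapper itself out
    $html = '';
    foreach ($doc->documentElement->childNodes as $node) {
        $html .= $doc->saveHTML($node);
    }
    return $html;
}

echo close_open_tags("<div>a</div><div>b <p>cut off");
```

The wrapper is there because, with LIBXML_HTML_NOIMPLIED, the parser treats the first element it sees as the document root, which gives odd nesting when the fragment has several top-level elements.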
