How do I abbreviate XHTML to an arbitrary number of words?

Posted 2024-07-04 17:09:14

How would you programmatically abbreviate XHTML to an arbitrary number of words without leaving unclosed or corrupted tags?

For example:

<p>
    Proin tristique dapibus neque. Nam eget purus sit amet leo
    tincidunt accumsan.
</p>
<p>
    Proin semper, orci at mattis blandit, augue justo blandit nulla.
    <span>Quisque ante congue justo</span>, ultrices aliquet, mattis eget,
    hendrerit, <em>justo</em>.
</p>

Abbreviated to 25 words would be:

<p>
    Proin tristique dapibus neque. Nam eget purus sit amet leo
    tincidunt accumsan.
</p>
<p>
    Proin semper, orci at mattis blandit, augue justo blandit nulla.
    <span>Quisque ante congue...</span>
</p>

Comments (2)

血之狂魔 2024-07-11 17:09:14

Recurse through the DOM tree, keeping a word count variable up to date. When the word count exceeds your maximum word count, insert "..." and remove all following siblings of the current node, then, as you go back up through the recursion, remove all the following siblings of each of its ancestors.
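
Here is a minimal sketch of that recursion, assuming Python's standard-library `xml.dom.minidom` as the DOM implementation. The function name `truncate_words`, the `ellipsis` parameter, and the wrapping `<div>` (an XML parser needs a single root element) are my own assumptions, not part of the question. Words are counted by splitting on whitespace, which is a simplification.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def truncate_words(node, remaining, ellipsis="..."):
    """Trim the subtree under `node` in place to at most `remaining` words.

    Returns the number of words still available after this subtree,
    or -1 once the limit has been hit and all later content must go.
    """
    # Iterate over a copy, because children are removed while looping.
    for child in list(node.childNodes):
        if remaining < 0:
            # Limit already hit: drop every following sibling.
            node.removeChild(child)
            continue
        if child.nodeType == Node.TEXT_NODE:
            words = child.data.split()
            if len(words) > remaining:
                # Keep only the words that still fit, then add the ellipsis.
                child.data = " ".join(words[:remaining]) + ellipsis
                remaining = -1
            else:
                remaining -= len(words)
        elif child.nodeType == Node.ELEMENT_NODE:
            # Recurse; a -1 result propagates up through each ancestor,
            # which then removes its own following siblings.
            remaining = truncate_words(child, remaining, ellipsis)
    return remaining


# Hypothetical usage: the question's sample, wrapped in a single root.
SAMPLE = """<div><p>
    Proin tristique dapibus neque. Nam eget purus sit amet leo
    tincidunt accumsan.
</p>
<p>
    Proin semper, orci at mattis blandit, augue justo blandit nulla.
    <span>Quisque ante congue justo</span>, ultrices aliquet, mattis eget,
    hendrerit, <em>justo</em>.
</p></div>"""

doc = minidom.parseString(SAMPLE)
truncate_words(doc.documentElement, 25)
print(doc.documentElement.toxml())
```

Because nodes are only removed or shortened, never cut in the middle of a tag, the serializer still writes a closing tag for every element that survives. One rough edge of this sketch: if the limit falls exactly at the end of a text node, the ellipsis ends up alone in the next text node.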

一曲爱恨情仇 2024-07-11 17:09:14

You need to think of the XHTML as a hierarchy of elements and treat it as such. This is basically the way XML is meant to be treated. Then go through the hierarchy recursively, adding up the number of words as you go. When you hit your limit, throw everything else away.

I work mainly in PHP, and I would use PHP's DOMDocument class to help me do this. You will need to find something similar in your chosen language.

To make things clearer, here is the hierarchy for your sample:

- p
    - Proin tristique dapibus neque. Nam eget purus sit amet leo
      tincidunt accumsan.
- p
    - Proin semper, orci at mattis blandit, augue justo blandit nulla.
    - span
          - Quisque ante congue justo
    - , ultrices aliquet, mattis eget, hendrerit, 
    - em
          - justo
    - .

You hit the 25 word limit inside the span element, so you remove all remaining text within the span and add the ellipsis. All other child elements (both text and tags) can be discarded, and all subsequent elements can be discarded.
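
To check where the limit lands, here is a small sketch that walks the text nodes in document order and keeps a running total. It assumes Python's standard-library `xml.dom.minidom`; the helper names `text_nodes` and `find_limit` are hypothetical, and the sample would need a single wrapping root element to parse as XML.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def text_nodes(node):
    """Yield the text nodes under `node` in document order."""
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            yield child
        elif child.nodeType == Node.ELEMENT_NODE:
            yield from text_nodes(child)


def find_limit(root, limit):
    """Locate the text node in which the running word count passes `limit`.

    Returns (parent tag name, number of words to keep in that node),
    or None if the whole tree fits within the limit.
    """
    total = 0
    for text in text_nodes(root):
        count = len(text.data.split())  # whitespace-separated words
        if total + count > limit:
            return text.parentNode.tagName, limit - total
        total += count
    return None
```

For the sample, the running totals are 12 after the first paragraph and 22 after the opening text of the second, so the four-word span pushes the count past 25 and three of its words are kept. Note that whitespace splitting also counts a stray token such as `,` as a word, which matters only for text after the span.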

As far as I can see, this should always leave you with valid markup: because you are treating it as a hierarchy and not just plain text, all the closing tags that are required will still be there.

Of course, if the XHTML you are dealing with is invalid to begin with, don't expect the output to be valid.
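
If you want to catch bad input before truncating, something like the following sketch may help. It assumes Python's standard-library `xml.dom.minidom`, and `is_well_formed` is a hypothetical helper name. It only checks XML well-formedness (tags properly nested and closed), not validity against the XHTML DTD.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(markup):
    """Return True if `markup` parses as a well-formed XML document."""
    try:
        minidom.parseString(markup)
    except ExpatError:
        # Raised for mismatched or unclosed tags and similar syntax errors.
        return False
    return True


print(is_well_formed("<p>ok <em>fine</em></p>"))
print(is_well_formed("<p>broken <em>tag</p>"))
```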

Sorry for the poor hierarchy example, couldn't work out how to nest lists.
