XSLT 1.0 HTML word count

Posted 2024-09-14 20:04:45

I want to call a template that trims a field down to 30 words. However, the field contains HTML, and the HTML should not be counted as words.

Answer by 想挽留 (2024-09-21 20:04:45)

Try this, although admittedly the translate call's a bit ugly:

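<xsl:template match="field">
  <!-- normalize-space(.) gives the field's text with single spaces between words; translate() then
       deletes every listed letter and digit, ideally leaving only those spaces, so length + 1 = words -->
  <xsl:value-of select="string-length(translate(normalize-space(.),'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',''))+1" />
</xsl:template>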

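This of course requires the string in the translate call to include every character that could appear in the field other than spaces. Any character missing from that list (punctuation, accented letters and so on) survives the translate and adds one to the count for each occurrence, and an empty field still reports 1. It works by first calling normalize-space(.), which takes the field's string value (its text content, with the markup already gone) and collapses runs of white space into single spaces. It then removes everything except those spaces, counts the length of the resulting string and adds one. It does mean that if you have <p>My<b>text</b> test</p>, this will count as 2, as it will consider Mytext to be one word.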
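To check the arithmetic of the expression above without an XSLT processor, here is a small Python model of it. This is an illustration, not part of the original answer. It assumes the field's string value arrives as a plain str. The helpers normalize_space and translate are hypothetical names, written to follow the XPath 1.0 rules for normalize-space() and translate().

```python
import re
import string

# The character list from the translate() call above: ASCII letters and digits.
ALNUM = string.ascii_uppercase + string.ascii_lowercase + string.digits


def normalize_space(s: str) -> str:
    """XPath normalize-space(): collapse runs of space, tab, CR and LF into
    one space and trim them from both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def translate(s: str, src: str, dst: str) -> str:
    """XPath translate(): replace each character found in src with the
    character at the same position in dst, or delete it when dst is shorter."""
    out = []
    for ch in s:
        i = src.find(ch)
        if i == -1:
            out.append(ch)        # not listed: kept as-is
        elif i < len(dst):
            out.append(dst[i])    # listed and mapped
        # listed but dst has no counterpart: dropped
    return "".join(out)


def count_words_translate(field_text: str) -> int:
    """string-length(translate(normalize-space(.), ALNUM, '')) + 1"""
    return len(translate(normalize_space(field_text), ALNUM, "")) + 1


# The string value of <p>My<b>text</b> test</p> is "Mytext test".
print(count_words_translate("Mytext test"))   # 2
print(count_words_translate("Hello, world"))  # 3: the comma is not in ALNUM
print(count_words_translate(""))              # 1: an empty field still counts as one
```

The comma case shows why the character list has to be exhaustive: anything left behind by translate() is counted as if it were a space.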

If you need a more robust solution, it's a little more convoluted:

<xsl:template match="field">
  <xsl:call-template name="countwords">
    <xsl:with-param name="text" select="normalize-space(.)" />
  </xsl:call-template>
</xsl:template>

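<xsl:template name="countwords">
  <xsl:param name="count" select="0" />
  <xsl:param name="text" />
  <xsl:choose>
    <!-- empty (or all white-space) field: no words at all -->
    <xsl:when test="$text = ''"><xsl:value-of select="$count" /></xsl:when>
    <xsl:when test="contains($text,' ')">
      <!-- one more word: drop it and recurse on the rest -->
      <xsl:call-template name="countwords">
        <xsl:with-param name="count" select="$count + 1" />
        <xsl:with-param name="text" select="substring-after($text,' ')" />
      </xsl:call-template>
    </xsl:when>
    <!-- no space left: the remaining text is the last word -->
    <xsl:otherwise><xsl:value-of select="$count + 1" /></xsl:otherwise>
  </xsl:choose>
</xsl:template>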

This passes the result of normalize-space(.) into a recursive named template that calls itself whenever there's a space in $text, incrementing its count parameter and chopping off the first word each time with substring-after($text,' '). If there's no space, it treats $text as a single word and returns $count + 1 (+1 for the current word).

Bear in mind that this will include ALL text content within the field, including text inside nested elements.
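The recursion can be modelled in a few lines of Python to see how it walks the normalized string. This is a sketch, not the answer's own code. The names normalize_space and count_words are hypothetical. The empty-string guard is an assumption of the sketch, so that an empty field yields 0 rather than 1.

```python
import re


def normalize_space(s: str) -> str:
    """XPath normalize-space() for the four XML white-space characters."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def count_words(text: str, count: int = 0) -> int:
    """Models the recursive "countwords" template on normalized text."""
    if text == "":
        return count                      # empty field: no words
    if " " in text:
        # contains($text, ' '): one more word; recurse on substring-after($text, ' ')
        return count_words(text.split(" ", 1)[1], count + 1)
    return count + 1                      # the remaining text is the last word


print(count_words(normalize_space("  the quick\n brown   fox ")))  # 4
```

Because normalize-space() leaves exactly one space between words and none at the ends, every recursive step removes exactly one word.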

EDIT: Note to self: read the question properly. I've just noticed you need more than a word count. Keeping the XML tags in the output is significantly more complicated, but if plain text is enough, a slight modification of the above is all it takes to output the first 30 words instead of just counting them:

<xsl:template name="countwords">
  <xsl:param name="count" select="0" />
  <xsl:param name="text" />
  <xsl:choose>
    <!-- 30 words written: stop recursing -->
    <xsl:when test="$count = 30" />
    <xsl:when test="contains($text,' ')">
      <xsl:if test="$count != 0"><xsl:text> </xsl:text></xsl:if>
      <xsl:value-of select="substring-before($text,' ')" />
      <xsl:call-template name="countwords">
        <xsl:with-param name="count" select="$count + 1" />
        <xsl:with-param name="text" select="substring-after($text,' ')" />
      </xsl:call-template>
    </xsl:when>
    <!-- last word: it needs a separating space too, unless it is the first one -->
    <xsl:otherwise>
      <xsl:if test="$count != 0 and $text != ''"><xsl:text> </xsl:text></xsl:if>
      <xsl:value-of select="$text" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

There's an extra <xsl:when> clause that simply stops recursing when count hits 30, and the recursive clause outputs each word, adding a space in front of it unless it is the first word.
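The truncating recursion can be modelled in Python to check where the spaces go. This is a hypothetical sketch operating on already-normalized text, not a drop-in replacement for the template. Note that it puts a space before the final word too, which keeps the last two words from running together.

```python
def first_words(text: str, limit: int = 30, count: int = 0) -> str:
    """Models the truncating "countwords" template on normalized text."""
    if count == limit:
        return ""                                   # $count = 30: stop recursing
    if " " in text:
        head, tail = text.split(" ", 1)
        sep = " " if count != 0 else ""             # no space before the first word
        return sep + head + first_words(tail, limit, count + 1)
    # Last word: it also needs a separating space unless it is the first one.
    sep = " " if count != 0 and text != "" else ""
    return sep + text


words = " ".join(f"w{i}" for i in range(1, 41))     # w1 ... w40
print(first_words(words) == " ".join(f"w{i}" for i in range(1, 31)))  # True
print(first_words("a b c", limit=2))                # a b
```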

EDIT: OK, here's a solution for when the field holds the HTML as escaped text: it copies the tags through unchanged and only counts the words between them.

<xsl:template match="field">
  <xsl:call-template name="countwords">
    <xsl:with-param name="text" select="." />
  </xsl:call-template>
</xsl:template>

<xsl:template name="countwords">
  <xsl:param name="count" select="0" />
  <xsl:param name="text" />
  <!-- Every "<" in the XPath expressions below is written &lt;, since a bare < is not allowed in an attribute value -->
  <xsl:choose>
    <!-- $text starts with a tag: copy the tag; a space right after it ends a word -->
    <xsl:when test="starts-with($text, '&lt;')">
      <xsl:value-of select="concat(substring-before($text,'>'),'>')" />
      <xsl:call-template name="countwords">
        <xsl:with-param name="count">
          <xsl:choose>
            <xsl:when test="starts-with(substring-after($text,'>'),' ')"><xsl:value-of select="$count + 1" /></xsl:when>
            <xsl:otherwise><xsl:value-of select="$count" /></xsl:otherwise>
          </xsl:choose>
        </xsl:with-param>
        <xsl:with-param name="text" select="substring-after($text,'>')" />
      </xsl:call-template>
    </xsl:when>
    <!-- the next space comes before the next tag (or no tag is left): output one word while under 30 -->
    <xsl:when test="(contains($text, '&lt;') and contains($text, ' ') and string-length(substring-before($text,' ')) &lt; string-length(substring-before($text,'&lt;'))) or (contains($text,' ') and not(contains($text,'&lt;')))">
      <xsl:choose>
        <xsl:when test="$count &lt; 29"><xsl:value-of select="concat(substring-before($text, ' '),' ')" /></xsl:when>
        <xsl:when test="$count = 29"><xsl:value-of select="substring-before($text, ' ')" /></xsl:when>
      </xsl:choose>
      <xsl:call-template name="countwords">
        <xsl:with-param name="count">
          <xsl:choose>
            <xsl:when test="normalize-space(substring-before($text, ' ')) = ''"><xsl:value-of select="$count" /></xsl:when>
            <xsl:otherwise><xsl:value-of select="$count + 1" /></xsl:otherwise>
          </xsl:choose>
        </xsl:with-param>
        <xsl:with-param name="text" select="substring-after($text,' ')" />
      </xsl:call-template>
    </xsl:when>
    <!-- a tag comes before the next space: output the text up to it (while under 30), then handle the tag -->
    <xsl:when test="contains($text, '&lt;')">
      <xsl:if test="$count &lt; 30">
        <xsl:value-of select="substring-before($text, '&lt;')" />
      </xsl:if>
      <xsl:call-template name="countwords">
        <xsl:with-param name="count" select="$count" />
        <xsl:with-param name="text" select="concat('&lt;',substring-after($text,'&lt;'))" />
      </xsl:call-template>
    </xsl:when>
    <!-- no tags or spaces left: the last word -->
    <xsl:otherwise>
      <xsl:if test="$count &lt; 30">
        <xsl:value-of select="$text" />
      </xsl:if>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
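For comparison, here is a simplified Python sketch of the same idea. It is not a line-by-line port of the template. It assumes the escaped HTML has already been decoded into a string in which every "<" begins a tag. The name truncate_markup is hypothetical. Tags are always copied, so closing tags survive the cut, and a tag does not split a word, so My<b>text</b> counts once.

```python
import re

# A tag, or a run of text between tags. Assumes every "<" starts a tag.
TOKEN = re.compile(r"<[^>]*>|[^<]+")


def truncate_markup(markup: str, limit: int = 30) -> str:
    out = []
    count = 0        # words started so far
    in_word = False  # True while the last text seen ended in the middle of a word
    for token in TOKEN.findall(markup):
        if token.startswith("<"):
            out.append(token)  # always copy tags, so closing tags survive the cut
            continue           # a tag neither starts nor ends a word
        for piece in re.split(r"(\s+)", token):
            if piece == "":
                continue
            if piece.isspace():
                in_word = False
                if count < limit:
                    out.append(piece)  # keep spacing only between kept words
            else:
                if not in_word:
                    count += 1
                    in_word = True
                if count <= limit:
                    out.append(piece)
    return "".join(out)


print(truncate_markup("<p>one <b>two</b> three four</p>", limit=2))
# <p>one <b>two</b></p>
```

Like the template, this treats adjacent blocks with no white space between them (for example </p><p>) as one continuous word.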

If you need any of it explained better, let me know; I'd rather not go into detail unless you need it!

Answer by 凉墨 (2024-09-21 20:04:45)

Here's a slightly different approach:

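If you can clean your input so that you get a normalised string of the text you want to count, you can compare the string-length of that string with the string-length of the same string with its spaces removed. The difference is the number of spaces. Because normalize-space leaves exactly one space between neighbouring words, that is one less than the word count, so add one for any non-empty string.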
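The off-by-one is easy to see in a Python model of the two string-length calls. This is an illustration only. The names normalize_space and word_count_by_spaces are hypothetical, and the input is assumed to be the already-cleaned text as a plain str.

```python
import re


def normalize_space(s: str) -> str:
    """XPath normalize-space() for the four XML white-space characters."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def word_count_by_spaces(text: str) -> int:
    big = normalize_space(text)      # words separated by exactly one space
    small = big.replace(" ", "")     # translate($big, ' ', '')
    spaces = len(big) - len(small)   # string-length($big) - string-length($small)
    # n words have n - 1 spaces between them, so add one back (but not for "")
    return spaces + 1 if big else 0


print(word_count_by_spaces("  the quick\n brown  fox "))  # 4
```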

The word count function (template) will look something like this:

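<xsl:template name="wordCount">
    <!-- required; XSLT 1.0 has no required="yes" attribute on xsl:param -->
    <xsl:param name="input"/>
    <!-- characters to treat as word separators, in addition to white space -->
    <xsl:param name="sep" select="'‒–—―'"/>
    <!-- translate() deletes any $sep character without a counterpart in its third argument,
         so give it at least as many spaces as $sep has characters -->
    <xsl:variable name="big" select="normalize-space(translate($input, $sep, '          '))"/>
    <xsl:variable name="small" select="translate($big, ' ', '')"/>
    <!-- n words leave n - 1 spaces; number($big != '') adds 1 unless $big is empty -->
    <xsl:value-of select="string-length($big) - string-length($small) + number($big != '')"/>
</xsl:template>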

The $sep parameter lets you list any characters, in addition to white space, that should count as word separators (the default lists four dash characters, U+2012 to U+2015).
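translate() maps characters by position. A replacement string shorter than $sep therefore deletes the extra separator characters instead of turning them into spaces, which silently glues words together. Here is a small Python model of that XPath 1.0 rule (the helper name translate is hypothetical):

```python
def translate(s: str, src: str, dst: str) -> str:
    """XPath translate(): characters of src with no counterpart in dst are deleted."""
    out = []
    for ch in s:
        i = src.find(ch)
        if i == -1:
            out.append(ch)
        elif i < len(dst):
            out.append(dst[i])
    return "".join(out)


# The default $sep: figure dash, en dash, em dash, horizontal bar (U+2012..U+2015).
SEP = "\u2012\u2013\u2014\u2015"

print(translate("well\u2014known", SEP, " "))             # wellknown: the dash is deleted
print(translate("well\u2014known", SEP, " " * len(SEP)))  # well known
```

Only the first character of $sep has a counterpart in a one-space replacement string, so only a figure dash would become a space.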

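You can then build the string you want inside <xsl:with-param> when you call the template. In XSLT 1.0 its content produces a result tree fragment, and the template works on that fragment's string value. I'll leave the templates that extract the text as an exercise for the reader: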

<xsl:call-template name="wordCount">
    <xsl:with-param name="input">
        <!-- templates etc to output text from html -->
    </xsl:with-param>
</xsl:call-template>