Getting a specific word from the text of an HTML page

Posted 2024-11-19 01:53:43

If I have the following HTML page:

<div>
 <p> 
  Hello world!
 </p>
 <p> <a href="example.com"> Hello and Hello again this is an example</a></p>
</div>

I want to find a specific word, for example 'hello', and change it to 'welcome' wherever it appears in the document.

Do you have any suggestions? I would be happy to get answers using any type of parser.

太阳男子 2024-11-26 01:53:43

This is easy to do with XSLT.

XSLT 1.0 solution:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/> 

 <xsl:param name="pTarget" select="'hello'"/>
 <xsl:param name="pReplacement" select="'welcome'"/>

 <xsl:variable name="vtargetLength" select=
 "string-length($pTarget)"/>

 <xsl:variable name="vUpper" select=
  "'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
 <xsl:variable name="vLower" select=
  "'abcdefghijklmnopqrstuvwxyz'"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()" name="replace">
  <xsl:param name="pText" select="."/>

  <xsl:variable name="vLowerText" select=
  "translate($pText,$vUpper,$vLower)"/>

  <xsl:choose>
   <xsl:when test=
   "not(contains(concat(' ', $vLowerText, ' '),
                 concat(' ',$pTarget,' ')
                 )
        )">
    <xsl:value-of select="$pText"/>
   </xsl:when>

   <xsl:otherwise>
    <xsl:variable name="vOffset" select=
    "string-length(
          substring-before(concat(' ', $vLowerText, ' '),
                           concat(' ', $pTarget,' ')
                           )
                   )"/>
    <xsl:value-of select="substring($pText, 1, $vOffset)"/>
    <xsl:value-of select="$pReplacement"/>

    <xsl:call-template name="replace">
      <xsl:with-param name="pText" select=
      "substring($pText, $vOffset + $vtargetLength+1)"/>
    </xsl:call-template>
   </xsl:otherwise>
  </xsl:choose> 
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<div>
 <p>
  Hello world!
 </p>
 <p> <a href="example.com"> Hello and Hello again this is an example</a></p>
</div>

the wanted, correct result is produced:

<div>
   <p>
  welcome world!
 </p>
   <p>
      <a href="example.com"> welcome and welcome again this is an example</a>
   </p>
</div>

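My assumption is that the matching and replacement are case-insensitive (i.e. "hello" and "heLlo" should both be replaced with "welcome"). If a case-sensitive match is required, the transformation can be considerably simplified. Note that the target is matched only when it is delimited by spaces or by the start or end of the text node, so "hello," (followed by a comma) is left unchanged.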
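For readers less familiar with XSLT, the recursive `replace` template above can be sketched in Python. This is only an illustration of the same logic, not part of the original answer: the function name `replace_word` is hypothetical, and the sketch lowercases the target as well, which the stylesheet does not do (it assumes `pTarget` is already lowercase).

```python
import string

# Mirrors the vUpper/vLower variables: ASCII-only case folding,
# which keeps the string length unchanged.
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def replace_word(text, target="hello", replacement="welcome"):
    """Replace space-delimited, case-insensitive occurrences of target."""
    # Pad with spaces so a word at the very start or end still matches.
    padded = " " + text.translate(_TO_LOWER) + " "
    needle = " " + target.translate(_TO_LOWER) + " "

    # Same value as string-length(substring-before(...)) in the stylesheet.
    offset = padded.find(needle)
    if offset == -1:
        return text  # the xsl:when branch: nothing to replace

    head = text[:offset]                 # text before the word, original case
    rest = text[offset + len(target):]   # text after the word
    # Recurse on the remainder, like the xsl:call-template step.
    return head + replacement + replace_word(rest, target, replacement)


if __name__ == "__main__":
    print(replace_word("Hello and Hello again"))  # welcome and welcome again
```

Like the template, this recurses once per occurrence, so a text node with a very large number of matches could hit Python's recursion limit; a loop would avoid that.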

XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:param name="pTarget" select="'hello'"/>
 <xsl:param name="pReplacement" select="'welcome'"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()[matches(.,$pTarget, 'i')]">
   <xsl:variable name="vEnlargedRep" select=
   "replace(concat(' ',.,' '),
            concat(' ',$pTarget,' '),
            concat(' ',$pReplacement,' '),
             'i')"/>
    <xsl:variable name="vLen" select="string-length($vEnlargedRep)"/>

    <xsl:sequence select=
     "substring($vEnlargedRep,2, $vLen -2)"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document (shown above), the wanted, correct result is again produced:

<div>
   <p>
  welcome world!
 </p>
   <p> 
      <a href="example.com"> welcome and welcome again this is an example</a>
   </p>
</div>

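Explanation: This uses the standard XPath 2.0 functions matches() and replace() with the "i" flag for case-insensitive operation. The flags string is the third argument of matches() and the fourth argument of replace(). Note that both functions treat $pTarget as a regular expression, so a target containing regex metacharacters would need escaping.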
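Since the question accepts any parser, the same idea (a case-insensitive regex applied to text nodes only) can also be sketched with Python's standard library. This is a minimal sketch under stated assumptions, not the method from the answer above: the names `WordReplacer` and `replace_in_html` are hypothetical, and it uses `\b` word boundaries instead of surrounding spaces, so "Hello," followed by punctuation is replaced too.

```python
import re
from html.parser import HTMLParser


class WordReplacer(HTMLParser):
    """Re-emit the markup as parsed, rewriting only text nodes."""

    def __init__(self, target, replacement):
        # Keep entity references as separate events so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        # re.escape makes the target a literal; \b restricts to whole words.
        self.pattern = re.compile(r"\b" + re.escape(target) + r"\b",
                                  re.IGNORECASE)
        self.replacement = replacement
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Original tag text, so attribute values are never touched.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # A function replacement avoids backslash interpretation.
        self.out.append(self.pattern.sub(lambda m: self.replacement, data))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def replace_in_html(html, target="hello", replacement="welcome"):
    parser = WordReplacer(target, replacement)
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return "".join(parser.out)


if __name__ == "__main__":
    page = '<p> <a href="example.com"> Hello and Hello again</a></p>'
    print(replace_in_html(page))
```

One limitation to keep in mind: the contents of `<script>` and `<style>` elements also arrive through `handle_data`, so this sketch would rewrite matches there as well unless those elements are tracked and skipped. End tags are re-emitted in lowercase.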
