XSLT 2.0 regex question (opening and closing elements on different matches)

Posted on 2024-09-07 15:42:56

I've simplified the problem somewhat, but I hope I've still captured its essence.

Let's say I have the following simple XML file:

<main>
  outside1
  ===BEGIN===
    inside1
  ====END====
  outside2
  =BEGIN=
    inside2
  ==END==
  outside3
</main>

Then I can use the following XSLT 2.0 stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:template match="text()">

  <xsl:analyze-string select="." regex="=+BEGIN=+">
     <xsl:matching-substring>
        <section/>
     </xsl:matching-substring>
     <xsl:non-matching-substring>
          <xsl:analyze-string select="." regex="=+END=+">  
             <xsl:matching-substring>
                <_section/>
             </xsl:matching-substring>
             <xsl:non-matching-substring>
                <xsl:value-of select="."/>
             </xsl:non-matching-substring>
          </xsl:analyze-string>
     </xsl:non-matching-substring>
  </xsl:analyze-string>

</xsl:template>

</xsl:stylesheet>

to transform it into the following:

<?xml version="1.0" encoding="UTF-8"?>
  outside1
  <section/>
    inside1
  <_section/>
  outside2
  <section/>
    inside2
  <_section/>
  outside3

Here are the questions:

Multiple regexes

Is there a better way to match two different regexes than nesting one inside the other, as was done above?

  • What if they're not easily nestable like this?
  • Can I have XSL templates that match and transform regex matches within a text() node?
    • In this case, I'd have two templates, one for each regex
    • If possible, this would be the ideal solution

Opening and closing elements on regex matches

Obviously, instead of:

<section/>
   inside
<_section/>

What I really want eventually is:

<section>
   inside
</section>

So how would you do this? I'm not sure it's even possible to open an element in one regex match and close it in another (what if there is no match for the closer? The result would not be well-formed XML!), but this task seems typical enough that there must be an idiomatic solution for it.

Note: we can assume that sections will not overlap, and thus also will not nest. We can also assume that they will always appear in proper pairs.


Additional info

So essentially I'm trying to accomplish what in Perl would succinctly be something like:

s/=+BEGIN=+/<section>/
s/=+END=+/<\/section>/

I'm looking for a way to do this in XSLT instead, because:

  • It'd be more robust with regard to the context of the regex match
    • (i.e. it should only transform text() nodes)
  • It'd also be more robust with regard to matching various XML entities


Comments (1)

当爱已成负担 2024-09-14 15:42:56

This transformation:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 exclude-result-prefixes="xs"
>
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()">
   <xsl:analyze-string select="." flags="mx"
    regex="=+BEGIN=+((.|\n)*?)=+END=+">

   <xsl:matching-substring>
    <section><xsl:value-of select="regex-group(1)"/></section>
   </xsl:matching-substring>

   <xsl:non-matching-substring>
    <xsl:value-of select="."/>
   </xsl:non-matching-substring>
 </xsl:analyze-string>
 </xsl:template>
</xsl:stylesheet>

when applied to the provided XML document:

<main>
  outside1
  ===BEGIN===
    inside1
  ====END====
  outside2
  =BEGIN=
    inside2
  ==END==
  outside3
</main>

produces the wanted result:

<main>
  outside1
  <section>
    inside1
  </section>
  outside2
  <section>
    inside2
  </section>
  outside3
</main>
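
If a single regex cannot span both delimiters, one alternative is a two-pass approach. The following is only a sketch and not part of the answer as posted: the marker element names `begin-marker` and `end-marker` and the mode name `mark` are made up for illustration, and it assumes, as the question states, that the delimiters always appear in proper pairs directly inside `main`.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes"/>

 <!-- Pass 1: a single regex with two alternatives. Each delimiter is
      replaced by an empty marker element (hypothetical names). -->
 <xsl:template match="text()" mode="mark">
  <xsl:analyze-string select="." regex="(=+BEGIN=+)|(=+END=+)">
   <xsl:matching-substring>
    <xsl:choose>
     <!-- group 1 is non-empty only when the BEGIN alternative matched -->
     <xsl:when test="regex-group(1) != ''"><begin-marker/></xsl:when>
     <xsl:otherwise><end-marker/></xsl:otherwise>
    </xsl:choose>
   </xsl:matching-substring>
   <xsl:non-matching-substring>
    <xsl:value-of select="."/>
   </xsl:non-matching-substring>
  </xsl:analyze-string>
 </xsl:template>

 <!-- Pass 1 leaves any child elements of main untouched. -->
 <xsl:template match="*" mode="mark">
  <xsl:copy-of select="."/>
 </xsl:template>

 <!-- Pass 2: start a new group at every marker. Groups that start with
      a begin-marker are wrapped in <section>; all others are copied. -->
 <xsl:template match="main">
  <xsl:variable name="marked">
   <xsl:apply-templates select="node()" mode="mark"/>
  </xsl:variable>
  <xsl:copy>
   <xsl:for-each-group select="$marked/node()"
     group-starting-with="begin-marker | end-marker">
    <xsl:choose>
     <xsl:when test="self::begin-marker">
      <section>
       <!-- everything in the group except the marker itself -->
       <xsl:copy-of select="current-group()[position() gt 1]"/>
      </section>
     </xsl:when>
     <xsl:otherwise>
      <xsl:copy-of select="current-group()[not(self::end-marker)]"/>
     </xsl:otherwise>
    </xsl:choose>
   </xsl:for-each-group>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>
```

For the sample document this should give the same `<section>` wrapping as the single-regex version. It also shows one way to handle several regexes without nesting `xsl:analyze-string`: each alternative gets its own capture group, and `regex-group()` tells you which one matched.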