XSLT: replace URLs in text using a regular expression

Posted 2024-11-08 23:35:26

I've got an XML feed coming from Twitter that I want to transform using XSLT. What I want the XSLT to do is replace every URL occurring in a Twitter message with a link. I've already created the following XSLT template, based on two other topics here on Stack Overflow. How can I achieve this? If I use the template as shown below I get an infinite loop, but I don't see where. As soon as I comment out the call to the 'replaceAll' template everything seems to work, but then of course nothing in the Twitter message gets replaced. I'm new to XSLT, so every bit of help is welcome.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
    <xsl:output method="text" omit-xml-declaration="yes" indent="yes"  encoding="utf-8" />
    <xsl:param name="html-content-type" />
    <xsl:variable name="urlRegex" select="8"/>
    <xsl:template match="statuses">
        <xsl:for-each select="//status[position() &lt; 2]">
            <xsl:variable name="TwitterMessage" select="text" />
            <xsl:call-template name="replaceAll">
                <xsl:with-param name="text" select="$TwitterMessage"/>
                <xsl:with-param name="replace" select="De"/> <!--This should become an regex to replace urls, maybe something like the rule below?-->
                <xsl:with-param name="by" select="FOOOO"/> <!--Here I want the matching regex value to be replaced with valid html to create an href-->
                <!--<xsl:value-of select="replace(text,'^http://(.*)\.com','#')"/>
                <xsl:value-of select="text"/>-->
            </xsl:call-template>
            <!--<xsl:value-of select="text"/>-->
            <!--<xsl:apply-templates />-->
        </xsl:for-each>
    </xsl:template>

    <xsl:template name="replaceAll">
        <xsl:param name="text"/>
        <xsl:param name="replace"/>
        <xsl:param name="by"/>
        <xsl:choose>
            <xsl:when test="contains($text,$replace)">
                <xsl:value-of select="substring-before($text,$replace)"/>
                <xsl:value-of select="$by"/>
                <xsl:call-template name="replaceAll">
                    <xsl:with-param name="text" select="substring-after($text,$replace)"/>
                    <xsl:with-param name="replace" select="$replace"/>
                    <xsl:with-param name="by" select="$by"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>

EDIT:
This is an example of the XML feed.

<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
<status>
  <created_at>Mon May 16 14:17:12 +0000 2011</created_at>
  <id>10000000000000000</id>
  <text>This is an message from Twitter http://bit.ly/xxxxx http://yfrog.com/xxxxx</text>
</status>
</statuses>

This is just the basic output Twitter returns for a URL like the one below:

http://twitter.com/statuses/user_timeline.xml?screen_name=yourtwitterusername

This text:

This is an message from Twitter http://bit.ly/xxxxx http://yfrog.com/xxxxx

should be converted to:

This is an message from Twitter <a href="http://bit.ly/xxxxx">http://bit.ly/xxxxx</a> <a href="http://yfrog.com/xxxxx">http://yfrog.com/xxxxx</a>

Comments (2)

゛清羽墨安 2024-11-15 23:35:26

So, your question isn't really about XSLT. What you want is to find the best option for manipulating a text string in XPath. If you are using a standalone XSLT engine, you can probably use XPath 2, which just about has the power you need, though with regexes it will get a bit fiddly. If you are running this from an engine with EXSLT support, you will need to look up which functions are available there. If you are running this from PHP, it is usually a good idea to hand text manipulation over to the PHP code; you do that by writing a PHP function that does what you want and calling it from the XSLT using php:function('f-name', inputs ...) as the XPath expression.

As far as regexes go, I guess you are looking for something pretty much along these lines:

send (https?://.*?)(?=[.,:;)]*($|\s)) to <a href="$1">$1</a>.

If it doesn't match every URL, that's fine; you only need to handle the incoming data about as well as Twitter's own munging does. Checking for punctuation at the end (the [] in the regex) is really the only tricky thing that your users will expect you to handle.
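
As a rough illustration of the XPath 2 route mentioned above, here is a minimal sketch using xsl:analyze-string. It assumes an XSLT 2.0 processor and the feed structure shown in the question. XPath 2.0 regular expressions do not support lookahead, so the pattern below is a simplified stand-in rather than the expression given above: it matches http:// or https:// followed by non-whitespace characters, and requires the last matched character not to be one of . , : ; ) so that trailing punctuation stays outside the link.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- html output so that the generated <a> elements are serialized as markup -->
    <xsl:output method="html" encoding="utf-8"/>

    <xsl:template match="/statuses">
        <!-- Only the first status, as in the question -->
        <xsl:for-each select="status[1]">
            <!-- Assumed, simplified URL pattern; not the lookahead regex from the answer -->
            <xsl:analyze-string select="string(text)" regex="https?://\S*[^\s.,:;)]">
                <xsl:matching-substring>
                    <!-- "." is the matched URL -->
                    <a href="{.}"><xsl:value-of select="."/></a>
                </xsl:matching-substring>
                <xsl:non-matching-substring>
                    <!-- Text between URLs is copied through unchanged -->
                    <xsl:value-of select="."/>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

The regex attribute is an attribute value template, so any literal curly braces in a pattern would have to be doubled; this pattern contains none.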

妄想挽回 2024-11-15 23:35:26

Generally, I wouldn't implement a new replace function. I'd use the one provided by EXSLT. If your XSLT processor supports EXSLT, you just need to set up the stylesheet as follows:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:regexp="http://exslt.org/regular-expressions"
                extension-element-prefixes="regexp"
                version="1.0">

Otherwise, download and import the stylesheet from EXSLT.

For a global replace you can use the function as follows:

<xsl:value-of select="regexp:replace(string($TwitterMessage), 'your-pattern', 'g', 'your-replacement')" />

Sorry for the general answer, but I'm not able to test XSLT at the moment.
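
If neither XSLT 2.0 nor the EXSLT regular-expression functions are available, a recursive named template in plain XSLT 1.0 can do a simplified version of the job. The sketch below is an illustration under stated assumptions, not a tested drop-in: the template name linkify is made up, only http:// URLs are recognised, and a URL is taken to end at the next space or at the end of the text. Incidentally, the infinite loop in the question most likely comes from select="De": that is an XPath expression selecting child elements named De, not the string 'De', so $replace ends up as the empty string, contains($text, '') is always true, and substring-after($text, '') returns $text unchanged.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- html output so that the generated <a> elements are serialized as markup -->
    <xsl:output method="html" encoding="utf-8"/>

    <xsl:template match="/statuses">
        <xsl:call-template name="linkify">
            <xsl:with-param name="text" select="string(status[1]/text)"/>
        </xsl:call-template>
    </xsl:template>

    <!-- Hypothetical helper: wraps each space-delimited http:// token in an <a> element -->
    <xsl:template name="linkify">
        <xsl:param name="text"/>
        <xsl:choose>
            <xsl:when test="contains($text, 'http://')">
                <!-- Everything from the first "http://" onwards -->
                <xsl:variable name="rest" select="concat('http://', substring-after($text, 'http://'))"/>
                <!-- The URL runs up to the next space; the appended space covers a URL at the very end -->
                <xsl:variable name="url" select="substring-before(concat($rest, ' '), ' ')"/>
                <!-- Plain text before the URL -->
                <xsl:value-of select="substring-before($text, 'http://')"/>
                <a href="{$url}"><xsl:value-of select="$url"/></a>
                <!-- Recurse on what follows the URL; each call consumes at least "http://" -->
                <xsl:call-template name="linkify">
                    <xsl:with-param name="text" select="substring($rest, string-length($url) + 1)"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Because the search string is a non-empty literal, every recursive call works on a strictly shorter string, so the recursion terminates. Trailing punctuation and https:// links are not handled here and would need extra branches.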
