Advice on implementing a simple regular expression (for bbcode/GeSHi parsing)

Posted 2024-10-03 08:16:43

I wrote a personal note-taking application in PHP so I can store and organize my notes, and I wanted a nice, simple format to write them in.

I first did it in Markdown but found it a little confusing, and there was no simple syntax highlighting. I have worked with bbcode before, so I would like to implement that instead.

GeSHi (the syntax highlighter), which I really want to use, needs only very simple code like this:

$geshi = new GeSHi($sourcecode, $language);
$geshi->parse_code();

That is the easy part, but what I want is to let my bbcode call it.

My current regular expression to match a made-up [syntax=cpp][/syntax] bbcode tag is the following:

preg_replace('#\[syntax=(.*?)\](.*?)\[/syntax\]#si', 'geshi(\\2,\\1)????', $text);

As you can see, I capture the language and the content. How would I connect them to the GeSHi code?

preg_replace seems to be able to replace a match only with a string, not an 'expression'. I am not sure how to use the two lines of GeSHi code above with the captured data.

I am really excited about this project and hope to get past this.

烂柯人 2024-10-10 08:16:43

I wrote this class a while back to allow easy customization and parsing. It may be a little overkill, but it works well, and my application needed that. The usage is pretty simple:

$geshiH = new Geshi_Helper();
$text = $geshiH->geshi($text); // assumes the text contains inline [syntax=...] tags to be parsed

---- OR ----

$geshiH = new Geshi_Helper();
$text = $geshiH->geshi($text, $lang);  // assumes you already know the language; good for standalone snippets

I had to chop out some other custom items I had, but barring syntax errors introduced by the chopping, it should work. Feel free to use it.

<?php

require_once 'Geshi/geshi.php';

class Geshi_Helper 
{
    /**
     * @var array Array of matches from the code block.
     */
    private $_codeMatches = array();

    private $_token = "";

    private $_count = 1;

    public function __construct()
    {
        /* Generate a unique hash token for replacement */
        $this->_token = md5(time() . rand(9999,9999999));
    }

    /**
     * Performs syntax highlights using geshi library to the content.
     *
     * @param string $content - The content to parse
     * @param string|null $lang - Language of the whole content, or null to parse inline [syntax] tags
     * @return string Syntax Highlighted content
     */
    public function geshi($content, $lang=null)
    {
        if (!is_null($lang)) {
            /* Given the returned results 0 is not set, adding the "" should make this compatible */
            $content = $this->_highlightSyntax(array("", strtolower($lang), $content));
        }else {
            /* Escape bbcode-significant characters inside [nobbc] before the code replace */
            $content = preg_replace_callback('~\[nobbc\](.+?)\[/nobbc\]~i', function ($m) {
                return '[nobbc]' . strtr($m[1], array('[' => '&#91;', ']' => '&#93;', ':' => '&#58;', '@' => '&#64;')) . '[/nobbc]';
            }, $content);

            /* For multiple content we have to handle the br's, hence the replacement filters */
            $content = $this->_preFilter($content);

            /* Reverse the nobbc markup */
            $content = preg_replace_callback('~\[nobbc\](.+?)\[/nobbc\]~i', function ($m) {
                return strtr($m[1], array('&#91;' => '[', '&#93;' => ']', '&#58;' => ':', '&#64;' => '@'));
            }, $content);

            $content = $this->_postFilter($content);
        }

        return $content;
    }

    /**
     * Highlights a single code block with GeSHi.
     *
     * Called either directly with array("", $language, $code), or as a
     * preg_replace_callback callback from _postFilter() with
     * array($placeholder, $index), in which case the stored match for
     * that index is looked up first.
     *
     * @param array $contentArray - Match array as described above
     * @return string Syntax Highlighted content
     * @todo Add any extra / customization styling here.
     */
    private function _highlightSyntax($contentArray)
    {
        /* If the count is 2 we are working with the filter */
        if (count($contentArray) == 2) {
            $contentArray = $this->_codeMatches[$contentArray[1]];
        }

        /* for default [syntax] */
        if ($contentArray[1] == "")
            $contentArray[1] = "php";

        /* Grab the language */
        $language = (isset($contentArray[1]))?$contentArray[1]:'text';

        /* Remove leading spaces to avoid problems */
        $content = ltrim($contentArray[2]);

        /* Parse the code to be highlighted */
        $geshi = new GeSHi($content, strtolower($language));
        return $geshi->parse_code();
    }

    /**
     * Substitute the code blocks for formatting to be done without
     * messing up the code.
     *
     * @param array $match - Match array passed in by preg_replace_callback
     * @return string Placeholder token that stands in for the code block
     */
    private function _substitute($match)
    {
        $index = sprintf("%02d", $this->_count++);
        $this->_codeMatches[$index] = $match;
        return "----" . $this->_token . $index . "----";
    }

    /**
     * Removes the code from the rest of the content to apply other filters.
     *
     * @param string $content - The content to filter out the code lines
     * @return string Content with code removed.
     */
    private function _preFilter($content)
    {
        /* No U modifier here: it would flip ".*?" back to greedy and merge adjacent blocks */
        return preg_replace_callback("#\s*\[syntax=(.*?)\](.*?)\[/syntax\]\s*#si", array($this, "_substitute"), $content);
    }

    /**
     * Replaces the code after the filters have been ran.
     *
     * @param string $content - The content to replace the code lines
     * @return string Content with code re-applied.
     */
    private function _postFilter($content)
    {
        /* using dashes to prevent the old filtered tag being escaped */
        return preg_replace_callback("/----\s*" . $this->_token . "(\d+)\s*----/si", array($this, "_highlightSyntax"), $content);
    }
}
?>
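
The core idea of the class above (pull the code blocks out, run the other filters, then put the highlighted code back) can be sketched in a much smaller form. This is only an illustration under some assumptions: it targets PHP 7 or later, render_note() and fake_highlight() are hypothetical names, and fake_highlight() is a stand-in for GeSHi so the sketch runs without the library installed.

```php
<?php
// Hypothetical stand-in for GeSHi: wraps the escaped code in a <pre> tag.
function fake_highlight(string $code, string $lang): string
{
    return '<pre class="' . htmlspecialchars($lang) . '">' . htmlspecialchars($code) . '</pre>';
}

// Hypothetical helper showing the placeholder technique end to end.
function render_note(string $text): string
{
    $blocks = [];
    $token  = md5(uniqid('', true)); // unlikely to occur in normal note text

    // 1. Pull every [syntax=lang]...[/syntax] block out and leave a placeholder.
    $text = preg_replace_callback(
        '#\[syntax=(.*?)\](.*?)\[/syntax\]#si',
        function (array $m) use (&$blocks, $token) {
            $blocks[] = ['lang' => strtolower($m[1]), 'code' => ltrim($m[2])];
            return '----' . $token . (count($blocks) - 1) . '----';
        },
        $text
    );

    // 2. Apply the other formatting while the code is safely out of the way.
    $text = nl2br(htmlspecialchars($text));

    // 3. Swap each placeholder for the highlighted version of its block.
    return preg_replace_callback(
        '/----' . $token . '(\d+)----/',
        function (array $m) use ($blocks) {
            $block = $blocks[(int) $m[1]];
            return fake_highlight($block['code'], $block['lang']);
        },
        $text
    );
}

echo render_note("a < b\n[syntax=PHP]\necho \"hi\";\n[/syntax]");
```

With GeSHi available, the body of fake_highlight() could presumably be replaced by the two GeSHi lines from the question.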

请持续率性 2024-10-10 08:16:43

It looks to me like you already got the regex right. Your problem lies in the invocation, so I suggest making a wrapper function:

function geshi($src, $l) {
    $geshi = new GeSHi($src, $l);
    // parse_code() returns the highlighted HTML as a string
    return $geshi->parse_code();
}

Now this would normally suffice, but the source code is likely to contain single or double quotes itself. Therefore you cannot write preg_replace(".../e", "geshi('$2','$1')", ...) as you would need to. (Note that '$1' and '$2' need quotes because preg_replace merely substitutes the $1 and $2 placeholders, and the result has to be valid inline PHP code.) The /e modifier was also deprecated in PHP 5.5 and removed in PHP 7.0.

That's why you need to use preg_replace_callback, which avoids the escaping issues of /e replacement code.
For example:

preg_replace_callback('#\[syntax=(.*?)\](.*?)\[/syntax\]#si' , 'geshi_replace', $text);

I'd make a second wrapper for the callback, but you could also combine it with the first one:

function geshi_replace($uu) {
    return geshi($uu[2], $uu[1]);
}
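
The two wrappers can also be folded into a single closure. The following is a minimal sketch assuming PHP 7 or later; replace_syntax_tags() and the $demo highlighter are hypothetical names, and $demo only stands in for GeSHi so that the example runs without the library.

```php
<?php
// Hypothetical helper: $highlight is any callable taking (code, language)
// and returning HTML. With GeSHi installed it could wrap
// (new GeSHi($code, $lang))->parse_code().
function replace_syntax_tags(string $text, callable $highlight): string
{
    return preg_replace_callback(
        '#\[syntax=(.*?)\](.*?)\[/syntax\]#si',
        function (array $m) use ($highlight) {
            // $m[1] is the language and $m[2] the code. Both arrive as plain
            // strings, so quotes inside the code need no escaping.
            return $highlight($m[2], $m[1]);
        },
        $text
    );
}

// Stand-in highlighter for the demo (an assumption, not GeSHi output).
$demo = function (string $code, string $lang): string {
    return '<pre class="' . htmlspecialchars($lang) . '">' . htmlspecialchars($code) . '</pre>';
};

echo replace_syntax_tags("[syntax=cpp]char c = '\"';[/syntax]", $demo);
```

Because the captured groups are passed to the callback as array elements and never pasted into PHP source, the quoting problem described for /e does not arise.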

夕色琉璃 2024-10-10 08:16:43

Use preg_match:

// preg_match() returns 1 or 0; the captured groups are written to the third argument
if (preg_match('#\[syntax=(.*?)\](.*?)\[/syntax\]#si', $text, $match)) {
    $geshi = new GeSHi($match[2], $match[1]);
}
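
Note that preg_match only finds the first block and does not replace anything in the text. If a note may contain several blocks and the goal is just to list them, preg_match_all is one option. A minimal sketch, where the sample $text and the $found array are made up for illustration:

```php
<?php
$text = "Intro [syntax=cpp]int x;[/syntax] middle [syntax=php]echo 1;[/syntax]";

// PREG_SET_ORDER groups the captures per match: [full match, language, code].
preg_match_all('#\[syntax=(.*?)\](.*?)\[/syntax\]#si', $text, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    $found[] = ['lang' => $m[1], 'code' => $m[2]];
}

print_r($found);
```

Each entry in $found could then be handed to GeSHi, but replacing the tags in place still needs preg_replace_callback as described in the other answers.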
