Problems with strtok()

Posted 2024-12-09 01:45:19

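I have been wrestling with this for a while. I know it's a lot of code to look at, but I have no idea where the problem lies and can't seem to narrow it down. I will put a bounty on it.

I wrote this class to parse bbcodes. It relies primarily on strtok(), and it works great unless you put two tags right next to each other. I can't for the life of me figure out why.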

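For instance, [b] [i]test1[/i] [/b] results in <strong> <em>test1</em> </strong>. Yet [b][i]test1[/i][/b] results in <strong>i]test1/b]</strong>. The last </strong> tag is only there because the parser automatically closes any tag for which it could not find a closing tag in the string. It somehow misses the [i] and [/b] tags completely.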

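Here's the class as well as the one subclass it uses for setting up the various bbcodes. The subclass is basically just a data structure with no behaviours.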

<?php
    // beware images can contain any url/any get request. beware of csrf
    class Lev_TextProcessor_Extension_BbCode {

        protected $elements = array();
        protected $openTags = array();

        public function __construct() {
            $this->elements['b'] = new Lev_TextProcessor_Extension_BbCode_Element('<strong>', '</strong>');
            $this->elements['i'] = new Lev_TextProcessor_Extension_BbCode_Element('<em>', '</em>');
            $this->elements['u'] = new Lev_TextProcessor_Extension_BbCode_Element('<span style="text-decoration: underline;">', '</span>');
            $this->elements['s'] = new Lev_TextProcessor_Extension_BbCode_Element('<span style="text-decoration: line-through;">', '</span>');
            $this->elements['size'] = new Lev_TextProcessor_Extension_BbCode_Element('<span style="font-size: ', '</span>', 'px;">');
            $this->elements['color'] = new Lev_TextProcessor_Extension_BbCode_Element('<span style="color: ', '</span>', ';">');
            $this->elements['center'] = new Lev_TextProcessor_Extension_BbCode_Element('<div style="text-align: center;">', '</div>', '', true, true, false);
            $this->elements['url'] = new Lev_TextProcessor_Extension_BbCode_Element('<a href="', '</a>', '">');
            $this->elements['email'] = new Lev_TextProcessor_Extension_BbCode_Element('<a href="mailto:', '</a>', '">');
            $this->elements['img'] = new Lev_TextProcessor_Extension_BbCode_Element('<img src="', '" alt="" />', '', false, false, true);
            $this->elements['youtube'] = new Lev_TextProcessor_Extension_BbCode_Element('<object width="400" height="325"><param name="movie" value="http://www.youtube.com/v/{param}"></param><embed src="http://www.youtube.com/v/', '" type="application/x-shockwave-flash" width="400" height="325"></embed></object>', '', false, false, false);
            $this->elements['code'] = new Lev_TextProcessor_Extension_BbCode_Element('<pre><code>', '</code></pre>', '', true, false, false);
        }

        public function processText($input) {
            // pre processing
            $input = htmlspecialchars($input, ENT_NOQUOTES);
            $input = nl2br($input);
            $input = str_replace(array("\n", "\r"), '', $input);
            // start main processing
            $output = '';
            $allow_child_tags = true;
            $allow_child_quotes = true;

            $string_segment = strtok($input, '[');

            do {
                // check content for quotes
                if ($allow_child_quotes === false) {
                    if (strpos($string_segment, '"') === false) {
                        $output .= $string_segment;
                    }
                } else {
                    // add content to output
                    $output .= $string_segment;
                }

                $tag_contents = strtok(']');

                if (strpos($tag_contents, '/') === 0) {
                    // closing tag
                    $tag = substr($tag_contents, 1);
                    if (isset($this->elements[$tag]) === true && array_search($tag, $this->openTags) !== false) {
                        // tag found
                        do {
                            // close tags till matching tag found
                            $last_open_tag = array_pop($this->openTags);
                            $output .= $this->elements[$last_open_tag]->htmlAfter;
                        } while ($last_open_tag !== $tag);
                        $allow_child_tags = true;
                        $allow_child_quotes = true;
                    }
                } else {
                    // opening tag
                    // separate tag name from argument if there is one
                    $equal_pos = strpos($tag_contents, '=');
                    if ($equal_pos === false) {
                        $tag_name = $tag_contents;
                    } else {
                        $tag_name = substr($tag_contents, 0, $equal_pos);
                        $tag_argument = substr($tag_contents, $equal_pos + 1);
                    }
                    if (isset($this->elements[$tag_name]) === true) {
                        // tag found
                        if (($this->elements[$tag_name]->allowParentTags === true || count($this->openTags) === 0) && $allow_child_tags === true) {
                            // add tag to open tag list and set flags
                            $this->openTags[] = $tag_name;
                            $allow_child_tags = $this->elements[$tag_name]->allowChildTags;
                            $allow_child_quotes = $this->elements[$tag_name]->allowChildQuotes;
                            $output .= $this->elements[$tag_name]->htmlBefore;
                            // if argument exists
                            if ($equal_pos !== false) {
                                if (strpos($tag_argument, '"') === false) {
                                    $output .= $tag_argument;
                                }
                                $output .= $this->elements[$tag_name]->htmlCenter;
                            }
                        }
                    }
                }

                $string_segment = strtok('[');
            } while ($string_segment !== false);
            // close left over tags
            while ($tag = array_pop($this->openTags)) {
                $output .= $this->elements[$tag]->htmlAfter;
            }
            return $output;
        }
    }
?>

<?php

    class Lev_TextProcessor_Extension_BbCode_Element {

        public $htmlBefore;
        public $htmlAfter;
        public $htmlCenter;
        public $allowChildQuotes;
        public $allowChildTags;
        public $allowParentTags;

        public function __construct($html_before, $html_after, $html_center = '', $allow_child_quotes = true, $allow_child_tags = true, $allow_parent_tags = true) {
            if ($allow_child_quotes === false && $allow_child_tags === true) throw new Lev_TextProcessor_Exception('You may not allow child tags if you do not allow child quotes.');
            $this->htmlBefore = $html_before;
            $this->htmlAfter = $html_after;
            $this->htmlCenter = $html_center;
            $this->allowChildQuotes = $allow_child_quotes;
            $this->allowChildTags = $allow_child_tags;
            $this->allowParentTags = $allow_parent_tags;
        }
    }
?>

Edit

Fixed by creating the following class for tokenizing.

<?php

    // unlike PHP's strtok() function, this class will not skip over empty tokens.
    class Lev_TextProcessor_Tokenizer {

        protected $string;

        public function __construct($string) {
            $this->string = $string;
        }

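        // $delimiters: the characters that end a token, e.g. '[' or ']'
        public function getToken($delimiters) {
            // length of the run before the first delimiter (0 if a delimiter comes first)
            $segment_length = strcspn($this->string, $delimiters);
            $token = substr($this->string, 0, $segment_length);
            // discard the token and the single delimiter that follows it
            $this->string = substr($this->string, $segment_length + 1);
            return $token;
        }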
    }
?>
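
To see how this approach behaves on the input that was failing, here is a minimal sketch. The function next_token() is a hypothetical stand-in, not part of the class above; it repeats the same strcspn() logic in standalone form so the snippet runs on its own. The (string) cast is an assumption added to cope with substr() returning false on older PHP versions.

```php
<?php
// Hypothetical helper, not from the original post: the same strcspn() idea
// as the tokenizer class, written as a function that consumes $rest in place.
function next_token(&$rest, $delimiters) {
    $length = strcspn($rest, $delimiters);
    $token = (string) substr($rest, 0, $length);
    // Drop the token plus the one delimiter that follows it.
    $rest = (string) substr($rest, $length + 1);
    return $token;
}

$rest = '[b][i]test1[/i][/b]';
$tokens = array();
while ($rest !== '') {
    $tokens[] = next_token($rest, '['); // text before the next tag, may be ''
    $tokens[] = next_token($rest, ']'); // contents of the tag
}
// $tokens is now: '', 'b', '', 'i', 'test1', '/i', '', '/b'

// For contrast, strtok() skips the leading '[' and runs on to the next one,
// so the first "text" segment it returns already contains part of a tag.
$first = strtok('[b][i]test1[/i][/b]', '[');
// $first is 'b]'
```

The empty strings are the point: they mark the places where two tags touch, which is exactly what strtok() hides from the parser.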



夢归不見 2024-12-16 01:45:19

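Although I don't think this is really the solution, it seems to be the only way I'm going to get my point across.

The cause is most likely the way strtok() works: it never returns an empty token, so it skips over delimiters that sit right next to each other.

Although it's not perfect, I was able to get results close to what you were expecting with this: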

 <?php
 // strtok() skips leading delimiters and never returns an empty token,
 // so the first call yields "b]" rather than an empty string.
 $data1 = strtok('[b][i]test1[/i][/b]', '[');
 $data2 = strtok(']');
 $data3 = strtok('[');
 $data4 = strtok(']');
 $data5 = strtok('[');
 $data6 = strtok(']');
 var_dump($data1, $data2, $data3, $data4, $data5, $data6);
 /*
  OUTPUT
    string(2) "b]"
    string(1) "i"
    string(5) "test1"
    string(2) "/i"
    string(3) "/b]"
    bool(false)
 */
 ?>

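As I said, it isn't perfect, but maybe seeing this will help you on your way to a solution. Personally, I have never handled BBCode with this type of parsing; I use preg_match() instead.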
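
For what it's worth, a regex-based split might look something like the sketch below. It uses preg_split() rather than preg_match(), and the pattern is an assumption made for illustration, not taken from any real parser. It only separates text from tag contents and does none of the nesting or quote checks in the class from the question.

```php
<?php
$input = '[b][i]test1[/i][/b]';

// Split on [ ... ] and keep what is inside the brackets. With
// PREG_SPLIT_DELIM_CAPTURE the captured tag contents land at odd indexes
// and the plain text between tags at even indexes (possibly as '').
$parts = preg_split('/\[([^\[\]]*)\]/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

$tags = array();
$text = '';
foreach ($parts as $index => $part) {
    if ($index % 2 === 1) {
        $tags[] = $part;  // e.g. 'b', '/i', or 'size=12'
    } else {
        $text .= $part;   // literal text between tags
    }
}
// $tags is now: 'b', 'i', '/i', '/b'
// $text is now: 'test1'
```

Because empty pieces are kept unless PREG_SPLIT_NO_EMPTY is passed, adjacent tags do not throw off the odd/even alternation.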
