Parsing bbcode quotes with variable attributes in PHP
I've been trying my best to avoid coming here and asking this, insisting to myself that I could solve it on my own. I have done that, but I thought I'd come here anyway to 1) share my solution or 2) get a better solution.
I know there are already a ton of Stack Overflow questions on this. Most say to use the PEAR library, and none are about my specific question.
Basically, I want to be able to parse the bbcode quote tag. However, this quote can have a variable number of attributes, or no attributes at all, so a simple preg_replace won't work the way it would for, say, an underline tag.
There can also be multiple quote tags within one string. Here's an example of how I solved it. Can anyone suggest a better way that avoids the multiple regular expressions and foreach loops?
(Note that the strong tag in the example is parsed elsewhere in my code; it's the quotes I'm specifically struggling with and asking about here.)
$string = "[quote name='Rob' user_id='1' id='1' timestamp='1294120376']
My text here
[/quote]
[quote name='Rob' user_id='1' id='2' timestamp='1302442553']
Lorem ipsum dolor sit amet
[/quote]
Test Comment";

// Capture each quote's attribute string (group 1) and body (group 2).
preg_match_all('/\[quote(.*?)](.*?)\[\/quote\]/msi', $string, $matches);

$quotes = array();
foreach($matches[1] as $id => $match)
{
    // Split the attribute string into key='value' pairs.
    preg_match_all('/(\w*?)=\'(.*?)\'/msi', $match, $attr_matches);
    array_push($quotes, array(
        'text' => trim($matches[2][$id]),
        'attributes' => array_combine($attr_matches[1], $attr_matches[2])
    ));
}

echo '<pre>'.print_r($quotes,1).'</pre>';
This will output the following:
Array
(
    [0] => Array
        (
            [text] => My text here
            [attributes] => Array
                (
                    [name] => Rob
                    [user_id] => 1
                    [id] => 1
                    [timestamp] => 1294120376
                )
        )
    [1] => Array
        (
            [text] => Lorem ipsum dolor sit amet
            [attributes] => Array
                (
                    [name] => Rob
                    [user_id] => 1
                    [id] => 2
                    [timestamp] => 1302442553
                )
        )
)
Then I simply build the HTML:
$bbcode = '';
foreach($quotes as $quote)
{
    $attributes = array();
    foreach($quote['attributes'] as $key => $value)
    {
        switch($key)
        {
            case 'id':
                $attributes[] = '<a href="'.site_url('forums/findpost/'.$value).'">Permalink</a>';
                break;
            case 'name':
                if(isset($quote['attributes']['user_id']))
                {
                    $attributes[] = 'By <a href="'.site_url('user/profile/'.$quote['attributes']['user_id'].'/'.$value).'">'.$value.'</a>';
                }
                else
                {
                    $attributes[] = 'By '.$value;
                }
                break;
            case 'timestamp':
                $attributes[] = 'On '.date('d F Y - H:i A', $value);
                break;
        }
    }
    if(!empty($attributes))
    {
        $bbcode .= '<p class="citation">'.implode(' | ', $attributes).'</p>';
    }
    $bbcode .= '<blockquote>
'.$quote['text'].'
</blockquote>';
}
echo $bbcode;
Which will output the following:
<p class="citation">By <a href="http://domain.com/user/profile/1/Rob.html">Rob</a> | <a href="http://domain.com/forums/findpost/1.html">Permalink</a> | On 04 January 2011 - 05:52 AM</p>
<blockquote>
My text here
</blockquote>
<p class="citation">By <a href="http://domain.com/user/profile/1/Rob.html">Rob</a> | <a href="http://domain.com/forums/findpost/2.html">Permalink</a> | On 10 April 2011 - 14:35 PM</p>
<blockquote>
Lorem ipsum dolor sit amet
</blockquote>
So this seems like a very long and roundabout way of doing it, but I can't fathom another method. Anyone...?
Comments (2)
I've managed to come up with my own more elegant solution that is both less code and works with nested quotes.
This only parses the quotes. The content within and around the quotes still needs to be converted from bbcode, but there are plenty of resources available for that.
This should return the following
For using regexes to work with BBCode, it's actually pretty sane... though you seem to have discarded
[b]Test Comment[/b]
at the end.
As alluded to in the comments, this method is going to break the instant your tags become nestable. I've previously written about that problem, and pretty much the only sane solution is to build a "real" parser to deal with that insanity. I have yet to encounter an existing third-party BBCode parser that does this correctly.
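To make the "real parser" idea a little more concrete, here is a minimal sketch of a stack-based approach for nested quote tags. It is only an illustration under assumptions: the function name `parse_quotes` and the node layout (`attributes`, `children`) are made up for this example, only `[quote]` tags are handled, and attribute values are assumed to be single-quoted as in the question.

```php
<?php
// Hypothetical helper (not from the question): turn a string containing
// possibly nested [quote ...] tags into a tree of nodes and text fragments.
function parse_quotes(string $input): array
{
    // Split on opening and closing quote tags, keeping the tags as tokens.
    $tokens = preg_split(
        '/(\[quote(?:\s[^\]]*)?\]|\[\/quote\])/i',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    // The bottom of the stack is a root node that collects top-level content.
    $stack = [['attributes' => [], 'children' => []]];

    foreach ($tokens as $token) {
        if (preg_match('/^\[quote(\s[^\]]*)?\]$/i', $token, $m)) {
            // Opening tag: read key='value' pairs and push a new node.
            preg_match_all('/(\w+)=\'([^\']*)\'/', $m[1] ?? '', $attrs);
            $stack[] = [
                'attributes' => array_combine($attrs[1], $attrs[2]),
                'children'   => [],
            ];
        } elseif (strcasecmp($token, '[/quote]') === 0) {
            if (count($stack) > 1) {
                // Closing tag: pop the node and attach it to its parent.
                $node = array_pop($stack);
                $stack[count($stack) - 1]['children'][] = $node;
            } else {
                // Stray closing tag with nothing open: keep it as plain text.
                $stack[0]['children'][] = $token;
            }
        } else {
            // Plain text belongs to whichever quote is currently open.
            $stack[count($stack) - 1]['children'][] = $token;
        }
    }

    // Implicitly close any quotes that were never closed.
    while (count($stack) > 1) {
        $node = array_pop($stack);
        $stack[count($stack) - 1]['children'][] = $node;
    }

    return $stack[0]['children'];
}

$tree = parse_quotes("[quote name='A']outer [quote name='B']inner[/quote] tail[/quote]after");
```

Each element of the returned array is either a text fragment or a node whose `children` can contain further nodes, so rendering becomes a recursive walk over the tree instead of a second regex pass.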
However, as you don't believe nesting is an issue, your code should work well enough. Don't forget to filter the attributes in the tags for unfriendly characters. If
site_url
doesn't do so, then you've created an XSS vulnerability.
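As a rough sketch of what that filtering could look like, the snippet below escapes or validates each attribute before it reaches the HTML. The function name `render_citation` is hypothetical, the link path is a hard-coded stand-in for whatever `site_url` would produce, and the numeric check assumes ids and timestamps are plain digit strings.

```php
<?php
// Hypothetical helper (not from the question): build the citation line
// with every attribute either validated or HTML-escaped.
function render_citation(array $attributes): string
{
    $parts = [];

    if (isset($attributes['name'])) {
        // Free-text value: escape it so markup in the name is shown literally.
        $parts[] = 'By ' . htmlspecialchars((string) $attributes['name'], ENT_QUOTES, 'UTF-8');
    }

    if (isset($attributes['id']) && ctype_digit((string) $attributes['id'])) {
        // Numeric value: accept digits only, so nothing can break out of href.
        // The path is an assumption standing in for site_url().
        $parts[] = '<a href="/forums/findpost/' . $attributes['id'] . '">Permalink</a>';
    }

    if (isset($attributes['timestamp']) && ctype_digit((string) $attributes['timestamp'])) {
        // gmdate() is used here so the result does not depend on the server timezone.
        $parts[] = 'On ' . gmdate('d F Y - H:i', (int) $attributes['timestamp']);
    }

    return implode(' | ', $parts);
}

$html = render_citation([
    'name'      => '<script>alert(1)</script>',
    'id'        => '1" onclick="x',
    'timestamp' => '1294120376',
]);
```

Here the malicious name is rendered as inert text and the malformed id is dropped rather than linked. Whether to drop or reject invalid values is a design choice that depends on the forum.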