PHP regex... get the first instance of a closing bracket?
Hi, I am trying to parse some homemade bbcode I came up with and am having a difficult time with something. I am new to regex but thought this would be a great way to teach myself.
[%url=http://google.com]google link[/url%]
<a href='http://google.com'>google link</a>
[%video=http://youtube.com?v=blah%]
I will run the link through an automatic embed function developed in PHP; I just need to parse the link.
[%PAGEBREAK%]
<hr>
[%img=wateva.jpg%]
<img src='wateva.jpg'>
So far I have done the URL one, which worked great (see below):
$url_pattern = "/\[\s*%\s*(URL|url)\s*=\s*(.*)\](.*)\[\s*\/\s*(URL|url)\s*%\s*\]/i";
$description = preg_replace($url_pattern, "<a href='$2'>$3</a>", $description);
But when I tried to do the image one (see below):
$img_pattern ="/\[\s*%\s*(IMG|img)=(.*)\s*(%\s*\])/i";
$description = preg_replace($img_pattern, "<img src=\'$2\' style='width: 700px; height: auto; display:block;\'>", $description);
It picks up the last "%]" of the whole text instead of the closest "%]". How do I tell it to find the closest %]?
Here is my testing text:
*100 word minimum. Give a description of your project combined with images, video, and or links.. just don't write a novel! Use images that correspond with your text by using the images section below. The icons in the description bar will allow you to add other media like links and videos.100 word minimum. Give a description of your project combined with images, video, and or links.. just don't write a novel! Use images that correspond with your text by using the images section below. The icons in the description bar will allow you to add other media like links and videos.100 word minimum. Give a description of your project combined with images, video, and or links..
[%PAGEBREAK%]
[%IMG=uploads/06-26-11/Cog.gif%]
just don't write a novel! Use images that correspond with your text by using the images section below. The icons in the description bar will allow you to add other media like links and videos.100 word minimum. Give a description of your project combined with images, video, and or links.. just don't write a novel! Use images that correspond with your text by using the images section below. The icons in the description bar will allow you to add other media like links and videos.
This is a [%URL=http://google.com]link[/URL%]
Here is a video by gang gang dance
[%VIDEO=http://www.youtube.com/watch?v=lZMFwKVjV5s%]*
The problem is most likely the .* in /\[\s*%\s*(IMG|img)=(.*)\s*(%\s*\])/i. The .* is greedy: it will match to the end of the document, and then backtrack to the last %] to match it. Normally, the problem would be hidden unless you've set the /s flag, which causes . to match newlines (it is also called the Dot-All flag).
A simple solution is to use a lazy quantifier, .*?, which matches nothing by default, but then backtracks to match more and more characters until it finds the first %].
A better option is to define what alphabet is acceptable in img tags. For example, anything other than a ] or a newline.
See also: Laziness Instead of Greediness
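As a rough sketch of the two fixes described above, the snippet below runs the question's greedy pattern, a lazy variant, and a restricted-alphabet variant over the same input. The sample string, the variable names, and the simplified replacement markup are assumptions made for illustration, not part of the original post.

```php
<?php
// Hypothetical sample: two tags on ONE line, so the greedy (.*)
// over-matches even without the /s flag.
$text = '[%IMG=a.gif%] some text [%IMG=b.gif%]';

// Pattern from the question: (.*) is greedy and runs to the LAST %].
$greedy_pattern = '/\[\s*%\s*(IMG|img)=(.*)\s*(%\s*\])/i';

// Fix 1: lazy quantifier, (.*?) stops at the FIRST %].
$lazy_pattern = '/\[\s*%\s*(IMG|img)=(.*?)\s*(%\s*\])/i';

// Fix 2: restricted alphabet, anything except ] or a newline,
// so the match can never run past the tag's own closing bracket.
$class_pattern = '/\[\s*%\s*(IMG|img)=([^\]\n]*)\s*(%\s*\])/i';

// Simplified replacement; $2 is the captured file name.
$replacement = '<img src="$2">';

$greedy_result = preg_replace($greedy_pattern, $replacement, $text);
$lazy_result   = preg_replace($lazy_pattern, $replacement, $text);
$class_result  = preg_replace($class_pattern, $replacement, $text);

echo $greedy_result, "\n"; // <img src="a.gif%] some text [%IMG=b.gif">
echo $lazy_result, "\n";   // <img src="a.gif"> some text <img src="b.gif">
echo $class_result, "\n";  // <img src="a.gif"> some text <img src="b.gif">
```

A side note on the question's replacement string: inside a double-quoted PHP string, \' is not an escape sequence, so those backslashes end up in the generated HTML. Plain ' is enough there.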
You probably want to fix the other patterns as well; they share the same problem.
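For instance, the URL pattern from the question could be tightened in the same way. This is only a sketch: the sample string and variable names are invented for illustration, and the capture groups are kept as in the question so that $2 and $3 still refer to the address and the link text.

```php
<?php
// Hypothetical sample with two links on one line.
$text = '[%URL=http://a.com]A[/URL%] and [%URL=http://b.com]B[/URL%]';

// The address may contain anything except ] or a newline;
// the link text is matched lazily up to the first closing tag.
$url_pattern = '/\[\s*%\s*(URL|url)\s*=\s*([^\]\n]*)\](.*?)\[\s*\/\s*(URL|url)\s*%\s*\]/i';

$url_result = preg_replace($url_pattern, '<a href="$2">$3</a>', $text);

echo $url_result, "\n";
// <a href="http://a.com">A</a> and <a href="http://b.com">B</a>
```

With the original greedy groups, the same input would be treated as a single link whose address swallows everything up to the second tag.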
Finally, I'd advise looking at an implementation of an existing bbcode parser. These codes can have nested constructs (for example, an image in a link in a blockquote), making them tricky to parse correctly.