Problem with a regex that converts smiley images back to BBCode

Posted 2024-10-11 10:40:55

I'm working on something of my own for phpBB3, and I'm trying to convert the smiley images back to their original text codes, e.g.

:)  :(  :O  :P

Since a smiley's HTML follows this pattern, I match it with:

/<img src=".*" alt="(.*)" title=".*">/gi

I replace each match with:

$1

However, when there are multiple smileys, only the last one is shown. For example, if the input looks like this:

[Image of the rendered smileys: http://uimgz.com/i/R2e3H8g5D8.png]

It turns into this:

:twisted:

That is the last smiley on the right. Why doesn't it replace each smiley and return all of their codes, like this:

:) :o :twisted:

The regex seems fine, and I don't know what the problem is. All of the regexes go through a replacement loop using a for() loop, so that's not the problem.

The HTML for multiple smileys:

<img src="./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> <img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> <img src="./images/smilies/icon_twisted.gif" alt=":twisted:" title="Twisted Evil" />
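A minimal Python reproduction of the behaviour described above, assuming Python's re module (its greedy * behaves the same way here as in PCRE or JavaScript). Two adjustments are assumptions of this sketch, not part of the question: the pattern ends in "\s*/?> so it accepts the " />" that closes each tag in the sample HTML, and Python writes the backreference as \1 rather than $1.

```python
import re

# The sample HTML from the question: three phpBB3 smiley images separated by spaces.
HTML = (
    '<img src="./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> '
    '<img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> '
    '<img src="./images/smilies/icon_twisted.gif" alt=":twisted:" title="Twisted Evil" />'
)

# The question's greedy pattern; the ending is relaxed to "\s*/?>" (an assumption)
# so that it accepts the " />" used in the sample HTML.
GREEDY = re.compile(r'<img src=".*" alt="(.*)" title=".*"\s*/?>', re.IGNORECASE)

result = GREEDY.sub(r"\1", HTML)
print(result)  # → :twisted:
```

Only one match is found, and it spans the whole input, so the entire string is replaced by a single capture.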

Answers (2)

久隐师 2024-10-18 10:40:55

Change the regex to this and try again:

/<img src="[^"]*" alt="([^"]+)" title="[^"]*">/gi

Regex quantifiers such as * are greedy by default: they match as much text as possible while still letting the rest of the pattern succeed. In your case, the pattern matched all three images as one, so the single capture group ended up holding only the last alt value. What I did here was restrict the content inside the src attribute to characters other than ", so the match cannot run all the way to the third src.
It treated all of this as the src attribute:

./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> <img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> <img src="./images/smilies/icon_twisted.gif
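A quick Python check of this fix, as a sketch. As before, the "\s*/?> ending (to accept the " />" in the sample HTML) and \1 in place of $1 are assumptions of the sketch, not part of the answer's pattern.

```python
import re

HTML = (
    '<img src="./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> '
    '<img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> '
    '<img src="./images/smilies/icon_twisted.gif" alt=":twisted:" title="Twisted Evil" />'
)

# [^"]* can never cross a double quote, so each match stays inside one <img> tag.
# The "\s*/?>" ending is an assumption added to accept the " />" in the sample HTML.
NEGATED = re.compile(r'<img src="[^"]*" alt="([^"]+)" title="[^"]*"\s*/?>', re.IGNORECASE)

print(NEGATED.findall(HTML))        # → [':)', ':o', ':twisted:']
print(NEGATED.sub(r"\1", HTML))     # → :) :o :twisted:
```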

情域 2024-10-18 10:40:55

使用 *?+? 进行非贪婪匹配:

/<img src=".*?" alt="(.+?)" title=".*?">/gi

在失败的示例中发生的情况是第一个 .* 匹配所有这些:

./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> <img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> <img src="./images/smilies/icon_twisted.gif

仍然会产生有效的匹配,但这不是您想要的。 */+ 之后的 ? 使正则表达式使用成功匹配所需的最小字符串。阅读“当心贪婪!”部分在本文中。

我还想添加一般警告,即正则表达式不是解析 HTML 的最佳工具。例如,如果 src 属性具有转义的 ",甚至我的正则表达式也会中断。

Use *? and +? for non-greedy matching:

/<img src=".*?" alt="(.+?)" title=".*?">/gi

What's happening in your failing example is that the first .* is matching all of this:

./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> <img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> <img src="./images/smilies/icon_twisted.gif

which still produces a valid match, just not the one you want. A ? after * or + makes that quantifier lazy: it consumes the fewest characters needed for the overall match to succeed. Read the section "Watch Out for The Greediness!" in this article.
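The following Python sketch illustrates both points, with the same assumptions as before (the relaxed "\s*/?> ending and \1 instead of $1). It first wraps the greedy .* in a group to show what it hands to src, then applies the lazy pattern from this answer.

```python
import re

HTML = (
    '<img src="./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> '
    '<img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> '
    '<img src="./images/smilies/icon_twisted.gif" alt=":twisted:" title="Twisted Evil" />'
)

# Greedy: group 1 shows what the first .* consumes (it runs up to the LAST ' alt="').
GREEDY_SRC = re.compile(r'<img src="(.*)" alt="(.*)" title=".*"\s*/?>', re.IGNORECASE)
greedy_src = GREEDY_SRC.match(HTML).group(1)

# Lazy: .*? and .+? stop at the first point where the rest of the pattern can match.
LAZY = re.compile(r'<img src=".*?" alt="(.+?)" title=".*?"\s*/?>', re.IGNORECASE)
lazy_result = LAZY.sub(r"\1", HTML)

print(greedy_src)   # the long string quoted above, ending in icon_twisted.gif
print(lazy_result)  # → :) :o :twisted:
```

On this input the lazy pattern and the negated-class pattern from the other answer give the same result, but they are not equivalent: if an img tag had no alt attribute, the lazy .*? could still extend past the end of that tag to reach the next alt=", while [^"]* cannot leave the quoted src value.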

I'd also like to add the general warning that regular expressions aren't the best tool for parsing HTML. For example, even my regex will break if the src attribute contains an escaped ".
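In line with that warning, here is a sketch that uses Python's standard-library html.parser instead of a regex. This is a swapped-in technique, not what either answer proposes. It assumes every img tag should become its alt text and that any other tags can simply be dropped while their text is kept; SmileyToText and smileys_to_text are hypothetical names.

```python
from html.parser import HTMLParser

HTML = (
    '<img src="./images/smilies/icon_e_smile.gif" alt=":)" title="Smile" /> '
    '<img src="./images/smilies/icon_e_surprised.gif" alt=":o" title="Surprised" /> '
    '<img src="./images/smilies/icon_twisted.gif" alt=":twisted:" title="Twisted Evil" />'
)


class SmileyToText(HTMLParser):
    """Keeps text nodes and replaces each <img> with its alt text (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img ... />: the default handle_startendtag
        # calls handle_starttag followed by handle_endtag.
        # The parser lowercases names and decodes references like &quot; in attribute values.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)

    def handle_data(self, data):
        # Text between tags (here, the spaces between smileys) is kept as-is.
        self.parts.append(data)


def smileys_to_text(html):
    parser = SmileyToText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(smileys_to_text(HTML))  # → :) :o :twisted:
```

Unlike the regexes, this does not depend on attribute order or quote style: smileys_to_text("<img alt=':P' src='x.gif'>") returns ":P". Dropping all other tags is fine for a smiley-only fragment but would need more care in a full post.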
