preg_match to match src=, background= and url(..)

Posted on 2025-01-04 17:53:28

I would like a regular expression that finds the following images in a given piece of HTML:

  • Those captured in: src=""
  • Those captured in: src=''
  • Those captured in: background=""
  • Those captured in: background=''
  • Those captured in: url("")
  • Those captured in: url('')
  • Those captured in: url()

So far I have come up with:

preg_match_all("/src=((\"|'|)?(.*\.(png|gif|jpg))(\"|'|))/Ui", $strHTML, $arrMatches);

preg_match_all("/background=((\"|'|)?(.*\.(png|gif|jpg))(\"|'|))/Ui", $strHTML, $arrMatches);

preg_match_all("/url\((\"|'|)?((.*\.(png|gif|jpg))(\"|'|))\)/Ui", $strHTML, $arrMatches);

But those are incomplete in that they don't include the prefix (src/background/url). Also, security-wise, I think they can be improved further to prevent somebody from entering src="http://somesite.com/someurl.exe?ext=jpg"
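One way to address the someurl.exe?ext=jpg concern is to test the extension of the URL path only, after a regex has extracted the candidate URL. The sketch below is a minimal illustration, not part of the original post: the helper name hasImageExtension and its default extension list are assumptions, and it relies only on PHP's built-in parse_url and pathinfo.

```php
<?php
// Hypothetical helper (not from the original post): decides whether a URL
// looks like an image by inspecting only its path, never the query string.
function hasImageExtension(string $url, array $allowed = ['png', 'gif', 'jpg', 'jpeg']): bool
{
    // parse_url() returns null when there is no path and false on a
    // seriously malformed URL; both cases are rejected here.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }

    // Compare the lower-cased extension against the allowed list.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $allowed, true);
}

var_dump(hasImageExtension('http://somesite.com/someurl.exe?ext=jpg')); // bool(false)
var_dump(hasImageExtension('images/test1.GIF'));                        // bool(true)
```

A matching extension only says what the URL claims to be; it does not prove that the resource behind it is actually an image.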

Any help in the right direction is appreciated.

Edit:

I think I got it, although the code can surely be improved, possibly even combined and/or optimized :)

/* match CSS url() links */

preg_match_all("/(url\((\"|'|)(.*\.(png|gif|jpg|jpeg))(\"|'|)\))/Ui", $strHTML, $arrMatches);

Array
(
    [0] => Array
        (
            [0] => url('test1.gif')
            [1] => url(test2.gif)
            [2] => url("test3.gif")
        )

    [1] => Array
        (
            [0] => url('test1.gif')
            [1] => url(test2.gif)
            [2] => url("test3.gif")
        )

    [2] => Array
        (
            [0] => '
            [1] => 
            [2] => "
        )

    [3] => Array
        (
            [0] => test1.gif
            [1] => test2.gif
            [2] => test3.gif
        )

    [4] => Array
        (
            [0] => gif
            [1] => gif
            [2] => gif
        )

    [5] => Array
        (
            [0] => '
            [1] => 
            [2] => "
        )

)

/* match img links */
preg_match_all("/(src=(\"|'|)(.*\.(png|gif|jpg|jpeg))(\"|'|))/Ui", $strHTML, $arrMatches);

/* match background links */
preg_match_all("/(background=(\"|'|)(.*\.(png|gif|jpg|jpeg))(\"|'|))/Ui", $strHTML, $arrMatches);
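The three patterns above could plausibly be combined into one. The following is only a sketch under stated assumptions: the function name findImageRefs, the named groups and the exact character classes are illustrative choices, not code from the original post. It captures the prefix separately, requires the closing quote to equal the opening one, and refuses to start a match on src= inside a query string.

```php
<?php
// Illustrative sketch only; the function name and the pattern are assumptions.
function findImageRefs(string $html): array
{
    $pattern = '~(?<![?&\w])(?<prefix>src=|background=|url\()\s*' // prefix, not part of a query string
             . '(?<quote>["\']?)'                                 // optional opening quote (may be empty)
             . '(?<image>[^"\'()\s>?#]+\.(?:png|gif|jpe?g))'      // path that ends in an image extension
             . '(?![^"\'()\s>])'                                  // nothing else may follow the extension
             . '\k<quote>~i';                                     // closing quote must equal the opening one

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    // Keep only the two named groups of interest for each match.
    $refs = [];
    foreach ($matches as $m) {
        $refs[] = ['prefix' => $m['prefix'], 'image' => $m['image']];
    }
    return $refs;
}

$html = '<img src="a/test1.gif"><td background=\'bg.PNG\'>'
      . '<div style="background-image: url(test2.jpeg)">'
      . '<img src="http://somesite.com/someurl.exe?ext=jpg">';

print_r(findImageRefs($html)); // three entries; the .exe URL is not among them
```

In this sketch a URL with a query string after the extension (for example a.jpg?v=2) is rejected as well, which is stricter than may be wanted. Regular expressions also see plain text only, so occurrences inside comments or scripts are matched too.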


Comments (1)

Anonymous user, 2025-01-11 17:53:28

If you're sure about those attribute names (src, url and background)...

$arr = array(
    'url("http://somesite.com/someurl.exe?src=jpg")',
    'url(http://somesite.com/someurl.exe?src=jpg)',
    'src="http://somesite.com/someurl.exe?src=jpg"',
    'src="http://somesite.com/someurl.exe?ext=jpg"',
    'background="http://somesite.com/someurl.exe?src=jpg"'
);
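foreach ($arr as $str) {
    // The lookbehind anchors the match right after src=, background= or url(.
    // An optional quote is captured in group 1; the lazy "image" group then runs
    // up to the first occurrence of that same quote or of a closing parenthesis.
    preg_match_all('/(?<=src=|background=|url\()(\'|")?(?<image>.*?)(?=\1|\))/i',$str,$matches);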
    echo $str;
    foreach($matches['image'] as $img) {
        echo "\nimage: <b>$img</b>\n";
    }
    echo "\n";
}