Regex to match attributes inside an HTML tag that may contain PHP code
Generally I'd match HTML attributes with this regex:
\w+=".*?"
but when the HTML contains PHP code it gets kind of dicey. Please consider the following tag:
<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>>
<?php echo $img; ?>
</option>
The regex above will also match the attribute selected="selected", which is emitted from inside PHP logic rather than being part of the static markup. Is there a way to match attributes that are not inside PHP tags while still matching the ones whose value may contain PHP logic? If not, could I just remove the PHP code that isn't part of an attribute value?
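To make the problem concrete, here is a minimal sketch that runs the simple pattern against the sample tag. Python's `re` module is used only for illustration (the question does not name a regex engine), and the names `TAG`, `NAIVE` and `naive_matches` are made up for this example.

```python
import re

# The sample tag from the question, split across two string literals
# so that both kinds of quotes can appear without heavy escaping.
TAG = (
    '<option value="<?php echo $img; ?>"'
    "<?php echo ($hpb[$i]['image_filename']==$img?' selected=\"selected\"':''); ?>>"
)

# The simple attribute pattern from the question.
NAIVE = re.compile(r'\w+=".*?"')

naive_matches = NAIVE.findall(TAG)
print(naive_matches)
# ['value="<?php echo $img; ?>"', 'selected="selected"']
```

The second match is the unwanted one: it sits inside the PHP block that follows the value attribute, so it is not an attribute of the static markup.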
EDIT: Here's what I have so far:
\w+="(((.(?!<\?php))*?)|((.((?=<\?php).*?(?=\?>))*)*?))*"
Which basically means: match a string that starts with a space, then greedily match alphanumeric characters followed by an equals sign and a double quote, and then match either of the following two alternatives, capturing as many characters as possible:
- a sequence of characters that does not contain the string <?php
- a sequence of characters containing the pattern <\?php.*?\?>
In other words, greedily match the value part of the attribute together with all of its PHP code, up to the closing double quote.
Comments (1)
This will match either a PHP code segment or a complete attribute="value" sequence in which the value may contain PHP code. After each match you can find out what you caught by checking the contents of the capturing groups. If you matched a pure PHP segment, all groups but group[0] will be empty; otherwise, group[1] will contain the attribute name and group[2] will contain the value.
The regex assumes < will appear inside an attribute value only as the beginning of a <?php tag. Of course that's not a syntactically valid assumption, but it's probably safe anyway. I can make the regex more precise if you need me to, but it will also be much less readable.
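The pattern this answer refers to is not reproduced above, so the following is only a sketch of the approach it describes, not the answerer's original regex. It assumes Python's `re` module; the pattern `ATTR_OR_PHP` and the helper `extract_attributes` are hypothetical names written for this illustration. It also relies on the same assumption as the answer: inside a value, < only ever starts a <?php block.

```python
import re

# Hypothetical pattern (an assumption, not the answerer's original).
# Alternative 1 consumes a whole PHP segment, so attributes inside it are skipped.
# Alternative 2 captures the attribute name (group 1) and its value (group 2);
# within the value, "<" is accepted only as the start of a <?php ... ?> block.
ATTR_OR_PHP = re.compile(
    r'<\?php.*?\?>'
    r'|(\w+)="((?:[^<"]|<\?php.*?\?>)*)"',
    re.DOTALL,
)

def extract_attributes(markup):
    """Return (name, value) pairs, ignoring matches that are pure PHP segments."""
    pairs = []
    for m in ATTR_OR_PHP.finditer(markup):
        # Python reports groups that did not participate as None rather than empty.
        if m.group(1) is not None:
            pairs.append((m.group(1), m.group(2)))
    return pairs

MARKUP = (
    '<option value="<?php echo $img; ?>"'
    "<?php echo ($hpb[$i]['image_filename']==$img?' selected=\"selected\"':''); ?>>\n"
    "<?php echo $img; ?>\n"
    "</option>"
)

print(extract_attributes(MARKUP))
# [('value', '<?php echo $img; ?>')]
```

Because the PHP alternative comes first and swallows each segment whole, selected="selected" is never offered to the attribute alternative. One limitation of this sketch: a literal ?> inside a PHP string would end the segment early.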