Regular expression to find img tags without an alt attribute

Posted 2024-09-29 03:08:19

I am going through a large website (1600+ pages) to make it pass Priority 1 W3C WAI. As a result, things like image tags need to have alt attributes.

What would be the regular expression for finding img tags without alt attributes? If possible, include a wee explanation so I can adapt it to find other issues.

I am in an office with Visual Web Developer 2008. The Edit >> Find dialogue can use regular expressions.

Comments (9)

蓝颜夕 2024-10-06 03:08:19

Building on Mr.Black and Roberts126 answers:

/(<img(?!.*?alt=(['"]).*?\2)[^>]*)(>)/

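This will match an img tag anywhere in the code that either has no alt attribute or has an alt attribute whose value is not quoted, i.e. not written as alt="..." or alt='...' (an invalid alt attribute). Note that the look-ahead scans to the end of the line, so a quoted alt in a later tag on the same line also suppresses the match.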

Breaking it down:

(          : open capturing group 1
<img       : match the opening of an img tag
(?!        : open negative look-ahead
.*?        : lazily match zero or more of any character
alt=(['"]) : match 'alt=' followed by ' or " (captured as group 2 for later)
.*?        : lazily match the value of the 'alt' attribute
\2)        : back-reference to the ' or " matched earlier, then close the look-ahead
[^>]*      : match the rest of the tag up to, but not including, the closing '>'
)          : close capturing group 1
(>)        : match the closing '>' of the img tag as group 3

If your code editor allows search and replace by Regex you can use this in combination with the replace string:

$1 alt=""$3

This finds any alt-less img tag and appends an empty alt attribute to it. That is useful for spacers and other layout images in HTML emails and the like. For a self-closing tag the new attribute lands after the slash, so check those by hand.
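
As a rough check of the pattern and replacement string, here is a small JavaScript sketch. The sample markup and variable names are made up for illustration, and it assumes one tag per line, since the look-ahead is not confined to a single tag.

```javascript
// Pattern from this answer, with the g flag so every tag is processed.
const noAlt = /(<img(?!.*?alt=(['"]).*?\2)[^>]*)(>)/g;

// Hypothetical sample markup, one tag per line.
const samples = [
  '<img src="spacer.gif">',
  '<img src="logo.png" alt="Company logo">',
  '<img src="photo.jpg" alt=photo>',
];

// $1 is everything before the closing '>', $3 is the '>' itself.
const fixed = samples.map((html) => html.replace(noAlt, '$1 alt=""$3'));

console.log(fixed);
```

The first tag gains an empty alt and the second is left alone. The third, with its unquoted alt value, is also treated as alt-less and ends up with a second alt attribute, so tags like that are better fixed manually.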

不美如何 2024-10-06 03:08:19

Here is what I just tried in my own environment on a massive enterprise code base, with good success (no false positives, and it found genuine cases):

<img(?![^>]*\balt=)[^>]*?>

What's going on in this search:

  1. find the opening of the tag
  2. from there, use a negative look-ahead to assert that the following does not occur: zero or more characters that are not the closing bracket, followed by …
  3. a word that begins with "alt" ("\b" makes sure we don't match mid-word, e.g. inside a class value) and is followed by "="; then …
  4. match zero or more characters that are not the closing bracket
  5. find the closing bracket

So this will match:

<img src="foo.jpg" class="baltic" />

But it won't match either of these:

<img src="foo.jpg" class="baltic" alt="" />
<img src="foo.jpg" alt="I have a value.">
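
The three example tags can be run through the pattern in a few lines of JavaScript. This is only a sketch; the variable names are illustrative and the tags are joined into one string to mimic a small file.

```javascript
// Pattern from this answer: the look-ahead stays inside the tag
// because [^>]* cannot run past the closing bracket.
const missingAlt = /<img(?![^>]*\balt=)[^>]*?>/g;

const html = [
  '<img src="foo.jpg" class="baltic" />',
  '<img src="foo.jpg" class="baltic" alt="" />',
  '<img src="foo.jpg" alt="I have a value.">',
].join('\n');

// Collect every tag that lacks an alt attribute.
const hits = html.match(missingAlt) || [];

console.log(hits);
```

Only the first tag is reported: "baltic" does not count as alt because of the \b and the required "=".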

望她远 2024-10-06 03:08:19

This works in Eclipse:

<img(?!.*alt).*?>

I'm updating for Section 508 too!

黯淡〆 2024-10-06 03:08:19

This worked for me.

^<img(?!.*alt).*$

This matches any line that begins with <img and does not contain the text "alt" anywhere after it. It even works for src="<?php echo $imagename; ?>" type of attributes, because it matches to the end of the line rather than to the first ">".
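
A quick JavaScript sketch of how this line-based pattern behaves. The sample lines are invented for illustration, including one whose file name happens to contain "alt".

```javascript
// Pattern from this answer, applied to one line at a time.
const lineNoAlt = /^<img(?!.*alt).*$/;

// Hypothetical sample lines.
const lines = [
  '<img src="<?php echo $imagename; ?>">',
  '<img src="a.png" alt="A">',
  '<img src="salt.jpg">',
];

// Keep the lines the pattern reports as missing alt.
const flagged = lines.filter((line) => lineNoAlt.test(line));

console.log(flagged);
```

The PHP line is reported even though it contains a ">" inside the attribute. The last line is not reported although it has no alt attribute, because "alt" appears inside "salt.jpg"; tightening the look-ahead to something like \balt= would avoid that.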

烟雨凡馨 2024-10-06 03:08:19

This is perfectly possible with the following regex:

<img([^a]|a[^l]|al[^t]|alt[^=])*?/?>

Looking for something that isn't there is rather tricky, but we can work around it by only allowing text that cannot be part of "alt=": any character that isn't an 'a', or an 'a' that isn't followed by an 'l', or "al" that isn't followed by a 't', or "alt" that isn't followed by '='.
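
To see the look-ahead-free pattern in action, here is a minimal JavaScript sketch with made-up sample tags (the forward slash is escaped only because of JavaScript's regex literal syntax):

```javascript
// Pattern from this answer; no look-ahead is needed.
const noLookahead = /<img([^a]|a[^l]|al[^t]|alt[^=])*?\/?>/;

// Hypothetical sample tags.
const tags = [
  '<img src="foo.jpg">',
  '<img src="foo.jpg" class="baltic" />',
  '<img src="foo.jpg" alt="x">',
];

// Keep the tags the pattern matches, i.e. those without "alt=".
const flagged = tags.filter((tag) => noLookahead.test(tag));

console.log(flagged);
```

The first two tags are reported; the third is not, because the group cannot get past the literal text "alt=".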

末蓝 2024-10-06 03:08:19

This is really tricky, because regular expressions are mostly about matching something that is there. With look-around trickery, you can do things like 'find A that is not preceded/followed by B', etc. But I think the most pragmatic solution for you wouldn't be that.

My proposal relies a little on your existing code not doing anything too unusual, and you might have to fine-tune it, but I think it's a good shot if you really want to solve your problem with a regex search.

So what I suggest is to find all img tags that contain nothing but attributes from a list of the valid img attributes other than alt. Any tag that matches therefore has no alt attribute. Whether that is an approach you can work with is for you to decide.

Proposal:

/<img\s*((src|align|border|height|hspace|ismap|longdesc|usemap|vspace|width|class|dir|lang|style|title|id)="[^"]*"\s*)*\s*\/?>/

The current limitations are:

  1. It expects your attribute values to be delimited by double quotes,
  2. It doesn't take into account possible inline on*Event attributes,
  3. It doesn't find img elements with 'illegal' attributes.
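
A small JavaScript sketch of the whitelist idea and its limitations. The sample tags are invented, and the pattern here uses "[^"]*" for attribute values so that values longer than one character are accepted.

```javascript
// Whitelist of img attributes other than alt; a tag matches only if
// every attribute it carries is on the list and double-quoted.
const whitelist = /<img\s*((src|align|border|height|hspace|ismap|longdesc|usemap|vspace|width|class|dir|lang|style|title|id)="[^"]*"\s*)*\s*\/?>/;

// Hypothetical sample tags.
const tags = [
  '<img src="a.png" width="10">',
  '<img src="a.png" alt="A">',
  '<img src="a.png" onclick="go()">',
];

// Keep the tags the whitelist pattern matches.
const flagged = tags.filter((tag) => whitelist.test(tag));

console.log(flagged);
```

Only the first tag is reported. The third one also lacks alt but is missed because onclick is not on the list, which is the second limitation mentioned in the list.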

绳情 2024-10-06 03:08:19

Simple and effective:

<img((?!\salt=).)*?>

This regex works for finding <img> tags that are missing the alt attribute.
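
The closing ">" matters in this kind of pattern: a lazy *? with nothing after it consumes no characters, so the match would stop right after "<img". A minimal JavaScript sketch, assuming a pattern that ends in ">" and using made-up sample tags:

```javascript
// Each '.' is only allowed where ' alt=' does not start, so the match
// cannot step over an alt attribute on its way to the closing '>'.
const tempered = /<img((?!\salt=).)*?>/;

// The same pattern without the trailing '>' for comparison.
const openEnded = /<img((?!\salt=).)*?/;

const withAlt = '<img src="a.png" alt="A">';
const withoutAlt = '<img src="a.png">';

console.log(tempered.test(withoutAlt)); // tag without alt is reported
console.log(tempered.test(withAlt)); // tag with alt is skipped
console.log(withAlt.match(openEnded)[0]); // only the '<img' prefix
```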

酒解孤独 2024-10-06 03:08:19

I wrote a simple snippet for this without a regex (it uses jQuery and runs in the browser on the loaded page):

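// Requires jQuery. Keep only the images whose alt is missing or empty:
// this.alt is '' in both cases, so both are counted.
const missing = $('img').filter(function () {
  return !this.alt
})
// Log the count instead of using document.write, which can overwrite the page.
console.log(missing.length + ' img without alt text')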

假情假意假温柔 2024-10-06 03:08:19

<img(?!(\n|.(?!\/>))*?alt)

<img - Find start of image tag
(?! - begin negative lookahead
( - begin group
\n|.(?!\/>) - Match either a new line or any character not immediately followed by "/>" (the end of a self-closing tag)
)*? - close group. Match zero or more (non-greedy)
alt - Match "alt" literally
) - end of negative lookahead

This one works for me in VS Code. It highlights the opening "<img" of every img tag that has no alt attribute.
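
Because the pattern handles line breaks explicitly, it also copes with tags spread over several lines. A short JavaScript sketch, using made-up self-closing tags:

```javascript
// Pattern from this answer; it matches only the '<img' text itself.
const startNoAlt = /<img(?!(\n|.(?!\/>))*?alt)/g;

// Hypothetical markup: two self-closing tags split across lines.
const html = [
  '<img',
  '  src="a.png"',
  '  alt="A" />',
  '<img',
  '  src="b.png" />',
].join('\n');

// Offsets of every '<img' that has no alt before its '/>'.
const offsets = [...html.matchAll(startNoAlt)].map((m) => m.index);

console.log(offsets);
```

Only the second tag is reported. Note that the look-ahead stops at "/>", so for tags closed with a plain ">" it may keep scanning into later markup.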
