Extract the href attribute value if it contains a specific keyword

Posted on 2025-01-15 20:38:14

How can I get the full link from a string like this:

<a href="https://www.google.com/setprefdomain?prefdom=DE&prev=https://www.google.de/&sig=K_DtcF1dnV7Xn6g9Ir_3SUs6a6TiA%3D">

I want to isolate the string starting after 'href="' and ending with 'A%3D', but only if this string contains the string domain.

I don't really know how to check whether the string 'domain' is included.

My regex so far is: /(?<=href=")(.*)(?=")/gi

Comments (2)

塔塔猫 2025-01-22 20:38:14

I love regex, but for the sake of stability I prefer not to parse valid HTML with it.

Use a legitimate parsing technique to isolate the href attribute. This ensures that you never accidentally match data-href or any other attribute whose name contains the consecutive letters href. It also frees you from having to account for either single or double quotes around the value.

After the href attribute is isolated, use includes() or indexOf() to check if domain is anywhere in the string value. If you need to tighten up the accuracy of matching domain, you might now entertain using regex with word boundaries or other checks on surrounding substrings (such as checking if domain occurs before the first ?).

const str = '<a href="https://www.google.com/setprefdomain?prefdom=DE&prev=https://www.google.de/&sig=K_DtcF1dnV7Xn6g9Ir_3SUs6a6TiA%3D">',
      url = new DOMParser()
            .parseFromString(str, 'text/html')
            .documentElement.querySelector('a')
            .href;

console.log(url.includes('domain') ? url : null);
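
As a minimal sketch of the tighter check mentioned above (whether domain occurs before the first ?), the already-isolated href value can be cut at the first question mark. The helper name keywordBeforeQuery is hypothetical, and the sketch assumes the href has already been extracted as a plain string:

```javascript
// Hypothetical helper: true only if `keyword` appears in the part of the
// URL that precedes the first "?" (that is, outside the query string).
function keywordBeforeQuery(url, keyword) {
  const queryStart = url.indexOf('?');
  // If there is no "?", the whole URL counts as "before the query".
  const beforeQuery = queryStart === -1 ? url : url.slice(0, queryStart);
  return beforeQuery.includes(keyword);
}

const sample = 'https://www.google.com/setprefdomain?prefdom=DE&prev=https://www.google.de/&sig=K_DtcF1dnV7Xn6g9Ir_3SUs6a6TiA%3D';

console.log(keywordBeforeQuery(sample, 'domain'));                          // true
console.log(keywordBeforeQuery('https://example.com/?q=domain', 'domain')); // false
```

This plain substring test would still accept a keyword embedded in a longer word; a regex with word boundaries is one way to narrow it further if that matters.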


If you think that parsing the anchor tag is too much work for a reliably constructed string, you can use regex as a shortcut (though I probably wouldn't in a professional application).

Use a literal space (or a word boundary, \b) before href to ensure that you are targeting the correct attribute and not making a partial match on a longer attribute name. I am going to presume that the href value is guaranteed to be wrapped in double quotes, so match the string between the double quotes. Within the double quotes, match one or more non-double-quote characters (greedily), then the sought word domain, then one or more non-double-quote characters (greedily). This returns the isolated URL if it qualifies and weeds out a few fringe cases that could damage the result.

let str = `<a class="domain" data-dummy-href="example.com" href="https://www.google.com/setprefdomain?prefdom=DE&prev=https://www.google.de/&sig=K_DtcF1dnV7Xn6g9Ir_3SUs6a6TiA%3D" style="background-image: url('http://www.example.com/domain/123.png')">`;

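// match() returns null when nothing matches, so fall back to an empty array before reading the capture group.
console.log((str.match(/ href="([^"]+domain[^"]+)"/i) || [])[1] || 'Not valid');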

If domain may occur at the start or at the end of the href value, change the corresponding + to * so that the quantifier goes from "one or more" to "zero or more".
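
A sketch of that variant, under the same assumption of a double-quoted href value. The helper name hrefWithKeyword is hypothetical; it also checks the result of match() before indexing, because match() returns null when nothing matches:

```javascript
// Hypothetical helper: returns the href value if it contains "domain"
// anywhere (including at the very start or end), otherwise null.
function hrefWithKeyword(tag) {
  // `*` instead of `+` lets "domain" sit right next to either quote.
  const match = tag.match(/ href="([^"]*domain[^"]*)"/i);
  // match() returns null when there is no match, so check before indexing.
  return match ? match[1] : null;
}

console.log(hrefWithKeyword('<a href="domain.example/path">'));  // domain.example/path
console.log(hrefWithKeyword('<a href="https://example.com/">')); // null
```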

白日梦 2025-01-22 20:38:14

I think @jscrip's answer may be the most straightforward way. Alternatively, you could check whether the string includes 'domain' before matching the regex. For example:

let str = '<a href="https://www.google.com/setprefdomain?prefdom=DE&prev=https://www.google.de/&sig=K_DtcF1dnV7Xn6g9Ir_3SUs6a6TiA%3D">'

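// Guard against match() returning null, and stop at the first closing quote instead of the last one.
let href = str.includes('domain') ? (str.match(/(?<=href=")[^"]*(?=")/) || ['Not valid'])[0] : 'Not valid'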

console.log(href)
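
One caveat with testing the whole tag is that 'domain' in another attribute (for example class="domain") would also pass the check. A minimal sketch that isolates the href value first and tests only that value; the helper name extractIfContains is hypothetical, and double-quoted attribute values are assumed:

```javascript
// Hypothetical helper: isolates the double-quoted href value, then checks
// only that value for the keyword.
function extractIfContains(tag, keyword) {
  // \s before href avoids matching data-href and similar attribute names;
  // [^"]* stops at the closing quote of the href value.
  const match = tag.match(/(?<=\shref=")[^"]*(?=")/i);
  return match && match[0].includes(keyword) ? match[0] : 'Not valid';
}

console.log(extractIfContains('<a class="domain" href="https://example.com/">', 'domain'));
// Not valid
console.log(extractIfContains('<a href="https://example.com/setprefdomain?x=1">', 'domain'));
// https://example.com/setprefdomain?x=1
```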
