Regex between two strings - including the last string

Posted on 2025-01-11 18:20:11

So I'm trying to pull all domains out of a text file. They may have a special character at the beginning (like a font tag).

So far:
(?<=>).*?(?=com|net)

Text I'm searching:

thisdomain.com fake text >thatdomain.net

It currently finds "thisdomain" and "thatdomain", but of course it cuts off the domain extension. I've dug through the regex docs for about an hour and can't find a way to match between the > and .com without cutting off the .com. Any suggestions?

Comments (2)

丿*梦醉红颜 2025-01-18 18:20:11

Use

(?<=>).*?(?:com|net)

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    com                      'com'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    net                      'net'
--------------------------------------------------------------------------------
  )                        end of grouping
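
To see the difference in practice, here is a minimal Python sketch that runs the lookahead version from the question and the consuming version above on the sample line. Python's `re` module is an assumption here (the thread does not name a regex flavor), and the variable names are illustrative only.

```python
import re

# Sample line taken from the question.
text = "thisdomain.com fake text >thatdomain.net"

# Pattern from the question: the lookahead checks for com/net but does not consume it.
lookahead_pattern = re.compile(r"(?<=>).*?(?=com|net)")

# Pattern from this answer: the non-capturing group consumes com/net.
consuming_pattern = re.compile(r"(?<=>).*?(?:com|net)")

lookahead_matches = lookahead_pattern.findall(text)
consuming_matches = consuming_pattern.findall(text)

print(lookahead_matches)  # ['thatdomain.']
print(consuming_matches)  # ['thatdomain.net']
```

On this exact sample line only the text after the > is reported, because the lookbehind requires a preceding >.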
时光沙漏 2025-01-18 18:20:11

The reason you are not matching .com is that this part, (?=com|net), is a lookahead assertion: it checks what follows but does not consume it, so the extension never becomes part of the match.

Instead, you can just match either .com or .net, including the dot. Requiring the dot prevents matches on, for example, >clarinet or >romcom

(?<=>)\S+\.(?:com|net)

The pattern matches:

  • (?<= Positive lookbehind, assert what is directly to the left of the current position
    • > Match literally
  • ) Close the lookbehind
  • \S+ Match 1 or more non whitespace characters
  • \.(?:com|net) Match either .com or .net

See a regex demo.
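
A small Python check of the pattern above, as a sketch only: the `re` module and the sample strings other than those quoted in this thread are assumptions made for illustration.

```python
import re

# Lookbehind for '>', then non-whitespace characters, then a literal dot and com/net.
domain_pattern = re.compile(r"(?<=>)\S+\.(?:com|net)")

# Hypothetical inputs: two real-looking domains and the two false positives
# mentioned in the answer.
samples = [">thatdomain.net", ">clarinet", ">romcom", "fake text >site.com here"]

results = {sample: domain_pattern.findall(sample) for sample in samples}

for sample, found in results.items():
    print(repr(sample), "->", found)
# '>thatdomain.net' -> ['thatdomain.net']
# '>clarinet' -> []
# '>romcom' -> []
# 'fake text >site.com here' -> ['site.com']
```

Because \S+ is greedy, a string with several dots after the > would be matched up to its last .com or .net before any whitespace.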

Without using lookarounds, you can also use a capture group and match the >

>(\S+\.(?:com|net))

See another regex demo.
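
The capture-group variant can be tried the same way. This is a hedged Python sketch (again assuming the `re` module; the variable names are made up): the > is part of the overall match, and the domain is read from group 1.

```python
import re

# Sample line taken from the question.
text = "thisdomain.com fake text >thatdomain.net"

# No lookaround: match the '>' literally and capture the domain in group 1.
capture_pattern = re.compile(r">(\S+\.(?:com|net))")

# group(1) is the captured domain; group(0) is the whole match including '>'.
domains = [m.group(1) for m in capture_pattern.finditer(text)]
full_matches = [m.group(0) for m in capture_pattern.finditer(text)]

print(domains)       # ['thatdomain.net']
print(full_matches)  # ['>thatdomain.net']
```

This form may be useful in regex flavors where lookbehind is unavailable, since it relies only on a plain capture group.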
