How to implement a standard set of hyperlink detection rules in Delphi

Posted on 2024-12-28 07:38:33

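My program currently detects hyperlinks in text automatically. I kept it very simple and only look for http:// or www.

However, a user suggested that I extend it to other forms, e.g. https:// or .com

Then I realized it might not stop there, because there are also ftp, mailto and file, all the other top-level domains, and even email addresses and file paths.

I think the best approach is to keep it practical by following some commonly used, standard set of hyperlink detection rules that is already in use. Maybe the way Microsoft Word does it, or the way RichEdit does it, or maybe you know of a better standard.

So my questions are:

Is there a built-in function that I can call from Delphi to do the detection, and if so, what would the call look like? (I plan to move to FireMonkey in the future, so I would prefer something that works beyond Windows.)

If no such function is available, is there somewhere I can find a documented set of rules for what Word detects, what RichEdit detects, or any other rule set for what should be detected? That would allow me to write the detection code myself.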

迷爱 2025-01-04 07:38:33

Try the PathIsURL function, which is declared in the ShLwApi unit.
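To show what such a call might look like, here is a minimal console sketch. It assumes a Delphi version with scoped unit names (older versions would use plain `ShLwApi`), and the sample strings are hypothetical. Note that PathIsURL only tests whether a whole string has URL form; it does not scan running text for links and does not check that the target exists. It is also a Windows API, so it would not cover the FireMonkey targets mentioned in the question.

```delphi
program PathIsUrlDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  Winapi.ShLwApi; // Windows-only unit that declares PathIsURL

const
  // Hypothetical sample inputs, chosen only for illustration.
  Samples: array[0..2] of string = (
    'http://www.example.com/index.html',
    'www.example.com',
    'C:\Temp\notes.txt');

var
  S: string;
begin
  for S in Samples do
  begin
    // PathIsURL takes a null-terminated string and returns a BOOL.
    if PathIsURL(PChar(S)) then
      Writeln(S, ' -> has URL form')
    else
      Writeln(S, ' -> not recognized as a URL');
  end;
end.
```

To detect links inside a longer text with this function, the text would first have to be split into candidate tokens (for example on whitespace), which is a separate step not shown here.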


青朷 2025-01-04 07:38:33

The following regex, taken from RegexBuddy's library, might get you started (I can't make any claims about performance).

Regex

Match; JGsoft; case insensitive:  
\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]

Explanation

URL: Find in full text
The final character class makes sure that if a URL is part of some text,
punctuation such as a comma or full stop after the URL is not interpreted as part
of the URL.

Matches (whole or partial)

http://regexbuddy.com
http://www.regexbuddy.com 
http://www.regexbuddy.com/ 
http://www.regexbuddy.com/index.html 
http://www.regexbuddy.com/index.html?source=library 
You can download RegexBuddy at http://www.regexbuddy.com/download.html.

Does not match

regexbuddy.com
www.regexbuddy.com
"www.domain.com/quoted URL with spaces"
[email protected]
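As a rough sketch of how this pattern could be applied from Delphi, the following console program uses TRegEx from the System.RegularExpressions unit (available in Delphi XE and later) to list every match in a piece of text. The program name and the sample text are only illustrative, and the pattern is the one quoted above, unchanged. To my knowledge this unit is part of the cross-platform RTL, so the same approach should carry over to FireMonkey targets, but that is an assumption worth verifying for your Delphi version.

```delphi
program FindUrlsDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.RegularExpressions; // TRegEx, TMatch

const
  // The RegexBuddy "find in full text" pattern quoted above.
  UrlPattern =
    '\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]';
  // Sample text taken from the list of matching examples.
  SampleText =
    'You can download RegexBuddy at http://www.regexbuddy.com/download.html.';

var
  Regex: TRegEx;
  M: TMatch;
begin
  // roIgnoreCase mirrors the "case insensitive" setting of the pattern.
  Regex := TRegEx.Create(UrlPattern, [roIgnoreCase]);

  // Matches returns every non-overlapping match in the input text.
  for M in Regex.Matches(SampleText) do
    // Index and Length give the span that could be turned into a hyperlink.
    Writeln(M.Index, ' ', M.Length, ' ', M.Value);
end.
```

Because the last character class excludes trailing punctuation, the final full stop of the sentence should be left out of the match. Bare forms such as www.regexbuddy.com are not covered by this pattern and would need an extra alternative.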

For a set of rules you might look into RFC 3986

A Uniform Resource Identifier (URI) is a compact sequence of
characters that identifies an abstract or physical resource. This
specification defines the generic URI syntax and a process for
resolving URI references that might be in relative form, along with
guidelines and security considerations for the use of URIs on the
Internet.

A regex that validates a URL as specified in RFC 3986 would be

^
(# Scheme
 [a-z][a-z0-9+\-.]*:
 (# Authority & path
  //
  ([a-z0-9\-._~%!$&'()*+,;=]+@)?              # User
  ([a-z0-9\-._~%]+                            # Named host
  |\[[a-f0-9:.]+\]                            # IPv6 host
  |\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host
  (:[0-9]+)?                                  # Port
  (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?          # Path
 |# Path without authority
  (/?[a-z0-9\-._~%!$&'()*+,;=:@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)?
 )
|# Relative URL (no scheme or authority)
 ([a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?  # Relative path
 |(/[a-z0-9\-._~%!$&'()*+,;=:@]+)+/?)                            # Absolute path
)
# Query
(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
# Fragment
(\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
$
