CPP + Regular Expression to Validate URL
I want to build a regular expression in C++ (MFC) that validates a URL.
The regular expression must satisfy the following conditions.
Valid URLs:
http://cu-241.dell-tech.co.in/MyWebSite/ISAPIWEBSITE/Denypage.aspx/
http://www.google.com
http://www.google.co.in
Invalid URLs:
http://cu-241.dell-tech.co.in/\MyWebSite/\ISAPIWEBSITE/\Denypage.aspx/ = The regex must reject this URL because of the '\' characters in "/\MyWebSite/\ISAPIWEBSITE/\Denypage.aspx/".
http://cu-241.dell-tech.co.in//////MyWebSite/ISAPIWEBSITE/Denypage.aspx/ = The regex must reject this URL because of the repeated slashes ("//////").
http://news.google.co.in/%5Cnwshp?hl=en&tab=wn = The regex must reject this URL because of the inserted %5C or %2F characters.
How can we develop a generic regular expression that satisfies the above conditions?
Please help us by providing a regular expression that handles the above scenarios in C++ (MFC).
Comments (3)
Have you tried using the RFC 3986 suggestion? If you can use GCC 4.9, you can go directly with the standard <regex> header. RFC 3986 (Appendix B) states that with
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
you can get the URI components as submatches: scheme = $2, authority = $4, path = $5, query = $7, fragment = $9.
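As a minimal sketch of how this could look with std::regex (C++11), the code below splits a URL with the RFC 3986 expression and then applies the three rejection rules from the question to the parts. The names UriParts, splitUri and isAcceptable are hypothetical, and the policy itself (http/https only, non-empty host, no backslash, no repeated slashes in the path, no %5C or %2F) is an assumption read from the question, not something the RFC prescribes.

```cpp
#include <regex>
#include <string>

// Components of a URI as captured by the RFC 3986 Appendix B expression.
struct UriParts {
    std::string scheme, authority, path, query, fragment;
};

// Splits `uri` into components. Returns false if the expression does not match
// (for example when the input contains a line break after the fragment marker).
bool splitUri(const std::string& uri, UriParts& out) {
    static const std::regex re(
        R"re(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)re");
    std::smatch m;
    if (!std::regex_match(uri, m, re)) return false;
    out.scheme    = m[2].str();
    out.authority = m[4].str();
    out.path      = m[5].str();
    out.query     = m[7].str();
    out.fragment  = m[9].str();
    return true;
}

// Hypothetical policy check covering the invalid cases listed in the question.
bool isAcceptable(const std::string& url) {
    UriParts p;
    if (!splitUri(url, p)) return false;
    if (p.scheme != "http" && p.scheme != "https") return false;
    if (p.authority.empty()) return false;
    // Literal backslash anywhere in the URL.
    if (url.find('\\') != std::string::npos) return false;
    // Repeated slashes inside the path (the "//" after the scheme is not part of it).
    if (p.path.find("//") != std::string::npos) return false;
    // Percent-encoded backslash (%5C) or slash (%2F), in either letter case.
    static const std::regex encoded("%(5C|2F)", std::regex::icase);
    if (std::regex_search(url, encoded)) return false;
    return true;
}
```

Splitting first keeps each rule simple: the repeated-slash test only looks at the path, so the "//" that follows the scheme does not trigger it. In an MFC project the CString input would first have to be converted to a std::string (or the code adapted to std::wregex and std::wstring for Unicode builds).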
Look at http://gskinner.com/RegExr/; there is a community tab on the right where you can find contributed regexes. There is a URI category. I'm not sure you'll find exactly what you need, but it is a good start.
With the following regex you can filter out most of the incorrect URLs:
It checks whether the URL starts with http:// or https://, whether the domain name consists only of lowercase alphanumeric characters with '.' and '-', and that the port does not start with 0 (e.g. 0123). It allows any port number and any path/query string that does not contain whitespace.
But to be absolutely sure that the URL is valid, you're probably better off parsing the URL. I would not recommend trying to cover all scenarios with a regex (including the correctness of paths, queries, and fragments), because it would be pretty difficult.
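A filter with the properties described above could be sketched as follows. The pattern is an illustrative reconstruction from that description, not the answerer's original expression, and the function name looksLikeHttpUrl is hypothetical.

```cpp
#include <regex>
#include <string>

// Coarse filter: http/https scheme, lowercase host made of [a-z0-9.-],
// optional port that does not start with 0, then any non-whitespace rest.
bool looksLikeHttpUrl(const std::string& url) {
    static const std::regex re(
        R"re(^https?://[a-z0-9.-]+(:[1-9][0-9]*)?([/?#]\S*)?$)re");
    return std::regex_match(url, re);
}
```

For instance, looksLikeHttpUrl("http://www.google.com") is accepted, while "http://example.com:0123/" and "http://example.com/a b" are rejected. A pattern this loose still accepts the backslash and repeated-slash URLs from the question, which is in line with the advice to parse the URL when stricter guarantees are needed.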