C++: regular expression to validate a URL

Posted on 2024-10-31 07:13:06

I want to build a regular expression in C++ (MFC) that validates a URL.

The regular expression must satisfy the following conditions.

Valid URLs:
http://cu-241.dell-tech.co.in/MyWebSite/ISAPIWEBSITE/Denypage.aspx/
http://www.google.com
http://www.google.co.in

Invalid URLs:

  1. http://cu-241.dell-tech.co.in/\MyWebSite/\ISAPIWEBSITE/\Denypage.aspx/ = the regex must reject this URL because of the '\' characters in "/\MyWebSite/\ISAPIWEBSITE/\Denypage.aspx/".

  2. http://cu-241.dell-tech.co.in//////MyWebSite/ISAPIWEBSITE/Denypage.aspx/ = the regex must reject this URL because of the repeated '/' characters after the host name.

  3. http://news.google.co.in/%5Cnwshp?hl=en&tab=wn = the regex must reject this URL because of the inserted %5C character; the same applies to %2F.

How can we develop a generic regular expression that satisfies the above conditions?
Please help us by providing a regular expression that handles these scenarios in C++ (MFC).

Comments (3)

绾颜 2024-11-07 07:13:06

Have you tried the regular expression suggested in RFC 3986 (Appendix B)? If you can use GCC 4.9, you can use <regex> directly.

The RFC states that with ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? you get the following submatches:

  scheme    = $2
  authority = $4
  path      = $5
  query     = $7
  fragment  = $9

For example:

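#include <cstdlib>
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char *argv[])
{
  // The URL to check is expected as the first command-line argument.
  if (argc < 2) {
    std::cerr << "Usage: " << argv[0] << " <url>" << std::endl;
    return EXIT_FAILURE;
  }

  std::string url (argv[1]);
  unsigned counter = 0;

  // Pattern from RFC 3986, Appendix B: splits a URI into its components.
  std::regex url_regex (
    R"(^(([^:\/?#]+):)?(//([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)",
    std::regex::extended
  );
  std::smatch url_match_result;

  std::cout << "Checking: " << url << std::endl;

  // regex_match succeeds only if the whole string matches the pattern.
  if (std::regex_match(url, url_match_result, url_regex)) {
    // Index 0 is the full match; indices 1 to 9 are the capture groups.
    for (const auto& res : url_match_result) {
      std::cout << counter++ << ": " << res << std::endl;
    }
  } else {
    std::cerr << "Malformed url." << std::endl;
  }

  return EXIT_SUCCESS;
}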

Then:

./url-matcher http://localhost.com/path\?hue\=br\#cool

Checking: http://localhost.com/path?hue=br#cool
0: http://localhost.com/path?hue=br#cool
1: http:
2: http
3: //localhost.com
4: localhost.com
5: /path
6: ?hue=br
7: hue=br
8: #cool
9: cool
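
The RFC 3986 pattern only splits a URL into components; it does not reject the three invalid cases from the question by itself. One possible way to cover them, as a rough sketch, is to split first and then inspect the path submatch. The names is_acceptable_url and to_lower are hypothetical, and restricting the scheme to http/https and requiring a non-empty authority are assumptions, not something the question or the RFC prescribes.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical helper: returns a lower-cased copy so "%5c" and "%5C" compare equal.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: splits the URL with the RFC 3986 Appendix B pattern,
// then applies the three path checks described in the question.
bool is_acceptable_url(const std::string& url) {
    static const std::regex rfc3986(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");

    std::smatch m;
    if (!std::regex_match(url, m, rfc3986)) return false;

    // Assumption: only http and https URLs with a host part are accepted.
    const std::string scheme = to_lower(m[2].str());
    if (scheme != "http" && scheme != "https") return false;
    if (m[4].length() == 0) return false;

    const std::string path = to_lower(m[5].str());
    if (path.find('\\') != std::string::npos) return false;   // case 1: backslash in path
    if (path.find("//") != std::string::npos) return false;   // case 2: repeated slashes
    if (path.find("%5c") != std::string::npos ||              // case 3: encoded backslash
        path.find("%2f") != std::string::npos) {              //         or encoded slash
        return false;
    }
    return true;
}
```

Only the path component is inspected here; the query and fragment are left alone, since an encoded slash in a query string may be legitimate. Whether that is the right boundary depends on the application.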
岁月静好 2024-11-07 07:13:06

Look at http://gskinner.com/RegExr/. There is a Community tab on the right where you can find contributed regexes, including a URI category. I'm not sure you'll find exactly what you need, but it is a good start.

め可乐爱微笑 2024-11-07 07:13:06

With the following regex you can filter out most incorrect URLs:

#include <regex>
#include <string>

#include <Poco/Exception.h>

int main(int argc, char* argv[]) {
    // The URL to check is expected as the first command-line argument.
    if (argc < 2) {
        return 1;
    }

    std::string url(argv[1]);
    std::regex urlRegex(R"(^https?://[0-9a-z\.-]+(:[1-9][0-9]*)?(/[^\s]*)*$)");

    // regex_match requires the whole string to match the pattern.
    if (!std::regex_match(url, urlRegex)) {
        throw Poco::InvalidArgumentException(
            "Malformed URL: \"" + url + "\". "
            "The URL must start with http:// or https://, "
            "the domain name should only contain lowercase alphanumeric characters, '.' and '-', "
            "the port should not start with 0, "
            "and the URL should not contain any whitespace.");
    }

    return 0;
}

It checks that the URL starts with http:// or https://, that the domain name contains only lowercase alphanumeric characters, '.' and '-', and that the port does not start with 0 (e.g. 0123). It allows any other port number and any path/query string that does not contain whitespace.

But to be absolutely sure that the URL is valid, you're probably better off parsing the URL. I would not recommend trying to cover all scenarios with regex (including the correctness of paths, queries, fragments), because it would be pretty difficult.
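
To get a feel for what the pattern above accepts, it can be wrapped in a small predicate and probed with a few inputs. This is only a sketch: matches_simple_url_regex is a hypothetical name, and the probe URLs other than those from the question are made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper around the pattern from this answer.
// Returns true when the whole string matches the pattern.
bool matches_simple_url_regex(const std::string& url) {
    static const std::regex urlRegex(
        R"(^https?://[0-9a-z\.-]+(:[1-9][0-9]*)?(/[^\s]*)*$)");
    return std::regex_match(url, urlRegex);
}
```

Under this pattern, http://www.google.com and https://example.com:8080/a/b?x=1 match, while http://example.com:0123/ (port starts with 0), http://Example.com (uppercase host) and ftp://example.com (wrong scheme) do not. Note that the path part accepts any non-whitespace character, so URLs from the question such as http://cu-241.dell-tech.co.in/\MyWebSite/ still match; rejecting backslashes, repeated slashes or %5C/%2F would need an extra check on top of this pattern.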
