C++: regular expression to validate a URL

Posted on 2024-10-31 07:13:06

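I want to build a regular expression in C++ (MFC) that validates a URL.

The regular expression must satisfy the following conditions.

Valid URLs:

http://cu-241.dell-tech.co.in/MyWebSite/ISAPIWEBSITE/Denypage.aspx/
http://www.google.com
http://www.google.co.in

Invalid URLs:

http://cu-241.dell-tech.co.in/\MyWebSite/\ISAPIWEBSITE/\Denypage.aspx/ = the regex must reject this URL because of the "\" characters in "/\MyWebSite/\ISAPIWEBSITE/\Denypage.aspx/".
http://cu-241.dell-tech.co.in//////MyWebSite/ISAPIWEBSITE/Denypage.aspx/ = the regex must reject this URL because of the repeated slashes ("//////").
http://news.google.co.in/%5Cnwshp?hl=en&tab=wn = the regex must reject this URL because of the extra %5C and %2F characters.

How can we write a generic regular expression that satisfies the conditions above? Please help us with a regular expression that handles these cases in C++ (MFC).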

绾颜 2024-11-07 07:13:06

Have you tried the regular expression suggested in RFC 3986 (Appendix B)? If you can use GCC 4.9 or later, you can use <regex> directly.

The RFC states that with ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? you get the following submatches:

  scheme    = $2
  authority = $4
  path      = $5
  query     = $7
  fragment  = $9

For example:

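#include <cstdlib>
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char *argv[])
{
  // Require the URL as the first command-line argument.
  if (argc < 2) {
    std::cerr << "Usage: " << argv[0] << " <url>" << std::endl;
    return EXIT_FAILURE;
  }

  std::string url (argv[1]);
  unsigned counter = 0;

  // RFC 3986, Appendix B. No backslashes inside the brackets: in the
  // POSIX extended grammar a backslash there is a literal character.
  std::regex url_regex (
    R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)",
    std::regex::extended
  );
  std::smatch url_match_result;

  std::cout << "Checking: " << url << std::endl;

  if (std::regex_match(url, url_match_result, url_regex)) {
    // Print the whole match (index 0) followed by each submatch.
    for (const auto& res : url_match_result) {
      std::cout << counter++ << ": " << res << std::endl;
    }
  } else {
    std::cerr << "Malformed url." << std::endl;
    return EXIT_FAILURE;
  }

  return EXIT_SUCCESS;
}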

Then:

./url-matcher http://localhost.com/path\?hue\=br\#cool

Checking: http://localhost.com/path?hue=br#cool
0: http://localhost.com/path?hue=br#cool
1: http:
2: http
3: //localhost.com
4: localhost.com
5: /path
6: ?hue=br
7: hue=br
8: #cool
9: cool
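
The RFC pattern only splits a URL into components; it does not reject any of the "invalid" URLs from the question. One possible way to cover those three rules is to split first and then test the pieces with ordinary string checks. The sketch below is illustrative only: `is_acceptable_url` is a hypothetical helper, restricting the scheme to http/https and requiring a non-empty authority are assumptions, and it uses the default ECMAScript grammar rather than `std::regex::extended`.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the thread): split the URL with the
// RFC 3986 Appendix B pattern, then apply the question's three rules.
bool is_acceptable_url(const std::string& url) {
    // Default (ECMAScript) grammar; an assumption made for this sketch.
    static const std::regex rfc3986(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    // Percent-encoded backslash (%5C) or slash (%2F), in either case.
    static const std::regex encoded_separator(
        "%(5C|2F)", std::regex::ECMAScript | std::regex::icase);

    std::smatch m;
    if (!std::regex_match(url, m, rfc3986)) return false;

    const std::string scheme    = m[2].str();
    const std::string authority = m[4].str();
    const std::string path      = m[5].str();

    // Assumption: only http/https URLs with a host are of interest.
    if (scheme != "http" && scheme != "https") return false;
    if (authority.empty()) return false;

    // Rule 1: no literal backslash anywhere in the URL.
    if (url.find('\\') != std::string::npos) return false;
    // Rule 2: no empty path segments, i.e. no repeated slashes.
    if (path.find("//") != std::string::npos) return false;
    // Rule 3: no encoded backslash or slash inside the path.
    if (std::regex_search(path, encoded_separator)) return false;

    return true;
}
```

With these checks, the three valid URLs from the question are accepted and the three invalid ones are rejected. Whether empty path segments or %2F should really be treated as errors depends on the application, since RFC 3986 itself permits both.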

岁月静好 2024-11-07 07:13:06

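Check out http://gskinner.com/RegExr/. There is a Community tab on the right where you can find contributed regular expressions, including a URI category. I'm not sure you will find exactly what you need there, but it is a good start.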

め可乐爱微笑 2024-11-07 07:13:06

With the following regex you can filter out most malformed URLs:

#include <regex>
#include <string>
#include <Poco/Exception.h>

int main(int argc, char* argv[]) {
    if (argc < 2) return 1;  // the URL is expected as the first argument
    std::string url(argv[1]);
    // scheme, lowercase host, optional port without a leading 0, optional path
    std::regex urlRegex(R"(^https?://[0-9a-z\.-]+(:[1-9][0-9]*)?(/[^\s]*)*$)");

    if (!std::regex_match(url, urlRegex)) {
        throw Poco::InvalidArgumentException(
            "Malformed URL: \"" + url + "\". "
            "The URL must start with http:// or https://, "
            "the domain name should only contain lowercase alphanumeric characters, '.' and '-', "
            "the port should not start with 0, "
            "and the URL should not contain any whitespace.");
    }
}

It checks that the URL starts with http:// or https://, that the domain name contains only lowercase alphanumeric characters, '.' and '-', and that the port does not start with 0 (e.g. 0123). Beyond that, it allows any port number and any path/query string that does not contain whitespace.

But to be absolutely sure that the URL is valid, you're probably better off parsing the URL. I would not recommend trying to cover all scenarios with regex (including the correctness of paths, queries, fragments), because it would be pretty difficult.
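
To illustrate how coarse this filter is, the pattern can be wrapped in a small predicate and tried on the URLs from the question. This is only a sketch: `passes_url_filter` is a hypothetical name, and the pattern is copied unchanged from the code above.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper around the pattern from the answer above.
// Returns true when the whole string matches the filter.
bool passes_url_filter(const std::string& url) {
    static const std::regex urlRegex(
        R"(^https?://[0-9a-z\.-]+(:[1-9][0-9]*)?(/[^\s]*)*$)");
    return std::regex_match(url, urlRegex);
}
```

`passes_url_filter("http://www.google.com")` returns true, while an uppercase host, a port such as `:0123`, or a space in the path makes it return false. The three "invalid" URLs from the question (backslashes, repeated slashes, `%5C`) all pass, because the path part accepts any non-whitespace character. Rules like those need separate checks on the parsed components.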
