I need to parse a URL to get the protocol, host, path, and query in an application I am writing in C++. The application is intended to be cross-platform. I'm surprised I can't find anything that does this in the Boost or POCO libraries. Is it somewhere obvious I'm not looking? Any suggestions on appropriate open source libs? Or is this something I just have to do myself? It's not super complicated, but it seems like such a common task that I am surprised there isn't a common solution.
There is a library that's proposed for Boost inclusion and allows you to parse HTTP URIs easily. It uses Boost.Spirit and is also released under the Boost Software License. The library is cpp-netlib, which you can find the documentation for at http://cpp-netlib.github.com/ and you can download the latest release from http://github.com/cpp-netlib/cpp-netlib/downloads .
The relevant type you'll want to use is
boost::network::http::uri
and is documented here.
Wstring version of above, added other fields I needed. Could definitely be refined, but good enough for my purposes.
Tests/Usage
Terribly sorry, couldn't help it. :s
url.hh
url.cc
main.cc
POCO's URI class can parse URLs for you. The following example is a shortened version of the one in the POCO URI and UUID slides:
For completeness, there is one written in C that you could use (with a little wrapping, no doubt): https://uriparser.github.io/
[RFC-compliant and supports Unicode]
Here's a very basic wrapper I've been using for simply grabbing the results of a parse.
The Poco library now has a class for dissecting URIs and feeding back the host, path segments, query string, etc.
https://pocoproject.org/pro/docs/Poco.URI.html
Facebook's Folly library can do the job for you easily. Simply use the Uri class:
Qt has QUrl for this. GNOME has SoupURI in libsoup, which you'll probably find a little more lightweight.
I know this is a very old question, but I've found the following useful:
http://www.zedwood.com/article/cpp-boost-url-regex
It gives 3 examples:
(With Boost)
(Without Boost)
(Different way without Boost)
This library is very tiny and lightweight: https://github.com/corporateshark/LUrlParser
However, it is parsing only, no URL normalization/validation.
Also of interest could be http://code.google.com/p/uri-grammar/ which, like Dean Michael's netlib, uses Boost.Spirit to parse a URI. I came across it in the question "Simple expression parser example using Boost::Spirit?"
There is the newly released google-url lib:
http://code.google.com/p/google-url/
The library provides a low-level url parsing API as well as a higher-level abstraction called GURL. Here's an example using that:
Two small complaints I have with it: (1) it wants to use ICU by default to deal with different string encodings and (2) it makes some assumptions about logging (but I think they can be disabled). In other words, the library is not completely stand-alone as it exists, but I think it's still a good basis to start with, especially if you are already using ICU.
May I offer another self-contained solution based on std::regex:
I added explanations for each part of the regular expression. This lets you pick exactly the parts that are relevant for the URLs you expect to receive. Just remember to change the regular expression group indices accordingly.
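The answer's original regular expression is not reproduced above, so here is a minimal sketch of the same idea that uses the generic pattern from RFC 3986, Appendix B instead. The names UrlParts and parse_url are hypothetical, chosen only for this illustration. The group indices in the comments are the ones to adjust if you trim the pattern.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type for this sketch (not from the original answer).
struct UrlParts {
    std::string scheme;     // group 2, e.g. "https"
    std::string authority;  // group 4, e.g. "example.com:8080"
    std::string path;       // group 5, e.g. "/a/b"
    std::string query;      // group 7, without the leading '?'
    std::string fragment;   // group 9, without the leading '#'
};

// Splits a URL with the generic regex from RFC 3986, Appendix B.
// Returns false if the input does not match (e.g. it contains a newline).
bool parse_url(const std::string& url, UrlParts& out) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    //       12            3  4          5       6  7        8 9
    std::smatch m;
    if (!std::regex_match(url, m, re)) {
        return false;
    }
    out.scheme = m[2].str();
    out.authority = m[4].str();
    out.path = m[5].str();
    out.query = m[7].str();
    out.fragment = m[9].str();
    return true;
}
```

Every part of this pattern is optional, so it splits a string rather than validating it: components that are absent simply come back as empty strings.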
A small dependency you can use is uriparser, which recently moved to GitHub.
You can find a minimal example in their code: https://github.com/uriparser/uriparser/blob/63384be4fb8197264c55ff53a135110ecd5bd8c4/tool/uriparse.c
This will be more lightweight than Boost or Poco. The only catch is that it is C.
There is also a Buckaroo package:
I tried a couple of the solutions here, but then decided to write my own that could just be dropped into a project without any external dependencies (other than C++17).
Right now, it passes all tests. But, if you find any cases that don't succeed, please feel free to create a Pull Request or an Issue.
I'll keep it up to date and improve its quality. Suggestions welcome! I'm also trying out this design to only have a single, high-quality class per repository so that the header and source can just be dropped into a project (as opposed to building a library or header-only). It appears to be working out well (I'm using git submodules and symlinks in my own projects).
https://github.com/homer6/url
You could try the open-source library called C++ REST SDK (created by Microsoft, distributed under the Apache License 2.0). It can be built for several platforms, including Windows, Linux, OSX, iOS, and Android. There is a class called web::uri where you put in a string and can retrieve individual URL components. Here is a code sample (tested on Windows):
The output will be:
There are also other easy-to-use methods, e.g. to access individual attribute/value pairs from the query, split the path into components, etc.
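To make concrete what splitting a query into attribute/value pairs and a path into components involves, here is a standalone sketch in plain C++. It does not use the C++ REST SDK and does not claim to mirror its API. The helper names split_query and split_path are chosen for this illustration only, and percent-decoding is deliberately left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Illustrative helper: turns "a=1&b=2" into {a: 1, b: 2}.
// Assumptions: '&' separates pairs, a key without '=' maps to an empty value,
// a repeated key keeps its last value, and nothing is percent-decoded.
std::map<std::string, std::string> split_query(const std::string& query) {
    std::map<std::string, std::string> result;
    std::size_t pos = 0;
    while (pos <= query.size()) {
        std::size_t amp = query.find('&', pos);
        if (amp == std::string::npos) amp = query.size();
        const std::string pair = query.substr(pos, amp - pos);
        if (!pair.empty()) {
            const std::size_t eq = pair.find('=');
            if (eq == std::string::npos) {
                result[pair] = "";
            } else {
                result[pair.substr(0, eq)] = pair.substr(eq + 1);
            }
        }
        pos = amp + 1;
    }
    return result;
}

// Illustrative helper: turns "/a/b/" into ["a", "b"], skipping empty segments.
std::vector<std::string> split_path(const std::string& path) {
    std::vector<std::string> segments;
    std::size_t pos = 0;
    while (pos < path.size()) {
        std::size_t slash = path.find('/', pos);
        if (slash == std::string::npos) slash = path.size();
        if (slash > pos) segments.push_back(path.substr(pos, slash - pos));
        pos = slash + 1;
    }
    return segments;
}
```

A std::map keeps the sketch short but drops repeated keys; a vector of pairs would be the safer choice if the same parameter can legitimately appear more than once.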
If you use oatpp for web request handling, you may find its built-in URL parsing useful:
The above snippet retrieves the hostname. In a similar way:
There is yet another library, https://snapwebsites.org/project/libtld, which handles all possible top-level domains and URI schemes.
I have developed an "object oriented" solution, one C++ class, that works with a single regex like the @Mr.Jones and @velcrow solutions. My Url class performs url/uri 'parsing'. I think I improved velcrow's regex to be more robust, and it also includes the username part.
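The Url class itself is not reproduced here. As a rough illustration of what handling the username part means, the sketch below splits an authority string of the form user:password@host:port using plain string searches rather than the regex the answer describes. The names Authority and split_authority are hypothetical, and bracketed IPv6 hosts such as [::1] are assumed not to occur.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for this sketch (not the answer's Url class).
struct Authority {
    std::string userinfo;  // "user" or "user:password", empty if absent
    std::string host;
    std::string port;      // empty if absent
};

// Splits "userinfo@host:port". Assumes no bracketed IPv6 literal in the host.
Authority split_authority(const std::string& authority) {
    Authority out;
    std::string rest = authority;
    // Everything before the last '@' is user information.
    const std::size_t at = rest.rfind('@');
    if (at != std::string::npos) {
        out.userinfo = rest.substr(0, at);
        rest = rest.substr(at + 1);
    }
    // Everything after the last ':' in the remainder is the port.
    const std::size_t colon = rest.rfind(':');
    if (colon != std::string::npos) {
        out.host = rest.substr(0, colon);
        out.port = rest.substr(colon + 1);
    } else {
        out.host = rest;
    }
    return out;
}
```

For example, split_authority("bob:secret@example.com:8080") yields the user information bob:secret, the host example.com, and the port 8080.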
The first version of my idea follows. I have released the same code, improved, in my GPL3-licensed open source project Cpp URL Parser.
With the #ifdef/#ifndef boilerplate omitted, here is Url.h:
This is the code of the Url.cpp implementation file:
Usage example:
You can also update the Url object to represent (and parse) another URL:
I'm not a full-time C++ developer, so I'm not sure I followed C++ best practices 100%.
Any tips are appreciated.
P.S.: take a look at Cpp URL Parser; there are refinements there.
Have fun
A simple solution to get the protocol, host, and path:
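The code for this answer did not survive, so the following is only a minimal sketch of one way to do it with std::string searches, not the author's original. The names SimpleUrl and split_url are hypothetical. It assumes the host runs up to the first '/' after "://", so a port stays attached to the host, and any query or fragment stays attached to the path.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for this sketch.
struct SimpleUrl {
    std::string protocol;  // empty if the input has no "://"
    std::string host;      // includes ":port" if one is present
    std::string path;      // includes any "?query" and "#fragment"
};

SimpleUrl split_url(const std::string& url) {
    SimpleUrl out;
    std::size_t start = 0;
    // The protocol is everything before "://", if that separator exists.
    const std::size_t sep = url.find("://");
    if (sep != std::string::npos) {
        out.protocol = url.substr(0, sep);
        start = sep + 3;
    }
    // The host runs up to the next '/'; the rest is the path.
    const std::size_t slash = url.find('/', start);
    if (slash == std::string::npos) {
        out.host = url.substr(start);
        out.path = "/";  // assumption: treat a missing path as the root
    } else {
        out.host = url.substr(start, slash - start);
        out.path = url.substr(slash);
    }
    return out;
}
```

This is fine for well-formed http(s) URLs, but it does no validation, so anything beyond that is better served by one of the libraries mentioned in the other answers.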