有没有更好的方法在 C++ 中执行 URL 模式匹配比迭代?

发布于 2024-11-17 04:28:21 字数 3176 浏览 2 评论 0原文

我有一个模式匹配例程,它根据用于请求命令的 URL 从 std::map 查找值。 URL 映射表填充如下值:

// Assume this->commands_ is defined elsewhere as std::map<std::string, int>
// Providing a number of URL examples to give an idea of the structure of
// the URLs
this->commands_["/session"] = 1;
this->commands_["/session/:sessionid/url"] = 2;
this->commands_["/session/:sessionid/back"] = 3;
this->commands_["/session/:sessionid/forward"] = 4;
this->commands_["/session/:sessionid/element"] = 5;
this->commands_["/session/:sessionid/element/:id/text"] = 6;
this->commands_["/session/:sessionid/element/:id/value"] = 7;

每个 URL 模式中的标记(由前面的“:”指定)被替换为调用查找例程中的实际值(例如,"/session/1234-8a0f /element/5bed-6789/text"),但是是我需要保留的命名参数。上面示例中的命名令牌列表并不详尽,上面列出的位置中可能还有其他命名令牌。请注意,令牌值是十六进制编码的数字。

目前,我正在迭代映射的键,用正则表达式值替换替换标记,并使用 std::tr1 正则表达式类对请求的值执行正则表达式匹配,将匹配的标记名称和值捕获到向量中。该代码在功能上与此等效(为清楚起见,代码比通常编写的代码更详细):

// Assume "using namespace std;" has been declared,
// and appropriate headers #included.
int Server::LookupCommand(const string& uri,
                          vector<string>* names,
                          vector<string>* values) {
    int value = 0;

    // Iterate through the keys of the map
    map<string, int>::const_iterator it = this->commands_.begin();
    for (; it != this->commands_.end(); ++it) {
        string url_candidate = it->first;

        // Substitute template parameter names with regex match strings
        size_t param_start_pos = url_candidate.find_first_of(":");
        while (param_start_pos != string::npos) {
            size_t param_len = string::npos;
            size_t param_end_pos = url_candidate.find_first_of("/",
                                                            param_start_pos);
            if (param_end_pos != string::npos) {
                param_len = param_end_pos - param_start_pos;
            }

            // Skip the leading colon
            string param_name = url_candidate.substr(param_start_pos + 1,
                                                     param_len - 1);
            names->push_back(param_name);
            url_candidate.replace(param_start_pos,
                                  param_len,
                                  "([0-9a-fA-F-]+)");
            param_start_pos = url_candidate.find_first_of(":");
        }

        tr1::regex matcher("^" + url_candidate + "$");
        tr1::match_results<string::const_iterator> matches;
        if (tr1::regex_search(uri, matches, matcher)) {
            size_t param_count = names->size();
            for (unsigned int i = 0; i < param_count; i++) {
                // Need i + 1 to get token match; matches[0] is full string.
                string param_value = matches[i + 1].str();
                values->push_back(param_value);
            }
            found_value = it->second;
            break;
        }
    }
    return value;
}

请注意,我没有使用 Boost 库,也不允许我在该项目中使用它们。

这段代码对我来说效率极低,因为我每次都在遍历地图的键,但我却无法只见树木不见森林,而且很难想出替代方案。尽管描述听起来很无意义,但我本质上试图构建的是基于键的正则表达式匹配而不是精确匹配的映射查找。我怎样才能使这个更有效率?我在设计这个函数时忽略了哪些模式?

I have a pattern matching routine that looks up values from a std::map based on the URL used to request the command. The URL mapping table is filled with values like:

// Assume this->commands_ is defined elsewhere as std::map<std::string, int>
// Providing a number of URL examples to give an idea of the structure of
// the URLs
this->commands_["/session"] = 1;
this->commands_["/session/:sessionid/url"] = 2;
this->commands_["/session/:sessionid/back"] = 3;
this->commands_["/session/:sessionid/forward"] = 4;
this->commands_["/session/:sessionid/element"] = 5;
this->commands_["/session/:sessionid/element/:id/text"] = 6;
this->commands_["/session/:sessionid/element/:id/value"] = 7;

The tokens in each URL pattern (specified by the preceding ':') are replaced by actual values in the calls to the lookup routine (e.g., "/session/1234-8a0f/element/5bed-6789/text"), but are named parameters that I will need to retain. The list of named tokens in the above example is not exhaustive, and there may be other named tokens in the positions listed above. Note that the token values are hex-encoded numbers.

At present, I am iterating through the keys of the map, substituting the replacement tokens with regex values, and performing a regex match on the requested value using the std::tr1 regex classes, capturing the matched token names and values into vectors. The code is functionally equivalent to this (code is more verbose than ordinarily written for clarity):

// Assume "using namespace std;" has been declared,
// and appropriate headers #included.
int Server::LookupCommand(const string& uri,
                          vector<string>* names,
                          vector<string>* values) {
    int value = 0;

    // Iterate through the keys of the map
    map<string, int>::const_iterator it = this->commands_.begin();
    for (; it != this->commands_.end(); ++it) {
        string url_candidate = it->first;

        // Substitute template parameter names with regex match strings
        size_t param_start_pos = url_candidate.find_first_of(":");
        while (param_start_pos != string::npos) {
            size_t param_len = string::npos;
            size_t param_end_pos = url_candidate.find_first_of("/",
                                                            param_start_pos);
            if (param_end_pos != string::npos) {
                param_len = param_end_pos - param_start_pos;
            }

            // Skip the leading colon
            string param_name = url_candidate.substr(param_start_pos + 1,
                                                     param_len - 1);
            names->push_back(param_name);
            url_candidate.replace(param_start_pos,
                                  param_len,
                                  "([0-9a-fA-F-]+)");
            param_start_pos = url_candidate.find_first_of(":");
        }

        tr1::regex matcher("^" + url_candidate + "$");
        tr1::match_results<string::const_iterator> matches;
        if (tr1::regex_search(uri, matches, matcher)) {
            size_t param_count = names->size();
            for (unsigned int i = 0; i < param_count; i++) {
                // Need i + 1 to get token match; matches[0] is full string.
                string param_value = matches[i + 1].str();
                values->push_back(param_value);
            }
            found_value = it->second;
            break;
        }
    }
    return value;
}

Please note that I'm not using the Boost libraries, nor am I allowed to use them for this project.

This code feels wildly inefficient to me because I'm iterating through the keys of the map every single time, but I'm suffering from not being able to see the proverbial forest for the trees, and am having a difficult time coming up with alternatives. Though the description sounds nonsensical, what I'm essentially trying to construct is a map lookup based on regex matching of the key rather than an exact match. How can I make this more efficient? What pattern have I overlooked in my design of this function?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

烟酒忠诚 2024-11-24 04:28:21

在我看来,您可以将 URL 拆分为其组件(使用 这里也许),然后使用决策树来查找正确的模式。

在此树中,每个节点都是与 URL 的特定组件相匹配的正则表达式,叶子将是您当前存储在映射中的值:

                                 session
                                    |   \
                                    |    1
                                    |
                              ([0-9a-fA-F-]+)
                              /     |     \
                             /      |      \
                           url     back    element
                            |       |       |     \
                            |       |       |      5
                            2       3       |
                                        ([0-9a-fA-F-]+)

以上是示例树的一部分。您必须使用自定义数据结构来实现树,但这相当简单。

The way I see it, you could split the URL into its components (using one of the suggestions in here maybe), and then use a decision tree to find the correct pattern.

In this tree, every node would be a regular expression that matches a particular component of your URL, and the leaves would be the values you currently store in your map:

                                 session
                                    |   \
                                    |    1
                                    |
                              ([0-9a-fA-F-]+)
                              /     |     \
                             /      |      \
                           url     back    element
                            |       |       |     \
                            |       |       |      5
                            2       3       |
                                        ([0-9a-fA-F-]+)

The above is a part of the tree for your example. You'd have to use a custom data-structure to implement the tree, but that is rather simple.

百善笑为先 2024-11-24 04:28:21

与其将模式中的 :session_id 和 :id 标记替换为特定值然后进行匹配,不如获取候选者并使用正则表达式替换它们以将特定值替换为占位符(session_id 和 id)?然后您可以直接在地图中查找通用字符串。

Rather than replacing the :session_id and :id tokens in the patterns with specific values then doing the matching, how about taking the candidates and using regex replace on them to replace the specific values with the placeholders (session_id and id)? Then you could directly look for the genericized strings in the map.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文