How do I extract HTML tag names from a std::string in C++?

Posted on 2025-01-11 07:16:02

In this example:

#include <iostream>
#include <fstream>
#include <string>
#include <deque>
#include <sstream>


struct tag {
    bool isOpening;
    std::string name;
};

void print(const std::deque<tag> &deq){
    for(const auto &i : deq){
        std::cout << i.name << std::endl;
    }
}

int main() {
//    std::ifstream infile("a.html");
//    std::stringstream buffer;
//    buffer << infile.rdbuf();
//    std::string buf = std::move(buffer.str());
    std::deque<tag> deq;
    std::string buf = "<html>\n"
                      "\n"
                      "<body>\n"
                      "    <h1>Hello <b>world</b></h1>\n"
                      "</body>\n"
                      "\n"
                      "</html>";

    size_t pos = 0;
    while ((pos = buf.find('<', pos)) != std::string::npos) {
        bool isOpening = true;
        if(buf[pos+1] == '/'){
            isOpening = false;
            pos++;
        }
        std::string token = buf.substr(pos, buf.find('>', pos));
        deq.push_back({isOpening, token});
        pos++;
    }

    print(deq);
}

I would like to parse for each tag its name and whether it is opening or not. But the output is:

<html
<body>
    <h
<h1>Hello <b>world</b>
<b>world</b></h1>
</body>

</ht
/b></h1>
</body>

</html>
/h1>
</body>

</html>
/body>

</html>
/html>

which is of course not correct. So what am I doing wrong?


魂归处 2025-01-18 07:16:02

std::string::substr takes two arguments: the starting position and the length of the substring, not the start and end positions. Your code passes the absolute index of the next '>' as the length, so every substring runs well past the end of its tag, and the overrun grows as pos moves through the string. That is why you see so much repeated output. Compute the length by subtracting the start position instead. Since pos points at the '<' (or at the '/' of a closing tag), also start one character later so that character is dropped:

 buf.substr(pos + 1, buf.find('>', pos) - pos - 1);

With that change each token is just the tag name (html, body, h1, b), provided every '<' in the input has a matching '>'.

Hope that works!
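
To make the position-plus-length convention of substr concrete, here is a minimal sketch of the question's loop moved into a function. The name extractTags and the local variables start and end are hypothetical and do not appear in the original post. The sketch assumes plain tags without attributes, comments, or self-closing forms, and it simply stops at a '<' that has no closing '>'. It is an illustration of the substr fix, not a real HTML parser.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

struct tag {
    bool isOpening;
    std::string name;
};

// Hypothetical helper: collects tag names from simple markup such as "<b>" and "</b>".
// Assumes no attributes, comments, or self-closing tags.
std::deque<tag> extractTags(const std::string &buf) {
    std::deque<tag> deq;
    std::size_t pos = 0;
    while ((pos = buf.find('<', pos)) != std::string::npos) {
        const std::size_t end = buf.find('>', pos);
        if (end == std::string::npos) {
            break;  // unterminated tag: stop instead of reading to the end of buf
        }
        std::size_t start = pos + 1;  // skip '<'
        bool isOpening = true;
        if (start < end && buf[start] == '/') {
            isOpening = false;
            ++start;  // skip '/'
        }
        assert(start <= end);
        // substr(position, length): the length is end - start, not the end index
        deq.push_back({isOpening, buf.substr(start, end - start)});
        pos = end + 1;  // continue searching after this tag
    }
    return deq;
}
```

Called on the buf string from the question, this should yield html, body, h1 and b as opening tags, followed by b, h1, body and html as closing tags. Setting pos to end + 1 rather than incrementing it by one is a small design choice that skips the tag body entirely, so a '<' can never be found inside a tag that was already handled.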
