How do I extract HTML tag names from a std::string in C++?
In this example:
#include <iostream>
#include <fstream>
#include <string>
#include <deque>
#include <sstream>

struct tag {
    bool isOpening;
    std::string name;
};

void print(const std::deque<tag> &deq){
    for(const auto &i : deq){
        std::cout << i.name << std::endl;
    }
}

int main() {
    // std::ifstream infile("a.html");
    // std::stringstream buffer;
    // buffer << infile.rdbuf();
    // std::string buf = std::move(buffer.str());
    std::deque<tag> deq;
    std::string buf = "<html>\n"
                      "\n"
                      "<body>\n"
                      " <h1>Hello <b>world</b></h1>\n"
                      "</body>\n"
                      "\n"
                      "</html>";
    size_t pos = 0;
    while ((pos = buf.find('<', pos)) != std::string::npos) {
        bool isOpening = true;
        if(buf[pos+1] == '/'){
            isOpening = false;
            pos++;
        }
        std::string token = buf.substr(pos, buf.find('>', pos));
        deq.push_back({isOpening, token});
        pos++;
    }
    print(deq);
}
For each tag, I would like to parse its name and whether it is an opening tag. But the output is:
<html
<body>
<h
<h1>Hello <b>world</b>
<b>world</b></h1>
</body>
</ht
/b></h1>
</body>
</html>
/h1>
</body>
</html>
/body>
</html>
/html>
which is of course not correct. So what am I doing wrong?
The substr function in C++ takes two arguments: the starting position and the length of the substring, not the start and end positions. Because you pass the index of '>' as the length, almost every substring runs far past the end of its tag, often to the end of the whole string, which is why you see so much repeated output. You really just want to change that substr call so that its second argument is a length rather than an end position. Hope that works!