Read a file and extract only certain parts

Posted on 2024-10-04 01:50:53

ifstream toOpen;
openFile.open("sample.html", ios::in);

if(toOpen.is_open()){
    while(!toOpen.eof()){
        getline(toOpen,line);
        if(line.find("href=") && !line.find(".pdf")){
            start_pos = line.find("href");
            tempString = line.substr(start_pos+1); // i dont want the quote
            stop_pos = tempString .find("\"");
            string testResult = tempString .substr(start_pos, stop_pos);
            cout << testResult << endl;
        }
    }

    toOpen.close();
}

What I am trying to do is extract the "href" value, but I can't get it to work.

EDIT:

Thanks to Tony's hint, I now use this:

if(line.find("href=") != std::string::npos ){   
    // Process
}

It works!

巴黎夜雨 2024-10-11 01:50:53

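I'd advise against trying to parse HTML like this. Unless you know the source very well and are quite certain about its formatting, chances are that anything you do will run into problems. HTML is an ugly language with an (almost) self-contradictory specification that, for example, says certain things are not allowed, but then goes on to tell you how you are required to interpret them anyway.

Worse, almost any character can (at least potentially) be encoded in any of at least three or four different ways, so unless you first scan for (and carry out) the right conversions (in the right order), you can end up missing legitimate links and/or including "phantom" links.

You might want to look at the answers to this previous question for suggestions on an HTML parser to use.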

此生挚爱伱 2024-10-11 01:50:53

As a start, you might want to take some shortcuts in the way you write the loop over lines in order to make it clearer. Here is the conventional "read line at a time" loop using C++ iostreams:

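#include <cstdlib>   // EXIT_FAILURE
#include <fstream>
#include <iostream>
#include <string>

int main ( int, char ** )
{
    std::ifstream file("sample.html");
    if ( !file.is_open() ) {
        std::cerr << "Failed to open file." << std::endl;
        return (EXIT_FAILURE);
    }
    // getline() returns the stream, which tests false once a read fails,
    // so the loop ends cleanly at end of file.
    for ( std::string line; (std::getline(file,line)); )
    {
        // process line.
    }
}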

As for the inner part that processes the line, there are several problems.

It doesn't compile. I suppose this is what you meant by not being able to get it to work. When asking a question, this is the kind of information you should provide in order to get good help.
There is confusion between variable names temp and tempString etc.
string::find() returns std::string::npos when nothing is found, which is a large positive integer (size_type is unsigned) and therefore converts to true. A test such as if(line.find("href=")) is thus true in every case except when a match starts at character position 0, which is exactly the case where you probably do want to enter the if block.

Here is some simple test content for sample.html.

<html>
    <a href="foo.pdf"/>
</html>

Sticking the following inside the loop:

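// Only handle lines that contain both an href attribute and ".pdf".
if ((line.find("href=") != std::string::npos) &&
    (line.find(".pdf" ) != std::string::npos))
{
    const std::size_t start_pos = line.find("href=");
    // Skip the 6 characters of href=" to reach the start of the value.
    std::string temp = line.substr(start_pos+6);
    // The value ends at the next double quote.
    const std::size_t stop_pos = temp.find("\"");
    std::string result = temp.substr(0, stop_pos);
    std::cout << "'" << result << "'" << std::endl;
}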

I actually get the output

'foo.pdf'

However, as Jerry pointed out, you might not want to use this in a production environment. If this is a simple homework or exercise on how to use the <string>, <iostream> and <fstream> libraries, then go ahead with such a procedure.
