Why does my program only fetch part of a web page's source?

Posted on 2024-12-02 11:37:22

I have a program that pulls the source code of a web page and saves it to a .txt file. It works when I fetch one page at a time, but when I loop over, say, 100 pages, each page's source gets cut off somewhere between 1/4 and 3/4 of the way through (the cutoff point seems arbitrary). Any ideas on why this happens or how to fix it?

My initial thought was that the loop is going too fast for the Java program (I am running the Java from a PHP script), but technically it shouldn't move on to the next item until the current call has finished anyway.

Here is the code I'm using:

import java.io.*;
import java.net.URL;

public class selectout {

    public static BufferedReader read(String url) throws Exception {
        return new BufferedReader(
            new InputStreamReader(
                new URL(url).openStream()));
    }

    public static void main(String[] args) throws Exception {
        BufferedReader reader = read(args[0]);
        String line = reader.readLine();
        String thenum = args[1];
        FileWriter fstream = new FileWriter(thenum + ".txt");
        BufferedWriter out = new BufferedWriter(fstream);
        while (line != null) {
            out.write(line);
            out.newLine();
            //System.out.println(line);
            line = reader.readLine();
        }
    }
}

The PHP side is a basic mysql_query with a while(fetch_assoc) loop that grabs each URL from the database and then runs system("java -jar crawl.jar $url $filename");

After that, it uses fopen and fread to read the new file, and finally saves the source to the database (after escaping strings and such).

Comments (2)

執念 2024-12-09 11:37:22

You need to close your output streams after you finish writing each file. After your while loop, call out.close(); and fstream.close();
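One way to apply this advice is try-with-resources, which closes the streams automatically even when an exception is thrown. The following is a minimal sketch, not the asker's actual program: the class name `PageSaver`, the helper `copy`, and the choice of UTF-8 are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical replacement for the question's class; names are illustrative.
class PageSaver {

    // Copies every line from `source` into `target`.
    // try-with-resources closes both streams on exit, and closing the
    // BufferedWriter flushes any text still held in its buffer.
    static void copy(Reader source, Path target) throws IOException {
        try (BufferedReader reader = new BufferedReader(source);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.write(line);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java PageSaver <url> <output-name>");
            return;
        }
        // args[0] is the URL and args[1] the file name without extension,
        // matching the argument order used in the question.
        // UTF-8 is an assumption; the page may declare another charset.
        Reader page = new InputStreamReader(
                new URL(args[0]).openStream(), StandardCharsets.UTF_8);
        copy(page, Paths.get(args[1] + ".txt"));
    }
}
```

Because the writer is closed before `main` returns, the file should be complete by the time the PHP script reads it back.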

不再见 2024-12-09 11:37:22

You must flush the stream and close it.

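finally {  // Error handling ignored in my example
    out.flush();  // flush the BufferedWriter, which holds the text not yet written
    out.close();  // also closes the underlying FileWriter (fstream)
}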
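To see why an unflushed writer produces truncated files, the small experiment below measures a file's size before and after the `BufferedWriter` is closed. It is only a sketch: `BufferDemo` and `sizesAroundClose` are hypothetical names, and it relies on the text being shorter than the writer's default buffer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of the original program.
class BufferDemo {

    // Writes `text` to a temporary file and returns two sizes in bytes:
    // index 0 is the file size while the writer is still open,
    // index 1 is the file size after the writer has been closed.
    static long[] sizesAroundClose(String text) throws IOException {
        Path tmp = Files.createTempFile("buffer-demo", ".txt");
        try {
            long before;
            BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8);
            try {
                out.write(text);
                // Short text is still in the writer's in-memory buffer here.
                before = Files.size(tmp);
            } finally {
                out.close(); // flushes the buffer to disk, then closes the file
            }
            return new long[] { before, Files.size(tmp) };
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = sizesAroundClose("hello");
        System.out.println("before close: " + sizes[0] + " bytes");
        System.out.println("after close:  " + sizes[1] + " bytes");
    }
}
```

If the program ends without flushing or closing the writer, whatever is still in that buffer never reaches the file, which is consistent with pages being cut off partway through.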