Why does my program only get part of the web page source?
I have a program to pull the source code of a webpage and save it to a .txt file. It works if done with just one at a time, but when I go through a loop of say 100 pages all of a sudden each page source starts to get cut off between 1/4 and 3/4 of the way through (seems to be arbitrary). Any ideas on why or how I would go about solving this?
My initial thought was that the loop is going too fast for the Java program (I am running this Java from a PHP script), but then I figured that it technically shouldn't move on to the next item until the current one has finished anyway.
Here is the code I'm using:
    import java.io.*;
    import java.net.URL;

    public class selectout {
        public static BufferedReader read(String url) throws Exception {
            return new BufferedReader(
                new InputStreamReader(
                    new URL(url).openStream()));
        }

        public static void main(String[] args) throws Exception {
            BufferedReader reader = read(args[0]);
            String line = reader.readLine();
            String thenum = args[1];
            FileWriter fstream = new FileWriter(thenum + ".txt");
            BufferedWriter out = new BufferedWriter(fstream);
            while (line != null) {
                out.write(line);
                out.newLine();
                //System.out.println(line);
                line = reader.readLine();
            }
        }
    }
The PHP is a basic mysql_query, then, inside a while(fetch_assoc) loop, it grabs the URL from the database and runs system("java -jar crawl.jar $url $filename");. Then it does an fopen and fread on the new file, and finally saves the source to the database (after escaping strings and such).
Comments (2)
You need to close your output streams after you finish writing each file. After your while loop, call out.close(); and fstream.close();
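A minimal sketch of the question's program with that fix applied, assuming Java 7 or later so that try-with-resources is available. Both the reader and the writer are closed when the block exits, even if an exception is thrown. Closing the BufferedWriter flushes it and also closes the FileWriter it wraps, so a separate fstream.close() is not strictly needed. The copyLines helper is a hypothetical name introduced only so the copy loop can be exercised without a network connection; the rest follows the question's code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class selectout {
    // Opens a reader over the body of the given URL (same as in the question).
    public static BufferedReader read(String url) throws IOException {
        return new BufferedReader(new InputStreamReader(new URL(url).openStream()));
    }

    // Hypothetical helper: copies every line from in to out, ending each with newLine().
    public static void copyLines(BufferedReader in, BufferedWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            out.write(line);
            out.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String thenum = args[1];
        // try-with-resources closes both streams when the block ends. Closing the
        // BufferedWriter first flushes whatever is still in its buffer, then closes
        // the underlying FileWriter, so the whole page ends up in the file.
        try (BufferedReader reader = read(args[0]);
             BufferedWriter out = new BufferedWriter(new FileWriter(thenum + ".txt"))) {
            copyLines(reader, out);
        }
    }
}
```

It is invoked from PHP exactly as before (URL as the first argument, file name as the second); the only behavioral change is that the output is guaranteed to be flushed and closed before the JVM exits.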
You must flush the stream and close it.
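To illustrate why the unflushed output goes missing, here is a small self-contained demo. It swaps the FileWriter for an in-memory StringWriter (an assumption made so it runs without touching the disk); FlushDemo and snapshots are hypothetical names. Text written to a BufferedWriter only reaches the underlying writer when the buffer fills up, when flush() is called, or when close() is called. Nothing flushes it automatically when main returns and the JVM exits, so whatever is still buffered at that point is simply lost, which matches the cut-off files in the question.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class FlushDemo {
    // Returns what the underlying sink contains before flush, after flush, and after close.
    public static String[] snapshots() throws IOException {
        StringWriter sink = new StringWriter();          // stands in for the FileWriter
        BufferedWriter out = new BufferedWriter(sink);

        out.write("first part");
        String beforeFlush = sink.toString();            // "" : still sitting in the buffer

        out.flush();
        String afterFlush = sink.toString();             // "first part"

        out.write("second part");
        out.close();                                     // close() flushes the rest first
        String afterClose = sink.toString();             // "first partsecond part"

        return new String[] {beforeFlush, afterFlush, afterClose};
    }

    public static void main(String[] args) throws IOException {
        String[] s = snapshots();
        System.out.println("before flush: \"" + s[0] + "\"");
        System.out.println("after flush:  \"" + s[1] + "\"");
        System.out.println("after close:  \"" + s[2] + "\"");
    }
}
```

In the question's program the "close" step never happens, so the file only receives the chunks that were pushed out whenever the buffer filled; the final partial buffer never arrives.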