Why does Jsoup scrape data differently in Java and on Android?
I have been trying to scrape the 'School Notices' from this URL: http://www.isleworthsyon.hounslow.sch.uk/
I scraped the text in Java and then used the String.replaceAll method to replace every " " (I am not sure which character that is) with line breaks. That worked fine, with perfect results, but when I run the same code on Android it gives me different results.
In Java:
        String URL = "http://www.isleworthsyon.hounslow.sch.uk/";
        Document site = null;
        try {
            site = Jsoup.connect(URL).get();
        } catch (IOException e) {
            e.printStackTrace();
        }
        String HTML = site.html();
        site.select("a").remove();
        Elements news = site.select("div#np_91983-1");
        String output = news.text();
        String for_output_text = output.replaceAll(" ","\n\n");
        System.out.println(for_output_text);
    }
}
In Android:
    super.onCreate(savedInstanceState);
    setContentView(R.layout.main);
    final TextView text = (TextView)findViewById(R.id.text);
    String URL = "http://www.isleworthsyon.hounslow.sch.uk/";
    Document site = null;
    try {
        site = Jsoup.connect(URL).get();
    } catch (IOException e) {
        e.printStackTrace();
    }
    site.select("a").remove();
    Elements news = site.select("div#np_91983-1");
    String output = news.text();
    String for_output_text = output.replaceAll(" ","\n\n");
    text.setText(for_output_text);
}
The two output texts are different, as you can see below:
http://dl.dropbox.com/u/35866688/Comparison.png
By the way, this is my first attempt at web scraping.
Edit: After some experiments, I found that the strings I get from news.text() have different spacing in Java and on Android. Any suggestions as to why that is, and are there any alternatives?
Comments (1)
Finally! I found a better solution: use a for loop and add each element separately.
Thanks to everyone who helped.
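The answer does not include its code, so the following is only a rough sketch of what "add each element separately" might look like. It assumes the text of each notice has already been collected into a list, for example inside a loop such as `for (Element e : news.select("p")) texts.add(e.text());`, where the `p` selector is a guess about the page's markup. The class name `NoticeJoiner`, the method `joinNotices`, and the treatment of the unknown character as a non-breaking space (U+00A0) are all assumptions, not something the original post confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the output from per-element texts instead of
// relying on a separator character inside one large string.
final class NoticeJoiner {

    // Joins the text of each element, putting a blank line between entries.
    static String joinNotices(List<String> elementTexts) {
        List<String> cleaned = new ArrayList<>();
        for (String text : elementTexts) {
            // Assumption: the unknown character is a non-breaking space (U+00A0),
            // which String.trim() does not remove, so it is mapped to a plain space first.
            String normalized = text.replace('\u00A0', ' ').trim();
            if (!normalized.isEmpty()) {
                cleaned.add(normalized);
            }
        }
        return String.join("\n\n", cleaned);
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the scraped element texts.
        List<String> sample = Arrays.asList("First notice", "\u00A0", "Second notice");
        System.out.println(joinNotices(sample));
    }
}
```

Because the line breaks are added explicitly between elements, the result should no longer depend on how news.text() spaces the combined text on each platform.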