Why does Jsoup scrape data differently in Java and on Android?
I have been trying to scrape the 'School Notices' from this URL: http://www.isleworthsyon.hounslow.sch.uk/
I scraped the text in Java and then used String.replaceAll to replace every " " (I am not sure which character this is) with line breaks. That worked perfectly, but when I run the same code on Android, it gives me different results.
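In Java:
    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    // The class name is a placeholder; the original post shows only the method body.
    public class Scraper {
        public static void main(String[] args) {
            String URL = "http://www.isleworthsyon.hounslow.sch.uk/";
            Document site = null;
            try {
                site = Jsoup.connect(URL).get();
            } catch (IOException e) {
                e.printStackTrace();
                return; // nothing to parse if the request failed
            }
            String HTML = site.html(); // not used further
            // Drop all links so that their text does not end up in the output
            site.select("a").remove();
            Elements news = site.select("div#np_91983-1");
            String output = news.text();
            // Replace the separator character with a blank line
            String for_output_text = output.replaceAll(" ", "\n\n");
            System.out.println(for_output_text);
        }
    }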
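On Android:
    // The method header is not shown in the original post; it is added here to balance the braces.
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        final TextView text = (TextView) findViewById(R.id.text);
        String URL = "http://www.isleworthsyon.hounslow.sch.uk/";
        Document site = null;
        try {
            // The request runs on the UI thread here; newer Android versions
            // reject that with NetworkOnMainThreadException.
            site = Jsoup.connect(URL).get();
        } catch (IOException e) {
            e.printStackTrace();
            return; // nothing to parse if the request failed
        }
        // Drop all links so that their text does not end up in the output
        site.select("a").remove();
        Elements news = site.select("div#np_91983-1");
        String output = news.text();
        // Replace the separator character with a blank line
        String for_output_text = output.replaceAll(" ", "\n\n");
        text.setText(for_output_text);
    }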
The two outputs are different, as you can see below:
http://dl.dropbox.com/u/35866688/Comparison.png
By the way, this is my first attempt at web scraping.
Edit: After some experiments, I found that the strings I get from news.text() have different spacing in Java and on Android. Any suggestions as to why that is, and are there any alternatives?
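Since the separator character is never identified above, one way to narrow it down would be to print the code point of every character that is not ordinary text, on both platforms, and compare the results. The following is only a sketch using the standard library. The class name SeparatorProbe, both helper names, and the sample string are made up for illustration, and the non-breaking space (U+00A0) in the sample is just an assumption about what the page might contain.

```java
import java.util.ArrayList;
import java.util.List;

class SeparatorProbe {

    // Returns "U+XXXX" for every character that is neither a plain space
    // nor printable ASCII (so tabs, newlines and non-ASCII are all reported).
    static List<String> describeOddChars(String s) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp != ' ' && (cp > 0x7E || Character.isWhitespace(cp))) {
                found.add(String.format("U+%04X", cp));
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    // Assumes the separator is U+00A0: turns each run of it into a blank line.
    static String nbspRunsToBlankLines(String s) {
        return s.replaceAll("\\u00A0+", "\n\n");
    }

    public static void main(String[] args) {
        // Invented sample; in the scraper this would be the result of news.text()
        String sample = "Notice one\u00A0Notice two";
        System.out.println(describeOddChars(sample));
        System.out.println(nbspRunsToBlankLines(sample));
    }
}
```

Running describeOddChars on the output of news.text() in both the desktop program and the Android app should show whether the two strings really contain different separator characters.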
Comments (1)
Finally! I found a better solution: use a for loop and add each element separately.
Thanks to everyone who helped.
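The code for this solution is not shown, so the following is only a guess at what "add each element separately" could look like. To keep the sketch runnable without Jsoup on the classpath, it starts from a list of strings; in the scraper, that list would presumably be filled by looping over the selected Elements and calling text() on each Element. The names NoticeJoiner and joinNotices and the sample notices are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class NoticeJoiner {

    // Appends each non-empty text on its own, separated by a blank line,
    // instead of relying on a separator character inside one big string.
    static String joinNotices(List<String> texts) {
        StringBuilder out = new StringBuilder();
        for (String raw : texts) {
            // Treat non-breaking spaces like ordinary spaces before trimming
            String t = raw.replace('\u00A0', ' ').trim();
            if (t.isEmpty()) {
                continue; // skip elements that contain only spacing
            }
            if (out.length() > 0) {
                out.append("\n\n");
            }
            out.append(t);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Invented sample texts standing in for the per-element results
        List<String> texts = Arrays.asList("Term ends Friday\u00A0", " ", "Sports day moved");
        System.out.println(joinNotices(texts));
    }
}
```

Because each element is handled on its own, the result no longer depends on which spacing character ends up between the notices in the combined text.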