JSoup skipping elements on Android
JSoup seems to be skipping some elements in my HTML string. I am 100% positive everything is in the HTML string, but when I select the elements I want to parse, JSoup returns only some of them, or none at all, even though I know they exist. Here is my code. Thanks.
public void parseDoc() {
    final HttpParams params = new BasicHttpParams();
    HttpClientParams.setRedirecting(params, true);
    HttpClient httpclient = new DefaultHttpClient();
    HttpPost httppost = new HttpPost(
            "https://secure.groupfusion.net/processlogin.php");
    String HTML = "";
    try {
        List<NameValuePair> nameValuePairs = new ArrayList<NameValuePair>(3);
        nameValuePairs.add(new BasicNameValuePair("referral_page",
                "/modules/gradebook/ui/gradebook.phtml?type=student_view"));
        nameValuePairs.add(new BasicNameValuePair("currDomain",
                "beardenhs.knoxschools.org"));
        nameValuePairs.add(new BasicNameValuePair("username", username
                .getText().toString()));
        nameValuePairs.add(new BasicNameValuePair("password", password
                .getText().toString()));
        httppost.setEntity(new UrlEncodedFormEntity(nameValuePairs));
        HttpResponse response = httpclient.execute(httppost);
        HTML = EntityUtils.toString(response.getEntity());
        Document doc = Jsoup.parse(HTML);
        Element link = doc.select("a").first();
        String linkHref = link.attr("href");
        HttpGet request = new HttpGet();
        try {
            request.setURI(new URI(linkHref));
        } catch (URISyntaxException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        response = httpclient.execute(request);
        String html = "";
        InputStream in = response.getEntity().getContent();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in));
        StringBuilder str = new StringBuilder();
        String line = null;
        while ((line = reader.readLine()) != null) {
            str.append(line);
        }
        in.close();
        HTML = str.toString();
        doc = Jsoup.parse(HTML);
        Elements divs = doc.select("div.yuiTop");
        for (Element d: divs) {
            sting.append(d.text());
            sting.append("\n");
        }
    } catch (ClientProtocolException e) {
    } catch (IOException e) {
    }
}
This code is a bit odd. As I understand it, you make a query that returns a list of hyperlinks, screen-scrape the first hyperlink out of the result, and then try to load the content behind that link? If so, are you sure the server is returning a valid hyperlink? Try loading the page in your browser.
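One way to check the link outside a browser is to look at what the href resolves to. The sketch below assumes (it is not confirmed by the question) that the first link on the login response is relative, in which case passing it straight to new URI(linkHref) yields a request without a host. The class name LinkResolver and the sample hrefs are made up for illustration; only java.net.URI is used.

```java
import java.net.URI;

public class LinkResolver {
    // Resolves an href scraped from a page against the URL the page was fetched from.
    // Absolute hrefs come back unchanged; relative ones inherit scheme and host.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "https://secure.groupfusion.net/processlogin.php";
        // Hypothetical relative href, modelled on the referral_page value in the question.
        System.out.println(resolve(page,
                "/modules/gradebook/ui/gradebook.phtml?type=student_view"));
        // Hypothetical absolute href.
        System.out.println(resolve(page, "https://example.org/other"));
    }
}
```

With jsoup itself, parsing with Jsoup.parse(HTML, baseUri) and reading link.absUrl("href") should give the same kind of result, provided the base URI passed in is the page's real address.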
If it's valid then I'm not sure what the issue is, but why wouldn't you use WebView.loadUrl() and let the browser component take care of it?
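Separately from the link question, one detail in the posted read loop may be worth ruling out: readLine() strips the line terminator and the loop appends the lines with nothing between them. If the server's markup ever breaks a tag across lines (an assumption, the actual page is not shown), the joined string is no longer the same HTML. A minimal sketch with made-up markup and a hypothetical class name LineJoin:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class LineJoin {
    // Mirrors the loop in the question: lines are concatenated with no separator.
    static String joinWithoutBreaks(String text) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        StringBuilder out = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            out.append(line);
        }
        return out.toString();
    }

    // Same loop, but puts a newline back after every line.
    static String joinWithBreaks(String text) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        StringBuilder out = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            out.append(line).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up markup with a tag split across two lines.
        String html = "<div\nclass=\"yuiTop\">Grade</div>";
        System.out.println(joinWithoutBreaks(html)); // tag name and attribute run together
        System.out.println(joinWithBreaks(html));    // whitespace between them is kept
    }
}
```

Reading the second response with EntityUtils.toString(response.getEntity()), as the code already does for the first response, would avoid the manual loop altogether.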