Issue with passing cookies to a GET request (after a POST)
I have been stuck on this issue for several days now, and my eyes are starting to hurt from the time spent trying different combinations without success. I am making an app that has to get data from the internet, parse it, and then show it to the user. I've tried several methods for doing that, and JSOUP was very helpful, especially for parsing the results and getting the data out of them.
However, there is one issue I cannot resolve. I have tried both the regular HTTPClient and JSOUP, but I can't get the data I need. Here is my code (JSOUP version):
    public void bht_ht(Context c, int pozivni, int broj) throws IOException {
        // First connection, only to get the cookies (I have also tried without this separate request, but the result is the same)
        Connection.Response resCookie = Jsoup.connect("http://www.bhtelecom.ba/imenik_telefon.html")
                .method(Method.GET)
                .execute();
        // The two cookies set by the site
        String sessionId = resCookie.cookie("PHPSESSID");
        String fetypo = resCookie.cookie("fe_typo_user");

        // The POST request, with the search data
        Connection.Response res = Jsoup.connect("http://www.bhtelecom.ba/imenik_telefon.html?a=search")
                .data("di", some_data)
                .data("br", some_data)
                .data("btnSearch", "Tra%C5%BEi")
                .cookie("PHPSESSID", sessionId)
                .cookie("fe_typo_user", fetypo)
                .method(Method.POST)
                .execute();
        Document dok = res.parse();

        // The GET request for the page that contains the results; the site redirects to it with an HTTP 302 response after the POST
        Document doc = Jsoup.connect("http://www.bhtelecom.ba/index.php?id=3226&")
                .cookie("PHPSESSID", sessionId)
                .cookie("fe_typo_user", fetypo)
                .referrer("http://www.bhtelecom.ba/imenik_telefon.html")
                .get();
        Element elemenat = doc.select("div.boxtexter").get(0);
        String ime = elemenat.text();
    }
So, the end result would be a string containing the returned data. But whatever I try, I get the "blank" page and its parsed text, even though I've simulated everything the browser sends.
Here are the raw POST and GET headers captured from the browser:
(post)
> POST /imenik_telefon.html?a=search HTTP/1.1
> Host: www.bhtelecom.ba
> Content-Length: 56
> Cache-Control: max-age=0
> Origin: http://www.bhtelecom.ba
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1
> Content-Type: application/x-www-form-urlencoded
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Referer: http://www.bhtelecom.ba/index.php?id=3226&
> Accept-Encoding: gzip,deflate,sdch
> Accept-Language: en-US,en;q=0.8
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
> Cookie: PHPSESSID=opavncj3317uidbt93t9bie980; fe_typo_user=332a76d0b1d4944bdbbcd28d63d62d75; __utma=206281024.1997742542.1319583563.1319583563.1319588786.2; __utmb=206281024.1.10.1319588786; __utmc=206281024; __utmz=206281024.1319583563.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
>
> di=033&br=123456&_uqid=&_cdt=&_hsh=&btnSearch=Tra%C5%BEi
(get)
> GET /index.php?id=3226& HTTP/1.1
> Host: www.bhtelecom.ba
> Cache-Control: max-age=0
> User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Referer: http://www.bhtelecom.ba/index.php?id=3226&
> Accept-Encoding: gzip,deflate,sdch
> Accept-Language: en-US,en;q=0.8
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
> Cookie: PHPSESSID=opavncj3317uidbt93t9bie980; __utma=206281024.1997742542.1319583563.1319583563.1319588786.2; __utmb=206281024.1.10.1319588786; __utmc=206281024; __utmz=206281024.1319583563.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); fe_typo_user=07745dd2a36a23c64c2297026061a2c2
The response to this GET contains the data I need, but with every combination of parameters and cookies I tried, I couldn't get the site to "think" that I had made a POST and now want that data.
Here is the version of my code without the JSOUP parser. I can't get it to work either, although when I check the cookies they are OK, the same for the POST and the GET.
    try {
        DefaultHttpClient client = new DefaultHttpClient();
        String postURL = "http://www.bhtelecom.ba/imenik_telefon.html?a=search";
        HttpPost post = new HttpPost(postURL);
        post.getParams().setParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE, Boolean.FALSE);
        List<NameValuePair> params = new ArrayList<NameValuePair>();
        params.add(new BasicNameValuePair("di", "035"));
        params.add(new BasicNameValuePair("br", "819443"));
        params.add(new BasicNameValuePair("btnSearch", "Tra%C5%BEi"));
        UrlEncodedFormEntity ent = new UrlEncodedFormEntity(params, HTTP.UTF_8);
        post.setEntity(ent);
        HttpResponse responsePOST = client.execute(post);
        HttpEntity resEntity = responsePOST.getEntity();

        // Checking for cookies; they are OK
        List<Cookie> cookies = client.getCookieStore().getCookies();
        if (cookies.isEmpty()) {
            Log.d(TAG, "no cookies");
        } else {
            for (int i = 0; i < cookies.size(); i++) {
                Log.d(TAG, "cookies: " + cookies.get(i).toString());
            }
        }
        // Release the POST response before sending the next request
        if (resEntity != null) {
            resEntity.consumeContent();
        }

        HttpGet get = new HttpGet("http://www.bhtelecom.ba/index.php?id=3226&");
        get.getParams().setParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE, Boolean.FALSE);
        HttpResponse responseGET = client.execute(get);
        HttpEntity entityGET = responseGET.getEntity();
        List<Cookie> cookiesGet = client.getCookieStore().getCookies();
        if (cookiesGet.isEmpty()) {
            Log.d(TAG, "no cookies");
        } else {
            for (int i = 0; i < cookiesGet.size(); i++) {
                Log.d(TAG, "cookies GET: " + cookiesGet.get(i).toString());
            }
        }
        // A method to check the data: I pass the InputStream to it and do the operations there.
        // I've tried both "manually" and by passing the InputStream to JSOUP, without success in either case.
        samplemethod(entityGET.getContent());
        client.getConnectionManager().shutdown();
    } catch (Exception e) {
        e.printStackTrace();
    }
So, if anyone can find an error in my setup, or show me a way to make these two requests and then get the data as an HTTP entity that I could use as input (an InputStream) for the lovely JSOUP parser, that would be amazing. Or maybe I got the whole thing wrong about what the page needs and I have to make my requests with different parameters; I would appreciate a pointer on that too. I used Wireshark and Charles Debugging Proxy to see what to send (I tried both, to double-check), and found only the session id, fe_typo_user, and some other parameters used for tracking time on site and so on. I've tried passing those too: "__utma", "__utmb", and so on.
I have some other methods that use "simpler", POST-only requests with the data in the response, and those work fine, but this specific issue with this site is driving me crazy. Thanks in advance for your help.
Comments (1)
After many, many hours of trying things and tracking incoming and outgoing packets, I finally managed to find a solution.
The problem was a "bug", or rather the behavior, of HTTPClient. If you add a parameter to a POST and the parameter is empty (has a "" value), it is not sent with the request. I didn't know that and thought that those parameters, being empty, wouldn't change anything, so when working with JSOUP I didn't pass them to the requests either.
So, the empty parameters in the captured POST body,
_uqid=&_cdt=&_hsh=
were the places of interest.
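For illustration, here is a minimal sketch of building the same form body with the empty fields kept in. It uses only the JDK (URLEncoder with a Charset argument, so Java 10 or later) rather than JSOUP or HTTPClient. The class and method names (FormBody, encode, searchFields) are made up for this example; the field names and sample values come from the captured POST body in the question. The button label is passed as raw text because the encoder percent-encodes it itself; handing it an already encoded value would encode the % signs a second time.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBody {

    // Encodes every field as name=value joined by '&', including fields whose value is "".
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Field names and order follow the captured POST body; the empty ones are sent on purpose.
    static Map<String, String> searchFields(String di, String br) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("di", di);
        fields.put("br", br);
        fields.put("_uqid", "");
        fields.put("_cdt", "");
        fields.put("_hsh", "");
        fields.put("btnSearch", "Tra\u017Ei"); // raw label; encodes to Tra%C5%BEi
        return fields;
    }

    public static void main(String[] args) {
        String body = encode(searchFields("033", "123456"));
        System.out.println(body);
        System.out.println(body.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

With the sample values from the capture, the encoded body should come out identical to the one the browser sent, and its length agrees with the Content-Length: 56 header there. With JSOUP, the equivalent would presumably be to add .data("_uqid", ""), .data("_cdt", "") and .data("_hsh", "") to the POST.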
Another thing: since this page answers the POST with a 302 response, and JSOUP has followRedirects set to "true" by default, I had to set that to false as well. The method is POST and the follow-up request has to be a GET, but JSOUP assumes it is still a POST and messes things up.
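The same redirect handling can be sketched without JSOUP. The example below is a stand-in built on assumptions, not the real site: it starts a tiny local server (the JDK's com.sun.net.httpserver) that answers a POST to /search with a 302 and a session cookie, and serves data on GET /results only when that cookie comes back. The client side uses HttpURLConnection with automatic redirects switched off, reads Location and Set-Cookie from the 302 itself, and then issues the GET by hand. All class names, paths, and the cookie value abc123 are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class PostRedirectGet {

    // Hypothetical stand-in for the site, bound to a free local port.
    static HttpServer startFakeSite() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/search", ex -> {
            ex.getRequestBody().readAllBytes();            // read and discard the form body
            ex.getResponseHeaders().add("Set-Cookie", "PHPSESSID=abc123; Path=/");
            ex.getResponseHeaders().add("Location", "/results");
            ex.sendResponseHeaders(302, -1);               // 302 with no response body
            ex.close();
        });
        server.createContext("/results", ex -> {
            String cookie = ex.getRequestHeaders().getFirst("Cookie");
            boolean ok = "GET".equals(ex.getRequestMethod()) && "PHPSESSID=abc123".equals(cookie);
            byte[] out = (ok ? "result for session" : "blank").getBytes(StandardCharsets.UTF_8);
            ex.sendResponseHeaders(200, out.length);
            try (OutputStream os = ex.getResponseBody()) {
                os.write(out);
            }
        });
        server.start();
        return server;
    }

    // POST without following the redirect, then GET the redirect target with the cookie.
    static String search(String base, String formBody) throws IOException {
        URI postUri = URI.create(base + "/search");
        HttpURLConnection post = (HttpURLConnection) postUri.toURL().openConnection();
        post.setInstanceFollowRedirects(false);
        post.setRequestMethod("POST");
        post.setDoOutput(true);
        post.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream os = post.getOutputStream()) {
            os.write(formBody.getBytes(StandardCharsets.UTF_8));
        }
        if (post.getResponseCode() != 302) {
            throw new IOException("expected 302, got " + post.getResponseCode());
        }
        String location = post.getHeaderField("Location");
        // Keep only "name=value"; attributes such as Path are not sent back.
        String cookie = post.getHeaderField("Set-Cookie").split(";", 2)[0];
        post.disconnect();

        HttpURLConnection get = (HttpURLConnection) postUri.resolve(location).toURL().openConnection();
        get.setRequestProperty("Cookie", cookie);
        try (InputStream in = get.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Starts the fake site, runs one search against it, and shuts the site down again.
    static String demo() {
        try {
            HttpServer server = startFakeSite();
            try {
                String base = "http://127.0.0.1:" + server.getAddress().getPort();
                return search(base, "di=033&br=123456&_uqid=&_cdt=&_hsh=");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Against this fake server the GET should return the session-bound text rather than the "blank" page, because the cookie from the 302 response is sent back explicitly. Whether the real site needs anything beyond that is not something this sketch can show.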
So that's it, hope someone will find this useful :)