Problem simulating an HTTP POST with HttpClient

Posted 2024-10-10 09:24:43

I am trying to programmatically send an HTTP POST request with HttpClient to http://ojp.nationalrail.co.uk/en/s/planjourney/query, but the server rejects it. I copied the headers and body from what Chrome sends, so the request should be identical, yet the HTML that comes back reports an error:

<div class="padding">
                    <h1 class="sifr"><strong>Sorry</strong>, something went wrong</h1>
                    <div class="error-message">
                        <div class="error-message-padding">
                            <h2>There is a problem with the page you are trying to access.</h2>
                            <p>It is possible that it was either moved, it doesn't exist or we are experiencing some technical difficulties.</p>
                            <p>We are sorry for the inconvenience.</p>
                        </div> 
                    </div>
                </div>

Here is my Java program which uses HttpClient:

package com.tixsnif;

import org.apache.http.*;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.protocol.HTTP;

import java.io.*;
import java.util.*;
import java.util.zip.GZIPInputStream;

public class WebScrapingTesting {

public static void main(String[] args) throws Exception {
    String target = "http://ojp.nationalrail.co.uk/en/s/planjourney/query";

    HttpClient client = new DefaultHttpClient();

    HttpPost httpPost = new HttpPost(target);
    BasicNameValuePair[] params = {
            new BasicNameValuePair("jpState", "single"),
            new BasicNameValuePair("commandName", "journeyPlannerCommand"),
            new BasicNameValuePair("from.searchTerm", "Basingstoke"),
            new BasicNameValuePair("to.searchTerm", "Reading"),
            new BasicNameValuePair("timeOfOutwardJourney.arrivalOrDeparture", "DEPART"),
            new BasicNameValuePair("timeOfOutwardJourney.monthDay", "Today"),
            new BasicNameValuePair("timeOfOutwardJourney.hour", "10"),
            new BasicNameValuePair("timeOfOutwardJourney.minute", "15"),
            new BasicNameValuePair("timeOfReturnJourney.arrivalOrDeparture", "DEPART"),
            new BasicNameValuePair("timeOfReturnJourney.monthDay", "Today"),
            new BasicNameValuePair("timeOfReturnJourney.hour", "18"),
            new BasicNameValuePair("timeOfReturnJourney.minute", "15"),
            new BasicNameValuePair("_includeOvertakenTrains", "on"),
            new BasicNameValuePair("viaMode", "VIA"),
            new BasicNameValuePair("via.searchTerm", "Station name / code"),
            new BasicNameValuePair("offSetOption", "0"),
            new BasicNameValuePair("_reduceTransfers", "on"),
            new BasicNameValuePair("operatorMode", "SHOW"),
            new BasicNameValuePair("operator.code", ""),
            new BasicNameValuePair("_lookForSleeper", "on"),
            new BasicNameValuePair("_directTrains", "on")};

    httpPost.setHeader("Host", "ojp.nationalrail.co.uk");
    httpPost.setHeader("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.231 Safari/534.10");
    httpPost.setHeader("Accept-Encoding", "gzip,deflate,sdch");
    httpPost.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    httpPost.setHeader("Accept-Language", "en-us,en;q=0.8");
    httpPost.setHeader("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
    httpPost.setHeader("Origin", "http://www.nationalrail.co.uk/");
    httpPost.setHeader("Referer", "http://www.nationalrail.co.uk/");
    httpPost.setHeader("Content-Type", "application/x-www-form-urlencoded");
    httpPost.setHeader("Cookie", "JSESSIONID=B2A3419B79C5D999CA4806B459675CCD.app201; Path=/");
    UrlEncodedFormEntity urlEncodedFormEntity = new UrlEncodedFormEntity(Arrays.asList(params));
    urlEncodedFormEntity.setContentEncoding(HTTP.UTF_8);
    httpPost.setEntity(urlEncodedFormEntity);
    HttpResponse response = client.execute(httpPost);

    InputStream input = response.getEntity().getContent();
    GZIPInputStream gzip = new GZIPInputStream(input);
    InputStreamReader isr = new InputStreamReader(gzip);
    BufferedReader br = new BufferedReader(isr);

    String line = null;
    while((line = br.readLine()) != null) {
        System.out.printf("\n%s", line);
    }

    client.getConnectionManager().shutdown();
}
}

I update the JSESSIONID whenever it expires, but there seems to be another problem that I cannot see. Am I missing something obvious?

Comments (3)

夜还是长夜 2024-10-17 09:24:43

For anyone who needs an answer to this post: the solution is to use an HttpContext to manage cookies automatically:

  HttpContext context=new BasicHttpContext();
  CookieStore cookiestore=new BasicCookieStore();
  context.setAttribute(ClientContext.COOKIE_STORE,cookiestore);

and pass it when you make an HTTP request:

   HttpResponse response = client.execute(httpPost,context);

Every time you make a request, your cookie store is updated automatically. Easy!
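
To see what a cookie store does between requests, here is a minimal sketch that uses the JDK's own java.net.CookieManager as a stand-in. It is not HttpClient's BasicCookieStore, and the class name, method name and session id are made up for illustration. It records a Set-Cookie header as if it had arrived in a response, then asks which Cookie header would be attached to the next request for the same URL.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Sketch only: the JDK CookieManager stands in for HttpClient's cookie store.
final class CookieStoreSketch {

    /**
     * Stores the Set-Cookie value as if it came from a response for pageUri,
     * then returns the Cookie header values the store would attach to a
     * request for requestUri.
     */
    static List<String> cookiesFor(CookieManager manager, URI pageUri,
                                   String setCookie, URI requestUri) throws IOException {
        Map<String, List<String>> responseHeaders =
                Collections.singletonMap("Set-Cookie", Collections.singletonList(setCookie));
        manager.put(pageUri, responseHeaders);

        Map<String, List<String>> requestHeaders =
                manager.get(requestUri, Collections.<String, List<String>>emptyMap());
        List<String> cookies = requestHeaders.get("Cookie");
        return cookies == null ? Collections.<String>emptyList() : cookies;
    }

    public static void main(String[] args) throws IOException {
        CookieManager manager = new CookieManager();
        URI form = URI.create("http://ojp.nationalrail.co.uk/en/s/planjourney/query");
        // Hypothetical session id, not a real one from the site.
        List<String> sent = cookiesFor(manager, form, "JSESSIONID=EXAMPLE123.app201; Path=/", form);
        System.out.println(sent);
    }
}
```

Note that the store keeps Path as an attribute of the cookie and sends back only the name=value pair, whereas the hard-coded Cookie header in the question includes "; Path=/" in the value it sends.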

心如荒岛 2024-10-17 09:24:43

Visiting the link above and viewing the HTML source, it looks like the target path should be /en/s/planjourney/plan.
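
If you would rather check the target from code than by reading the source by eye, something like the following rough sketch could pull the form's action out of the page and resolve it against the page URL. The markup in main is a hypothetical stand-in for the real page, the class and method names are made up, and a regex is only a quick check; a real HTML parser would be more robust.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: finds where the first <form> on a page submits to.
final class FormActionSketch {

    // Matches the action attribute of a <form> tag, quoted with " or '.
    private static final Pattern FORM_ACTION = Pattern.compile(
            "<form\\b[^>]*?\\baction\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    /** Returns the absolute URL of the first form's action, or null if no form action is found. */
    static URI firstFormTarget(URI pageUri, String html) {
        Matcher m = FORM_ACTION.matcher(html);
        return m.find() ? pageUri.resolve(m.group(2)) : null;
    }

    public static void main(String[] args) {
        URI page = URI.create("http://ojp.nationalrail.co.uk/en/s/planjourney/query");
        // Hypothetical markup; the real page's form tag may look different.
        String html = "<form id=\"jp\" method=\"post\" action=\"/en/s/planjourney/plan\">";
        System.out.println(firstFormTarget(page, html));
    }
}
```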

画尸师 2024-10-17 09:24:43

Try using HttpClient 4.1 and set a redirect strategy on the client to handle the 302 status code (Moved Temporarily).
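
The decision a redirect strategy has to make can be sketched without the library. The code below is not HttpClient's RedirectStrategy interface; the class and method names are invented, and the Location value in main is hypothetical. It only shows the logic: if the status is a redirect and a Location header is present, resolve it against the request URL and request that next.

```java
import java.net.URI;

// Sketch only: the decision behind following a redirect, independent of any HTTP library.
final class RedirectSketch {

    /**
     * Returns the URL to request next, or null when the response is not a
     * redirect this sketch handles or carries no Location header.
     */
    static URI nextLocation(URI requestUri, int status, String locationHeader) {
        boolean redirect = status == 301 || status == 302 || status == 303 || status == 307;
        if (!redirect || locationHeader == null) {
            return null;
        }
        // Location may be relative, so resolve it against the original request URL.
        return requestUri.resolve(locationHeader);
    }

    public static void main(String[] args) {
        URI posted = URI.create("http://ojp.nationalrail.co.uk/en/s/planjourney/query");
        // Hypothetical Location header value.
        System.out.println(nextLocation(posted, 302, "/en/s/timetable/details"));
    }
}
```

Whatever does the following, it needs to reuse the same cookie store for the second request, otherwise the session set up by the first one is lost.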
