How do I send a query to a website and parse the results?
I want to do some development in Java. I'd like to be able to access a website, say for example
www.chipotle.com
On the top right, they have a place where you can enter your zip code, and it will give you all of the nearest locations. The program will just have an empty box where the user enters a zip code, and it will query the actual Chipotle server to retrieve the nearest locations. How do I do that, and how is the data I receive stored?
This will probably lead to a follow-up question about what methods I should use to parse the data.
Thanks!
Comments (2)
First, you need to know the parameters needed to execute the query and the URL to which these parameters should be submitted (the action attribute of the form). With that, your application will have to make an HTTP request to that URL with your own parameters (possibly only the zip code). Finally, parse the response.
This can be done with standard Java API classes, but it won't be very robust. A better solution would be HttpClient. Here are some examples.
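As a rough illustration of the steps above, here is a minimal sketch using the JDK's built-in `java.net.http.HttpClient` (Java 11+), which is not necessarily the HttpClient library the answer had in mind. The endpoint `https://example.com/locations` and the parameter name `zip` are placeholders, not the real site's values; the actual ones would have to be read from the form's action attribute and input names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class LocationQuery {
    // Hypothetical endpoint and parameter name (assumptions for illustration only).
    static final String ENDPOINT = "https://example.com/locations";
    static final String ZIP_PARAM = "zip";

    /** Builds the request URI, URL-encoding the user-supplied zip code. */
    static URI buildUri(String zipCode) {
        String encoded = URLEncoder.encode(zipCode, StandardCharsets.UTF_8);
        return URI.create(ENDPOINT + "?" + ZIP_PARAM + "=" + encoded);
    }

    /** Sends a GET request and returns the response body as a String. */
    static String fetch(String zipCode) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(buildUri(zipCode)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status: " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No argument given: only show the URI that would be requested.
            System.out.println(buildUri("80202"));
        } else {
            // Performs a real network call against the (placeholder) endpoint.
            System.out.println(fetch(args[0]));
        }
    }
}
```

The response body arrives as a plain `String`; what you do with it next depends on whether the server sends HTML, XML, or JSON. If the form submits with POST rather than GET, the parameters would go in the request body instead of the query string.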
It very much depends on what the website actually returns.
If it returns static HTML, use a regular (strict) or permissive HTML parser.
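For the static HTML case, one permissive parser that ships with the JDK is the Swing `ParserDelegator`. The sketch below assumes, purely for illustration, that each result is an `<li class="location">` element containing plain text; the real page's markup will differ, and a dedicated third-party HTML parser may be more convenient in practice.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class LocationHtmlScraper {
    /** Collects the text of every <li class="location"> element (hypothetical markup). */
    static List<String> extractLocations(String html) {
        List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean insideLocation = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.LI) {
                    // Track whether the current list item carries the class we want.
                    insideLocation = "location".equals(attrs.getAttribute(HTML.Attribute.CLASS));
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                String text = new String(data).trim();
                if (insideLocation && !text.isEmpty()) {
                    found.add(text);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.LI) {
                    insideLocation = false;
                }
            }
        };
        try {
            // The third argument tells the parser to ignore charset declarations in the markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "<html><body><ul>"
                + "<li class=\"location\">Denver</li>"
                + "<li class=\"location\">Boulder</li>"
                + "</ul></body></html>";
        System.out.println(extractLocations(sample));
    }
}
```

The callback style means you never hold the whole document tree in memory; you only keep the pieces you care about, here as a `List<String>`.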
If it returns dynamic HTML (i.e. HTML with embedded JavaScript), you may need to use something that evaluates the JavaScript as part of the content extraction process.
There may also be a web API designed for programs (like yours) to use. Such an API would typically return the results as XML or JSON so that you don't have to scrape the results out of an HTML document.
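If such an API exists and returns XML, the JDK's DOM parser is enough to turn the response into objects. The sketch below assumes a made-up response shape (`<locations>` containing `<location>` elements with `<name>` and `<address>` children); a real API would document its own format, and a JSON response would need a JSON library instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LocationXmlParser {
    /** Simple holder for one parsed result (field names are assumptions). */
    static final class Location {
        final String name;
        final String address;

        Location(String name, String address) {
            this.name = name;
            this.address = address;
        }
    }

    /** Parses the hypothetical <locations> document into a list of Location objects. */
    static List<Location> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so untrusted input cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<Location> result = new ArrayList<>();
            NodeList nodes = doc.getElementsByTagName("location");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                result.add(new Location(childText(element, "name"), childText(element, "address")));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML response", e);
        }
    }

    /** Returns the trimmed text of the first child with the given tag, or "" if absent. */
    private static String childText(Element parent, String tag) {
        NodeList children = parent.getElementsByTagName(tag);
        return children.getLength() == 0 ? "" : children.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String sample = "<locations>"
                + "<location><name>Downtown</name><address>1 Main St</address></location>"
                + "</locations>";
        for (Location location : parse(sample)) {
            System.out.println(location.name + ": " + location.address);
        }
    }
}
```

Storing the results as a `List` of small value objects is one simple answer to the "how is the data stored" part of the question: once parsed, the data is just ordinary Java objects in memory.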
Before you go any further you should check the Terms of Service for the site. Do they say anything about what you are proposing to do?
A lot of sites DO NOT WANT people to scrape their content or provide wrappers for their services. For instance, if they get income from ads shown on their site, what you are proposing to do could divert visitors away from their site, resulting in a loss of potential or actual income.
If you don't respect a website's ToS, you could be on the receiving end of lawyers' letters ... or worse. In addition, they could already be using technical means to make life difficult for people who scrape their service.