Getting data from the Internet with Java
I am thinking of building the following application in Java for my college project. I know core Java. Since time is short, I want to know what I should read "specifically" for this project:
It will have an interface where you enter a query. The string is sent as a query to Internet search engines, and the application uses the search engine to find the data (the first web page we see; that is the data for my application for now :) ).
I do not want to display the data. I just want the HTML file, i.e. the source code of the generated web page. Does this sound like the Common Gateway Interface? I do not know anything about it,
but I think it serves the same purpose. If it does, please guide me on how to implement this.
Whatever the answer is, please address the following:
- Problem 1: What should I read? I am not looking for direct help at this point. I want to implement it myself.
- Problem 2: Does connecting to the Internet require some JNLP knowledge too?
For example, when we search for something on Google, it shows us links to websites. I can see the source code of this generated web page. I just want that page for my application to work on.
EDIT:
I do not want to rely only on Google or on any particular web server. I want my application to decide that.
Please also refer to my Problem 2.
I have discovered that websites have Terms and Conditions. Should I still try to write my own crawler? Would my application then be breaking the rules? This is important to me.
Comments (5)
Ashish,
Here is what I would recommend.
Note: JSON APIs are normally used from JavaScript on the UI side, but since JSON is very easy and quick to learn, I suggested it. If time permits, you can also explore the XML-based APIs.
That should be enough to get you started.
And yes, do check that you are not violating a website's terms of use. Search engines don't really like it when you access them via a program.
Many of them, including Google, have APIs specifically designed for this purpose.
You can do everything you want using HTMLUnit. It's like a web browser, but for Java. Check out some of the examples on their website.
Read "Working with URLs" in the Java tutorial to get an idea of what is behind the available libraries such as HTMLUnit, HttpClient, etc.
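To give a rough idea of what that tutorial covers, here is a minimal sketch that uses only the standard `java.net` classes to build a query URL and read back the raw HTML. The class name `PageSource`, the method names, the `q` parameter name, and the assumption that the page is UTF-8 encoded are all mine for illustration; a real search engine may expect different parameters and may not permit automated access at all.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; none of these names come from the thread.
class PageSource {

    // Appends one URL-encoded query parameter to a base address.
    static String buildSearchUrl(String base, String param, String query) {
        try {
            return base + "?" + param + "=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be supported by every Java runtime.
            throw new IllegalStateException(e);
        }
    }

    // Reads a stream to the end and decodes it as UTF-8 (an assumption;
    // a real page may declare another charset in its headers).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Opens an HTTP connection and returns the response body as text.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java PageSource.java <base-url> <query>");
            return;
        }
        // Prints the HTML source of the result page without rendering it.
        System.out.println(fetch(buildSearchUrl(args[0], "q", args[1])));
    }
}
```

Nothing here needs JNLP; a plain `java` command line is enough. Libraries such as HTMLUnit or HttpClient wrap this same kind of connection handling and add cookies, redirects, and so on.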
You probably don't need the HTML either. Google provides its search results as a web service using this API. The same goes for other search engines; GIYF. You get the search results as XML, which is far easier for you to parse. Plus, the XML won't contain unwanted data such as ads.
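To illustrate why XML is easier to work with, here is a small sketch that pulls the result links out of an XML response using the DOM parser bundled with the JDK. The `<results>`/`<result>`/`<url>` layout and the class name `SearchResultsXml` are made up for the example; the actual element names depend on whichever search API you end up using.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example; the XML layout below is an assumption,
// not the format of any real search API.
class SearchResultsXml {

    // Returns the text of every <url> element in document order.
    static List<String> extractUrls(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("url");
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            urls.add(nodes.item(i).getTextContent().trim());
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample response standing in for a real API reply.
        String sample =
                "<results>"
              + "<result><title>First</title><url>https://example.com/a</url></result>"
              + "<result><title>Second</title><url>https://example.com/b</url></result>"
              + "</results>";
        for (String url : extractUrls(sample)) {
            System.out.println(url);
        }
    }
}
```

The first entry in the returned list would be the "first web page" the question asks about, and its HTML can then be fetched separately if it is still needed.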