How to scrape Google pages
Dear all, I am currently using the web tool
http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=
to parse web pages.
For example, to parse the New York Times world news page, we enter:
http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=http://www.nytimes.com/pages/world/index.html
into the browser's address bar, and it parses the page nicely.
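The same request URL can also be built in code. Below is a minimal sketch in Python, assuming the tool reads the target page from a single `url` query parameter, as the address-bar examples suggest. The helper names `build_scrape_url` and `naive_scrape_url` are hypothetical. Percent-encoding the target matters when it carries its own query string, as the Google News address does: typed raw, `&tab=wn` would normally be parsed as a separate parameter of the scrape service rather than as part of the target URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint of the scraping web tool mentioned in the question.
SCRAPE_ENDPOINT = "http://fiddesktop.cs.northwestern.edu/mmp/scrape"


def build_scrape_url(target: str) -> str:
    """Hypothetical helper: pass `target` as an encoded `url` parameter.

    urlencode() percent-encodes the target (":" -> %3A, "/" -> %2F,
    "?" -> %3F, "&" -> %26, ...), so a query string inside the target
    cannot be mistaken for extra parameters of the scrape service.
    """
    return f"{SCRAPE_ENDPOINT}?{urlencode({'url': target})}"


def naive_scrape_url(target: str) -> str:
    """Hypothetical helper: plain concatenation, as typed into the address bar."""
    return f"{SCRAPE_ENDPOINT}?url={target}"


if __name__ == "__main__":
    google_news = "http://news.google.com/nwshp?hl=en&tab=wn"
    for built in (naive_scrape_url(google_news), build_scrape_url(google_news)):
        # Show which `url` value a standard query-string parser recovers.
        params = parse_qs(urlsplit(built).query)
        extra = sorted(k for k in params if k != "url")
        print(built)
        print("  url =", params["url"][0], "| other params:", extra)
```

With the naive form, a standard parser recovers `url=http://news.google.com/nwshp?hl=en` plus a stray `tab=wn`; with the encoded form the full target survives. This only ensures the target URL reaches the service intact. It is not known to resolve the 500 error, and how the tool itself parses its parameters is an assumption.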
However, it fails for Google pages.
For example, if I want to parse the Google News front page:
http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=http://news.google.com/nwshp?hl=en&tab=wn
I always get a 500 Internal Server Error.
I am sure this has something to do with the Google website; perhaps some Google API is needed. Does anyone know how to sort this out for Google pages?
Many thanks.
Comments (1)
Per the google.com robots.txt file, you are explicitly requested not to scrape their content. Google does not provide an API for machine-readable search results; they want to control how their content is presented, via widgets and embedding strategies.
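A scraper can check a site's robots.txt rules before fetching a page. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text is made up for illustration and is not a copy of Google's actual file. To check a live site, `RobotFileParser.set_url()` followed by `read()` would fetch the real file instead. The helper name `is_allowed` and the user-agent string `my-scraper` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt text, made up for this sketch; it is NOT Google's
# actual file, which should be read from the site's own /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /nwshp
Allow: /
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "my-scraper") -> bool:
    """Return True if the rules in `robots_txt` let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() denies everything until the parser is marked as loaded;
    # modified() records that the rules are now in place.
    parser.modified()
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://news.google.com/nwshp?hl=en&tab=wn",
                "http://news.google.com/about"):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, url) else "disallowed"
        print(url, "->", verdict)
```

Against these sample rules, the `/nwshp` page is reported as disallowed and `/about` as allowed, because rules are matched by path prefix and the first matching rule wins.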