Using cURL or wget with HTTP POST to get search results beyond the first page
EDIT: I now have a much more specific idea of what I'm looking for, so I'm rewriting the whole question.
My overall goal is to fetch, from within a script, the search results beyond the first page at http://www.ncbi.nlm.nih.gov/images. Using the Firefox extension "Tamper Data", I inspected the requests sent by my browser and found that I can modify the HTTP POST request to reach any page of the results.
Now I would like to do this within a script. I've tried both
wget --post-data 'var1=foo&var2=bar&var3=...' http://www.ncbi.nlm.nih.gov/images
and
curl --data 'var1=foo&var2=bar&var3=...' http://www.ncbi.nlm.nih.gov/images
I've also tried making the initial request to http://www.ncbi.nlm.nih.gov/images?term=INSERTSEARCHTERMHERE and saving the cookie, then loading that cookie on the next request, this time with POST data indicating the page number. It doesn't work. Whenever I send the request to the first URL (http://www.ncbi.nlm.nih.gov/images), I get either the home page for image search or a page titled "Images - Error encountered" with no search results. If I send it to the second URL (replacing INSERTSEARCHTERMHERE with my actual search term), I always get the first page of the results, even though the POST data includes a variable asking for a different page. There seem to be two, maybe three, variables denoting the page number:
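For reference, the first step of the cookie approach described above might be sketched in shell as follows. This is only a sketch under assumptions: the function name `start_session` and the file names `cookies.txt` and `page1.html` are hypothetical, and nothing in the recorded traffic confirms that the server keys paging on the cookie. The curl options themselves are standard: `--get` with `--data-urlencode` moves the encoded term into the query string, and `--cookie-jar` writes any received cookies to a file.

```shell
#!/usr/bin/env bash
BASE_URL='http://www.ncbi.nlm.nih.gov/images'

# start_session TERM COOKIE_JAR OUT_FILE  (hypothetical helper)
# GETs the first results page for TERM and stores any cookies the server sets.
start_session() {
  local term=$1 jar=$2 out=$3
  curl --silent --show-error --location \
       --get --data-urlencode "term=${term}" \
       --cookie-jar "$jar" \
       --output "$out" \
       "$BASE_URL"
}

# Example (performs a real request, so it is left commented out):
# start_session 'drug' cookies.txt page1.html
```

Letting curl encode the term avoids hand-escaping spaces or brackets in searches such as `drug[All Fields]`.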
EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage=14
EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.CurrPage=14
and in Tamper Data this is always the current page (the one I was on when I made the request for a new page):
EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage=1
(Yes, there are two variables in the POST data with the same name - I don't know what that is about...??)
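Repeated field names are legal in a urlencoded body, and how the server treats the repeats depends on its framework, so it can help to confirm that a script sends them in the same order as the browser did. A small sketch, assuming the body is supplied on standard input; the function names `list_fields` and `repeated_names` are made up for illustration, and the sample body is shortened:

```shell
#!/usr/bin/env bash
# list_fields: print one name=value pair per line, in the order sent.
list_fields() {
  tr '&' '\n'
}

# repeated_names: print each field name that occurs more than once.
repeated_names() {
  list_fields | cut -d= -f1 | sort | uniq -d
}

# Shortened stand-in for the recorded body; prints: a.cPage
printf '%s\n' 'a.cPage=14&a.CurrPage=14&a.cPage=1' | repeated_names
```

Running `list_fields` on both the browser's body and the script's body and comparing the two with `diff` would show any field that was dropped or reordered.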
So how can I use cURL or wget within a script to get to all of the pages of the search results? Thanks for your help! (and thanks to the commenters for helping me clarify the question!)
Additional info:
There are a ton of POST fields, and I am sending all of them. I copied this out of what Tamper Data recorded:
EntrezSystem2.PEntrez.ImagesDb.Images_SearchBar.SearchResourceList=images&EntrezSystem2.PEntrez.ImagesDb.Images_SearchBar.Term=drug&EntrezSystem2.PEntrez.ImagesDb.Images_SearchBar.CurrDb=images&EntrezSystem2.PEntrez.ImagesDb.Entrez_PageController.PreviousPageName=results&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.sPresentation=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.sPageSize=20&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.FileFormat=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.LastPresentation=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.Presentation=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.PageSize=20&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.LastPageSize=20&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.Format=&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.LastFormat=&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage=14&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.CurrPage=14&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_ResultsController.ResultCount=38231&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_ResultsController.RunLastQuery=&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage=1&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.sPresentation2=docsum&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Images_DisplayBar.sPageSize2=20&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_MultiItemSupl.Discovery_SearchDetails.SearchDetailsTerm=drug%5BAll+Fields%5D&EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.HistoryDisplay.Cmd=PageChanged&EntrezSystem2.PEntrez.DbConnector.Db=images&EntrezSystem2.PEntrez.DbConnector.LastDb=images&EntrezSystem2.PEntrez.DbConnector.Term=drug&EntrezSystem2.PEntrez.DbConnector.LastTabCmd=&EntrezSystem2.PEntrez.DbConnector.LastQueryKey=1&EntrezSystem2.PEntrez.DbConnector.IdsFromResult=&EntrezSystem2.PEntrez.DbConnector.LastIdsFromResult=&EntrezSystem2.PEntrez.DbConnector.LinkName=&EntrezSystem2.PEntrez.DbConnector.LinkReadableName=&EntrezSystem2.PEntrez.DbConnector.LinkSrcDb=&EntrezSystem2.PEntrez.DbConnector.Cmd=PageChanged&EntrezSystem2.PEntrez.DbConnector.TabCmd=&EntrezSystem2.PEntrez.DbConnector.QueryKey=&p%24a=EntrezSystem2.PEntrez.ImagesDb.Images_ResultsPanel.Entrez_Pager.cPage&p%24l=EntrezSystem2&p%24st=images
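One way to script the paging without retyping the fields is to save the recorded body to a file and rewrite only the pager values before each request. The sketch below assumes the body sits on a single line in a file; the names `set_page`, `post_page`, `body.txt`, and `cookies.txt` are all hypothetical, and nothing here guarantees that the server accepts a replayed body. `curl --data @-` reads the body from standard input, and `--cookie` and `--cookie-jar` pointed at the same file load and then update the stored cookies.

```shell
#!/usr/bin/env bash
BASE_URL='http://www.ncbi.nlm.nih.gov/images'

# set_page TARGET CURRENT  (hypothetical helper; body on stdin, one line)
# Sets the first cPage and CurrPage to TARGET and the second cPage to CURRENT,
# matching the order seen in the recorded request. Page numbers must be digits.
set_page() {
  local target=$1 current=$2
  sed -E \
    -e "s/(Entrez_Pager\.cPage=)[0-9]+/\1${target}/" \
    -e "s/(Entrez_Pager\.CurrPage=)[0-9]+/\1${target}/" \
    -e "s/(Entrez_Pager\.cPage=)[0-9]+/\1${current}/2"
}

# post_page TARGET CURRENT BODY_FILE COOKIE_FILE OUT_FILE  (hypothetical helper)
# POSTs the rewritten body, reusing and updating the cookie file.
post_page() {
  set_page "$1" "$2" < "$3" |
    curl --silent --show-error \
         --cookie "$4" --cookie-jar "$4" \
         --data @- \
         --output "$5" \
         "$BASE_URL"
}

# Example loop (performs real requests, so it is left commented out):
# for page in 2 3 4; do
#   post_page "$page" "$((page - 1))" body.txt cookies.txt "page${page}.html"
# done
```

The trailing `p%24a=...Entrez_Pager.cPage` field is left alone because it has no `=<number>` after `cPage`, so the patterns do not match it.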