Help: how can I scrape this web page?

Posted on 2024-10-05 03:22:02

There's a site that offers a search service. You enter a number, search, and it returns results. What I want to do is run that search programmatically through ColdFusion instead of having to go to the site and search manually.

This is what the form in the web page I'd like to read/scrape looks like (as seen when viewing the page source):

<form id="frmNumID" name="frmNum" action="" method="post">

    <TABLE border=0 cellPadding=0 cellSpacing=0>
     <TR>
      <TD align="center">
         <label class="NumLabel" for="Num" ACCESSKEY="1">ENTER NUM:</label>
        <input class="NumInput" id="Num" name="inputNum"  onfocusin="select()"  title="Num Input" tabindex="1" type="text" value=""  size ="29" maxlength="17" >&nbsp;&nbsp;

      </TD>

      <TD align="center">
         <input class="NumInput" title="Submit Num" tabindex="2" type="image" src="/include/pics/SubmitBtn.jpg" value="submit" ACCESSKEY="2">
      </TD>
     </TR>
     </TABLE>

     <TABLE border=0 cellPadding=0 cellSpacing=0>
     <TR>    
      <TD colspan="2" align="center">

        <input type="radio" name="displayType" value="NONE"   Checked  />No Pictures&nbsp;&nbsp;
        <input type="radio" name="displayType" value="STUFF"    /> Other Stuff&nbsp;&nbsp;
        <input type="radio" name="displayType" value="MORESTUFF"    /> More Other Stuff  
      </TD>
     </TR>

    </TABLE>
    <div id="NUMMsg"></div>

  </form>

The only field I really care about is the Num input field. I want to post a value to that field, run the search, and get the results in my ColdFusion code. This is what I have so far:

<cfhttp url="http://www.someurl.com/"
        method="POST">
    <cfhttpparam name="Num" type="FormField" value="123456789123456" />
</cfhttp>
<cfdump var="#cfhttp.filecontent#" />

But when I go to the page the dump just says "Connection Failure". What am I doing wrong?
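Two things stand out in the form source. First, browsers submit a field under its `name` attribute, not its `id`. The text box is therefore posted as `inputNum`, not `Num`. Second, the "Connection Failure" text in `fileContent` says little on its own. Dumping the whole result structure shows more: the status code, the response headers and, depending on the ColdFusion version, an `errorDetail` entry.

Below is a minimal sketch. It reuses the placeholder URL from the question and assumes the blank `action` means the form posts back to that same address. The `searchResult` variable name and the 30-second timeout are illustrative choices only.

```cfml
<!--- Post the number under the field's name attribute ("inputNum"), which is what a
      browser sends; the id ("Num") is never submitted. --->
<!--- Assumption: the blank action="" means the form posts back to this same URL. --->
<cfhttp url="http://www.someurl.com/" method="POST" result="searchResult" timeout="30">
    <cfhttpparam type="formField" name="inputNum" value="123456789123456" />
</cfhttp>

<!--- Dump the whole result structure, not just fileContent: on a failed request the
      status code and error details say more than "Connection Failure" does. --->
<cfdump var="#searchResult#" />
```

Correcting the field name will not by itself fix a connection-level failure. The dump is there to show what that failure actually is.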

Comments (2)

夜唯美灬不弃 2024-10-12 03:22:02

From your sample code it's not clear what the form's submit action URL is: the `action` attribute is blank. Maybe the site sets it using JavaScript?

You probably also need to post the displayType form variable, since it is a radio button group and a value probably needs to be provided.

Beware of screen scraping: it can be a maintenance nightmare. If there is any way to use an official API they provide, you should, because as soon as they change their code (post URL, markup, etc.) your code could very well break.
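Below is a sketch of a request that mimics the browser more closely, along the lines of this answer. It relies on three facts about the form:

- An empty `action` attribute makes a browser post the form back to the page's own URL.
- A radio group submits its checked value, which is `NONE` by default here.
- An image submit button with no `name` sends the click position as `x` and `y`.

The check at the end is one way to make breakage visible when the site changes. The `NUMMsg` marker is an assumption borrowed from the form markup, not something known to appear in the results. The URL is the placeholder from the question.

```cfml
<!--- Placeholder: the address of the page that contains the search form.
      An empty action="" makes a browser post the form back to this same URL. --->
<cfset searchPageUrl = "http://www.someurl.com/" />
<!--- Hypothetical marker expected in a valid response; replace it with text
      the real results page is known to contain. --->
<cfset expectedMarker = "NUMMsg" />

<cfhttp url="#searchPageUrl#" method="POST" result="searchResult" timeout="30">
    <cfhttpparam type="formField" name="inputNum" value="123456789123456" />
    <!--- The radio group submits its checked value; "NONE" is checked by default. --->
    <cfhttpparam type="formField" name="displayType" value="NONE" />
    <!--- An image submit button without a name sends the click position as x and y.
          Whether this server needs them is unknown; they only mirror a browser. --->
    <cfhttpparam type="formField" name="x" value="0" />
    <cfhttpparam type="formField" name="y" value="0" />
</cfhttp>

<!--- Fail loudly instead of silently parsing the wrong page if the site changes. --->
<cfif left(searchResult.statusCode, 3) NEQ "200">
    <cfthrow message="Search request failed: #searchResult.statusCode#" />
<cfelseif NOT findNoCase(expectedMarker, searchResult.fileContent)>
    <cfthrow message="Response did not contain '#expectedMarker#'; the site markup may have changed." />
</cfif>

<cfdump var="#searchResult.fileContent#" />
```

Throwing early keeps a markup change from turning into quietly wrong results further down the code.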

讽刺将军 2024-10-12 03:22:02

OK, this website suggested a solution: http://australiansearchengine.wordpress.com/2009/09/28/cfhttp-connection-failure/

They suggested adding the following cfhttpparam tags:

<cfhttpparam type="header" name="accept-encoding" value="deflate;q=0">
<cfhttpparam type="header" name="te" value="deflate;q=0"> 

Now I no longer get a connection failure :)
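Here are the headers from the linked post combined with the search request. In `deflate;q=0`, a quality value of 0 marks deflate as not acceptable. Because `accept-encoding` lists no other coding, the server should answer with an uncompressed body. It seems likely that cfhttp was tripping over a compressed response, but that is an assumption, not something shown here. The form field names follow the page's markup, and the URL is the question's placeholder.

```cfml
<cfhttp url="http://www.someurl.com/" method="POST" result="searchResult" timeout="30">
    <!--- Workaround headers from the linked post: q=0 marks deflate as unacceptable,
          so the server should fall back to an uncompressed response. --->
    <cfhttpparam type="header" name="accept-encoding" value="deflate;q=0" />
    <cfhttpparam type="header" name="te" value="deflate;q=0" />
    <!--- Form fields named as in the page's markup (name attributes, not ids). --->
    <cfhttpparam type="formField" name="inputNum" value="123456789123456" />
    <cfhttpparam type="formField" name="displayType" value="NONE" />
</cfhttp>

<cfdump var="#searchResult.fileContent#" />
```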
