No response when scraping a website in Go

Posted on 2025-02-01 02:38:05

I'm trying to use Go and Colly to scrape a few details about some listings on Zillow. Here's the script I'm using:

package main

import (
    "encoding/csv"
    "log"
    "os"
    "time"

    "github.com/gocolly/colly"
    "github.com/gocolly/colly/proxy"
)

func main() {
    // filename for data
    fName := "data.csv"
    // create a file
    file, err := os.Create(fName)
    // check for errors
    if err != nil {
        log.Fatalf("Could not create file, error : %q", err)
        return
    }
    // close file afterwards
    defer file.Close()

    // instantiate a csv writer
    writer := csv.NewWriter(file)
    // flush contents afterwards
    defer writer.Flush()

    // instantiate a collector
    c := colly.NewCollector(
        colly.AllowedDomains("https://www.zillow.com/austerlitz-ny/sold/"),
    )

    // point to the webpage structure you need to fetch
    c.OnHTML(".list-card-info", func(e *colly.HTMLElement) {
        // write the desired data into csv
        writer.Write([]string{
            e.ChildText("h1"),
            e.ChildText("a"),
        })
    })

    // show completion
    log.Printf("Scraping Finished\n")
    log.Println(c)
}

The script seems to run with no errors, but it collects no data. The terminal reports "Requests made: 0 (0 responses) | Callbacks: OnRequest: 0, OnHTML: 1, OnResponse: 0, OnError: 0", and data.csv is empty as well.

Any idea why this is happening and how to resolve it?

我爱人 2025-02-08 02:38:05

You should read the Colly examples first. Below is a demo. Colly only starts making requests and fetching data to parse once you call c.Visit; registering callbacks such as OnHTML does not send anything by itself.

package main

import (
    "fmt"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector()

    // Find and visit all links on each fetched page
    c.OnHTML("a", func(e *colly.HTMLElement) {
        e.Request.Visit(e.Attr("href"))
    })

    // Log every request before it is sent
    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting", r.URL)
    })

    // Requests start here; OnHTML then runs on the fetched page and follows its href values
    c.Visit("http://go-colly.org/")
}
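
One more thing to check in the question's script, apart from the missing c.Visit call: it passes a full URL to colly.AllowedDomains. As far as I know, Colly compares that list against the host name of each request URL, so an entry like "https://www.zillow.com/austerlitz-ny/sold/" would never match and the request would be rejected. Below is a minimal stdlib-only sketch of that kind of comparison. Note that hostAllowed is a hypothetical helper written for illustration; it is not part of Colly and is not Colly's actual implementation.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostAllowed is a hypothetical helper (not part of Colly) that mimics a
// host-name allow list: it parses rawURL and compares only its host name
// against the entries in allowed.
func hostAllowed(rawURL string, allowed []string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	for _, d := range allowed {
		if d == u.Hostname() {
			return true
		}
	}
	return false
}

func main() {
	target := "https://www.zillow.com/austerlitz-ny/sold/"

	// A full URL in the allow list never equals the host name "www.zillow.com".
	fmt.Println(hostAllowed(target, []string{target}))

	// A bare host name matches.
	fmt.Println(hostAllowed(target, []string{"www.zillow.com"}))
}
```

In the question's script this would mean passing "www.zillow.com" to colly.AllowedDomains and then calling c.Visit with the full listing URL before the final log lines.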
