What can the go-colly library do?

Posted on 2025-01-19 16:37:47

Can the go-colly library retrieve all HTML tags and text content under a div tag? If so, how? I can already get all the text under a div tag like this:

c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) {
    text = strings.TrimSpace(e.Text)
})

But I don't know how to get the HTML tags under the div tag.

一指流沙 2025-01-26 16:37:47

If you are looking for the innerHTML, it is accessible through the element's DOM field by calling the Html method (e.DOM.Html()).

c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) {
    html, _ := e.DOM.Html()
    log.Println(html)
})

If you are looking for a specific tag under the matched element, ForEach can be used for this purpose. The first argument is the selector and the second is the callback function. The callback is invoked once for each element that matches the selector and is a descendant of the e element.

More information: https://pkg.go.dev/github.com/gocolly/colly#HTMLElement.ForEach

c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) {
    // Text of the whole matched element.
    text := strings.TrimSpace(e.Text)
    log.Println(text)
    e.ForEach("div", func(_ int, el *colly.HTMLElement) {
        // Use el, the nested div for this iteration, rather than the outer e.
        text := strings.TrimSpace(el.Text)
        log.Println(text)
    })
})
