How do I crawl Zhihu answers with Java WebMagic?
When using WebMagic to crawl all the answers under a Zhihu question, I only ever get the first two answers. I've searched through all kinds of blog posts and tried various approaches, but it always returns just 2 answers, or fails outright with a 401. Relevant log output:
o.a.h.impl.execchain.MainClientExec - Connection can be kept alive indefinitely
o.a.http.impl.auth.HttpAuthenticator - Authentication required
o.a.http.impl.auth.HttpAuthenticator - www.zhihu.com:443 requested authentication
o.a.http.impl.auth.HttpAuthenticator - Response contains no authentication challenges
o.a.h.c.p.ResponseProcessCookies - Cookie accepted [aliyungf_tc="AQAAAD1PxXQABgUA7CesO3+7/0/iFhJt", version:0, domain:www.zhihu.com, path:/, expiry:null]
o.a.h.i.c.PoolingHttpClientConnectionManager - Connection [id: 0][route: {s}->https://www.zhihu.com:443] can be kept alive indefinitely
o.a.h.i.c.PoolingHttpClientConnectionManager - Connection released: [id: 0][route: {s}->https://www.zhihu.com:443][total kept alive: 1; route allocated: 1 of 100; total allocated: 1 of 1]
u.c.webmagic.utils.CharsetUtils - Auto get charset: null
u.c.w.d.HttpClientDownloader - Charset autodetect failed, use UTF-8 as charset. Please specify charset in Site.setCharset()
u.c.w.d.HttpClientDownloader - downloading page success https://www.zhihu.com/api/v4/questions/29688243/answers?sort_by=default&include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cupvoted_followees%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%3F%28type%3Dbest_answerer%29%5D.topics&limit=3&offset=3
09:04:14.908 [pool-1-thread-1] INFO us.codecraft.webmagic.Spider - page status code error, page https://www.zhihu.com/api/v4/questions/29688243/answers?sort_by=default&include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cupvoted_followees%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%3F%28type%3Dbest_answerer%29%5D.topics&limit=3&offset=3 , code: 401
Any guidance would be much appreciated.
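For what it's worth, a 401 from this endpoint usually means the request doesn't look like it comes from a logged-in browser. Below is a minimal sketch, using only the JDK's `java.net.http` client rather than WebMagic, of building the paginated answers URL (the `limit`/`offset` parameters in the logged URL drive paging) and attaching a browser-like `User-Agent` plus session cookies. The `include` field list is abbreviated from the logged URL, and the cookie value is a placeholder you would copy from a logged-in browser session:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class ZhihuRequestBuilder {
    // Abbreviated `include` field list, taken from the logged request URL
    static final String INCLUDE = "data[*].is_normal,content,voteup_count,created_time";

    // Build the paginated answers URL for a question; step `offset` by
    // `limit` on each request to page through all answers.
    static String answersUrl(long questionId, int limit, int offset) {
        String include = URLEncoder.encode(INCLUDE, StandardCharsets.UTF_8);
        return "https://www.zhihu.com/api/v4/questions/" + questionId
                + "/answers?sort_by=default&include=" + include
                + "&limit=" + limit + "&offset=" + offset;
    }

    // Attach a real browser User-Agent and the cookies from a logged-in
    // session; without these the API tends to answer 401.
    static HttpRequest request(String url, String cookie) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent",
                        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
                .header("Cookie", cookie)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        String url = answersUrl(29688243L, 20, 0);
        HttpRequest req = request(url, "z_c0=<your-session-cookie>");
        System.out.println(url);
        System.out.println(req.headers().firstValue("Cookie").orElse(""));
    }
}
```

If you stay with WebMagic, the same headers and cookies can be supplied through `Site.addHeader(...)` and `Site.addCookie(...)` before creating the Spider.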
Comments (4)
Did you implement this in code? Could you share that part so I can learn from it? Thanks!
I crawled this a while back. If you look at the page source, the data is already pre-fetched on the front end and embedded in a JS script tag; at the time I extracted and parsed the JSON from there.
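The approach in the comment above (reading the pre-rendered data out of the page source) can be sketched roughly like this. The `js-initialData` script id is an assumption about how Zhihu's pages were structured at the time, and since the JDK ships no JSON parser, this sketch only extracts the raw JSON string; in practice you would hand it to a library like Jackson or Gson:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InitialDataExtractor {
    // Pull the JSON blob embedded in the page's initial-data script tag.
    // Returns null when the tag is not present.
    static String extractInitialData(String html) {
        Pattern p = Pattern.compile(
                "<script id=\"js-initialData\"[^>]*>(.*?)</script>",
                Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Minimal stand-in for a downloaded Zhihu page
        String html = "<html><body>"
                + "<script id=\"js-initialData\" type=\"text/json\">"
                + "{\"initialState\":{\"entities\":{\"answers\":{}}}}"
                + "</script></body></html>";
        System.out.println(extractInitialData(html));
    }
}
```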
I haven't done crawlers in Java; from my experience, Python feels more convenient for this.
Zhihu has a lot of anti-scraping measures in place.