
Web crawler, Python, Python crawler

Python crawler: the fetched HTML differs from the page source

Posted 2022-09-12 04:50:55, 710 characters, 36 views, 0 comments

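I am trying to scrape the 移民家园 forum (yiminjiayuan.com), but the response contains no useful content. Why does this happen, and how can I get the actual thread content? (The attached screenshot shows what the code below returned.)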

import urllib.request
url = "https://www.yiminjiayuan.com/forum.php?mod=forumdisplay&fid=189&filter=lastpost&orderby=lastpost"
headers = {
    "User-Agent": "Mozilla/5.0(Windows NT 6.1; Win64; x64) AppleWebKit/537.36(KHTML, like  Gecko) Chrome/75.0.3770.142  Safari/537.36",
    "Referer": "https://www.yiminjiayuan.com/forum.php?mod=forumdisplay&fid=189&filter=lastpost&orderby=lastpost"
}
req = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(req)
html = response.read().decode("utf-8")
print(html)


Comments (1)

﹎☆浅夏丿初晴 2022-09-19 04:50:55

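The site only returns data when the request carries cookies. Use a single session and request the page twice: the first request lets the server set its cookies, and the second sends them back.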

import requests

url = "https://www.yiminjiayuan.com/forum.php?mod=forumdisplay&fid=189&filter=lastpost&orderby=lastpost"
headers = {
    "User-Agent": "Mozilla/5.0(Windows NT 6.1; Win64; x64) AppleWebKit/537.36(KHTML, like  Gecko) Chrome/75.0.3770.142  Safari/537.36",
    "Referer": "https://www.yiminjiayuan.com/forum.php?mod=forumdisplay&fid=189&filter=lastpost&orderby=lastpost",
    # "Cookie": "agZD_b1dd_saltkey=s88c1OTO; agZD_b1dd_lastrequest=da9fBUNoIWsWCDoenEkJt1v2UMl1NFvuWruxtrWGzzWv%2FGdOzvGY",
}
# One session object keeps the cookies set by earlier responses.
s = requests.Session()
# First request: the server sets its cookies on the session.
s.get(url=url, headers=headers)
# Second request: the session sends those cookies back, so the real page should come back.
content = s.get(url=url, headers=headers).content
# Decode as GBK and drop any bytes that cannot be decoded.
print(content.decode("gbk", "ignore"))
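
For anyone who wants to stay with urllib, as in the question, the same "one session, two requests" idea can probably be reproduced with a shared cookie jar. The sketch below is only an illustration: it runs against a throwaway local server so it works offline, and `CookieGateHandler`, `fetch_twice`, `run_demo` and the cookie name `demo_saltkey` are made-up names, not anything taken from the real forum. How the real site gates its content is an assumption based on the answer above.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieGateHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the forum: real content only if a cookie is sent."""

    def do_GET(self):
        has_cookie = "demo_saltkey=" in (self.headers.get("Cookie") or "")
        body = b"thread list" if has_cookie else b"placeholder page"
        self.send_response(200)
        if not has_cookie:
            # First visit: hand out the cookie that the next request must carry.
            self.send_header("Set-Cookie", "demo_saltkey=abc123; Path=/")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_twice(url, headers, *extra_handlers):
    """Request `url` twice through one opener that shares a cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar), *extra_handlers
    )
    pages = []
    for _ in range(2):
        req = urllib.request.Request(url, headers=headers)
        with opener.open(req) as response:
            pages.append(response.read())
    return pages, jar


def run_demo():
    """Start the local stand-in server, fetch it twice, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), CookieGateHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/forum.php" % server.server_port
        # An empty ProxyHandler keeps the local demo from going through a proxy.
        no_proxy = urllib.request.ProxyHandler({})
        return fetch_twice(url, {"User-Agent": "Mozilla/5.0"}, no_proxy)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    pages, jar = run_demo()
    print(pages[0].decode("utf-8"))  # response without the cookie
    print(pages[1].decode("utf-8"))  # response once the cookie is sent back
    print([cookie.name for cookie in jar])
```

Against the real forum you would call `fetch_twice` with the URL and headers from the question and decode the second page, for example with `pages[1].decode("gbk", "ignore")` as in the answer above.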
