URL and HTML Inspect give different results
When I copy the URL of a Facebook page and create a BeautifulSoup object, it gives me text that is not actually the posts on the page. Namely:
text = requests.get('https://www.facebook.com/toyota').text
soup = BeautifulSoup(text, 'lxml')
soup.get_text()
returns '\n\n\n\n\n\n\n\n\n\n\n\nToyota USA - Ana Sayfa\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'.
However, when I inspect that Facebook page, copy the HTML element, and follow similar steps, I get what I want:
html_inspected = '...'  # the HTML element copied from the browser's Inspect panel
soup = BeautifulSoup(html_inspected, 'lxml')
soup.get_text()
returns the actual text I want from the Facebook page. My question is, am I supposed to inspect and copy the HTML every time I want to get the actual content of a page? Isn't there a shortcut for getting the posts and comments on a Facebook page without inspecting it every time?
As pointed out by @HedgeHog, this is likely a JavaScript issue: Facebook renders the posts client-side, so the HTML that requests receives contains only the static shell of the page (essentially just the title), not the posts themselves.
The simplest solution would be to use a ready-made scraper library for the task:
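As a minimal sketch, assuming the library meant here is the third-party facebook-scraper package (installed with pip install facebook-scraper); the package choice and its get_posts API are assumptions, not something specified in the original answer:

# Sketch using the third-party facebook-scraper package (an assumption
# about which ready-made library was intended).
from facebook_scraper import get_posts

# Iterate over recent posts from the public "toyota" page.
for post in get_posts('toyota', pages=2):
    print(post['text'][:100])  # print the first 100 characters of each post

This avoids dealing with the JavaScript rendering yourself, since the library fetches and parses the post data for you.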
Alternatively, you could use Selenium:
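A minimal Selenium sketch, assuming Chrome is available and selenium is installed (pip install selenium); the fixed sleep is a crude stand-in for an explicit wait, and Facebook may still ask you to log in or accept cookies before showing posts:

# Let a real browser execute Facebook's JavaScript, then parse the
# rendered HTML with BeautifulSoup as in the question.
from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get('https://www.facebook.com/toyota')
time.sleep(5)  # crude wait for the JavaScript-rendered content to load

soup = BeautifulSoup(driver.page_source, 'lxml')
print(soup.get_text())

driver.quit()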