
Scraping a problematic website

Posted 2024-08-08 04:34:31

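I am trying to scrape some information from a website, but I am having trouble reading the relevant pages. The pages seem to send a basic skeleton first and the more detailed information afterwards. My download attempts only seem to capture the basic skeleton. So far I have tried urllib and mechanize.

Firefox and Chrome display the pages without any problem, although the parts I want do not appear when I view the page source.

An example URL is https://personal.vanguard.com/us/funds/snapshot?FundId=0542&FundIntExt=INT

For example, I would like the average maturity and average duration shown at the lower right of the page. The problem is not extracting that information from the page, but downloading the page so that I can extract it.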


橘味果▽酱 2024-08-15 04:34:31

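The page uses JavaScript to load its data. Firefox and Chrome only work because you have JavaScript enabled. Try disabling it and you will get an almost empty page.

Python cannot do this on its own. The best compromise is to use something like Pamie.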


莫多说 2024-08-15 04:34:31

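The website loads the data via AJAX. Firebug shows the AJAX calls. For the given page, the data is loaded from https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542

See the corresponding JavaScript code in the original page: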

<script>populator = new Populator({parentId:
"profileForm:vanguardFundTabBox:tab0",execOnLoad:true,
 populatorUrl:"/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542",
inline:false,type:"once"});
</script>
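
Building on the snippet above, here is a minimal Python sketch of how the AJAX endpoint could be located and downloaded directly, without a browser. It assumes the page still embeds a `populatorUrl:"..."` entry in an inline script as shown. The helper names `find_populator_urls` and `fetch`, the regular expression, and the `User-Agent` header are illustrative assumptions, not part of the site or of any library. The site may also require cookies or a session, which this sketch does not handle.

```python
import re
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Matches populatorUrl:"..." inside the inline Populator({...}) script.
# The pattern is an assumption based on the snippet quoted in this answer.
POPULATOR_URL_RE = re.compile(r'populatorUrl\s*:\s*"([^"]+)"')


def find_populator_urls(page_html, page_url):
    """Return the absolute URL of every populatorUrl found in the page source."""
    return [urljoin(page_url, path) for path in POPULATOR_URL_RE.findall(page_html)]


def fetch(url, timeout=30):
    """Download a URL and return its body as text (UTF-8 is assumed)."""
    # Hypothetical header: some servers reject Python's default user agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


PAGE_URL = "https://personal.vanguard.com/us/funds/snapshot?FundId=0542&FundIntExt=INT"

# The inline script quoted above, used here as offline sample input.
SAMPLE = '''<script>populator = new Populator({parentId:
"profileForm:vanguardFundTabBox:tab0",execOnLoad:true,
 populatorUrl:"/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542",
inline:false,type:"once"});
</script>'''

urls = find_populator_urls(SAMPLE, PAGE_URL)
```

In real use, one would call `fetch(PAGE_URL)` first, pass the result to `find_populator_urls`, and then call `fetch` on each returned URL to get the HTML fragment that contains the detailed figures. Resolving the relative path with `urljoin` avoids hard-coding the host name.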
