How to retrieve all public reviews of an extension in the Google Chrome Web Store - JSON & cross-domain issues
I'm interested in gathering/scraping data about the reviews earned by popular extensions available in the Chrome Web Store.
In particular, I need to retrieve the total number of reviews left for a specific extension and then retrieve all of its publicly available reviews. My problem is that I cannot write a standard PHP cURL scraper, since the data I'm interested in is loaded through JSON requests. In particular, I need to call:
- https://chrome.google.com/reviews/components for the number of reviews ('numRatings')
- https://chrome.google.com/reviews/json/search for the reviews ("comment")
I tried to write this:
<script src="https://code.jquery.com/jquery-latest.js"></script>
<script type="text/javascript">
function getReviews(extensionId, callback) {
  var entities = [{'url' : 'http://chrome.google.com/extensions/permalink?id=' + extensionId}];
  // startIndex 0 = first page; advance it by numResults to page through all reviews
  var param = {"searchSpecs":[{"requireComment":true,"entities": entities,"groups":["public_comment"],"matchExtraGroups":true,"sortBy":"quality","startIndex":0,"numResults":10,"includeNickNames":true}],"applicationId":94};
  $.ajax({
    type: 'POST',
    url: 'https://chrome.google.com/reviews/json/search',
    // The body below is form-encoded, not XML
    contentType: 'application/x-www-form-urlencoded',
    xhrFields: { withCredentials: true },
    dataType: 'json',
    // URL-encode the JSON so characters such as '&' or '=' cannot break the body
    data: 'req=' + encodeURIComponent(JSON.stringify(param)) + '&requestSource=widget'
  }).done(callback); // .success() was deprecated in jQuery 1.8 and removed in 3.0
}
</script>
<script type="text/javascript">
// ready() expects a function; calling getReviews(...) directly would run it immediately
$(document).ready(function () {
  getReviews('gighmmpiobklfepjocnamgkkbiglidom', function (reviews) { console.log(reviews); });
});
</script>
I'm not very familiar with jQuery/JSON(-P), and the code above is certainly wrong.
My questions are as follows:
- How can I bypass the same-origin policy? I tried YQL without success...
- How should I format my URL/'data' to retrieve only the number of reviews ('numRatings') from chrome.google.com/reviews/components and the reviews ('comments') from chrome.google.com/reviews/json/search for a specific extension identified by its ID, e.g. gighmmpiobklfepjocnamgkkbiglidom?
I have already done this kind of scraping for popular Mozilla Add-ons using PHP, gathering the data I needed with standard cURL/XPath.
Thanks for your help!
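The same-origin policy is enforced by the browser, so one way to sidestep it (much like the PHP/cURL approach already used for Mozilla Add-ons) is to send the request from a server-side script rather than from a web page. Below is a minimal sketch for Node.js 18+ (which provides a global `fetch`), assuming the endpoint accepts the same form-encoded `req=...&requestSource=widget` body as the jQuery code in the question. `buildSearchBody` and `fetchReviewPage` are hypothetical names, and the response format is not assumed, so the raw text is returned.

```javascript
// Sketch only: assumes the endpoint accepts the form-encoded body from the question.
const SEARCH_URL = 'https://chrome.google.com/reviews/json/search';

// Hypothetical helper: builds the 'req=<json>&requestSource=widget' body.
function buildSearchBody(extensionId, startIndex, numResults) {
  const param = {
    searchSpecs: [{
      requireComment: true,
      entities: [{ url: 'http://chrome.google.com/extensions/permalink?id=' + extensionId }],
      groups: ['public_comment'],
      matchExtraGroups: true,
      sortBy: 'quality',
      startIndex: startIndex,
      numResults: numResults,
      includeNickNames: true,
    }],
    applicationId: 94,
  };
  // URLSearchParams percent-encodes the JSON string for us.
  return new URLSearchParams({ req: JSON.stringify(param), requestSource: 'widget' }).toString();
}

// Hypothetical helper: fetches one page of reviews. This runs outside the
// browser, so the same-origin policy does not apply.
async function fetchReviewPage(extensionId, startIndex, numResults) {
  const res = await fetch(SEARCH_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: buildSearchBody(extensionId, startIndex, numResults),
  });
  if (!res.ok) throw new Error('HTTP ' + res.status);
  // The response format is not confirmed, so the raw text is returned as-is.
  return res.text();
}
```

For example, `fetchReviewPage('gighmmpiobklfepjocnamgkkbiglidom', 0, 10).then(console.log)` would request the first ten reviews, provided the endpoint is still available.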
Answers (2)
1) The easiest way would be to create a Chrome extension;
2) See https://github.com/xpressyoo/MyExtensions
and
where:
Here is a way of doing it in PHP with parallel cURL. This script scrapes all the extensions present in the Chrome Web Store (ranked by popularity) and retrieves information such as:
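The idea of issuing many page requests concurrently can also be sketched in JavaScript. This is not the answer's PHP script, just a minimal illustration under stated assumptions: the total review count (e.g. the 'numRatings' value mentioned in the question) is already known, `pageStartIndices` and `fetchAllPages` are hypothetical names, and `fetchPage(startIndex)` is a caller-supplied function that sends one request and returns a Promise.

```javascript
// Hypothetical helper: startIndex of every page needed to cover `total` reviews.
function pageStartIndices(total, pageSize) {
  if (pageSize <= 0) throw new RangeError('pageSize must be positive');
  const starts = [];
  for (let i = 0; i < total; i += pageSize) starts.push(i);
  return starts;
}

// Hypothetical helper: fetches every page, with at most `concurrency`
// requests in flight at once (one batch at a time).
async function fetchAllPages(total, pageSize, concurrency, fetchPage) {
  if (concurrency <= 0) throw new RangeError('concurrency must be positive');
  const starts = pageStartIndices(total, pageSize);
  const results = [];
  for (let i = 0; i < starts.length; i += concurrency) {
    const batch = starts.slice(i, i + concurrency);
    // Requests within a batch run in parallel, similar in spirit to parallel cURL.
    const pages = await Promise.all(batch.map((start) => fetchPage(start)));
    results.push(...pages);
  }
  return results; // pages in startIndex order
}
```

Batching keeps the number of simultaneous requests bounded, which is kinder to the remote server than firing every page at once; `fetchPage` could wrap a POST to the search endpoint with `startIndex` set to the given value.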