How can I follow a document.location.href redirect with PHP cURL?
I am trying to scrape a website that opens normally in a browser. However, whenever I open the link using cURL, I get an intermediary redirect page that shows "Redirecting... Please, wait."
My code is below:
$url = "https://codeforces.com/problemset";
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$result = curl_exec($ch); //returning the source code for the url.
echo $result;
Instead of returning the contents of the URL, curl_exec($ch)
returns the following:
<html>
<body>Redirecting... Please, wait.<script type="text/javascript" src="/aes.min.js"></script>
<script>
function toNumbers(d) {
var e = [];
d.replace(/(..)/g, function(d) {
e.push(parseInt(d, 16))
});
return e
}
function toHex() {
for (var d = [], d = 1 == arguments.length && arguments[0].constructor == Array ? arguments[0] : arguments, e = "", f = 0; f < d.length; f++) e += (16 > d[f] ? "0" : "") + d[f].toString(16);
return e.toLowerCase()
}
var a = toNumbers("e9ee4b03c1d0822987185d27bca23378"),
b = toNumbers("188fafdbe0f87ef0fc2810d5b3e34705"),
c = toNumbers("d797a6b5b9d48f1ca8bcbddbe6654d10");
document.cookie = "RCPC=" + toHex(slowAES.decrypt(c, 2, a, b)) + "; expires=Thu, 31-Dec-37 23:55:55 GMT; path=/";
document.location.href = "https://codeforces.com/problemset?tags=1000-1500&f0a28=1";
</script>
</body>
</html>
When echoed in the browser, this results in a page that shows only the output above.
The same code worked a few days ago. The link is still accessible manually.
How can I fix this?
Is there some way to follow the redirect to document.location.href
using cURL?
Comments (1)
cURL can't execute any JavaScript code; JavaScript is executed within the browser. The redirect on this page is performed by a script that sets document.location.href, not by an HTTP redirect, so cURL has nothing to follow.
Also, this kind of technique is used to stop unwanted web scraping. Since the site you are trying to scrape has set it up, scraping it may be illegal or harmful to the website.
If you still need to scrape such websites, you can try Selenium or some other headless browser, or a dedicated web scraping tool.
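As an alternative to a headless browser, the work done by the script in the question can in principle be repeated in PHP: read the three hex strings, decrypt the third one, and send the result as the RCPC cookie with the next request. The following is only a rough sketch under assumptions that the page does not confirm: that `slowAES.decrypt(c, 2, a, b)` means AES-128 in CBC mode with `a` as the key and `b` as the IV, and that no padding is stripped from the result. The function names `parseChallenge`, `computeCookieValue`, `fetchPage` and `fetchThroughChallenge` are made up for this illustration, and the OpenSSL and cURL extensions are assumed to be available.

```php
<?php
// Sketch only: repeats in PHP what the interstitial page's JavaScript does.
// ASSUMPTION: slowAES.decrypt(c, 2, a, b) is AES-128-CBC with key a and IV b.

/**
 * Extract the three hex strings and the redirect target from the page.
 * Returns null if the HTML does not look like the interstitial page.
 */
function parseChallenge(string $html): ?array
{
    // Matches toNumbers("...") calls, not the function definition itself.
    if (!preg_match_all('/toNumbers\("([0-9a-f]+)"\)/i', $html, $hex) || count($hex[1]) < 3) {
        return null;
    }
    if (!preg_match('/document\.location\.href\s*=\s*"([^"]+)"/', $html, $loc)) {
        return null;
    }
    return [
        'key'    => $hex[1][0], // variable a in the page
        'iv'     => $hex[1][1], // variable b in the page
        'cipher' => $hex[1][2], // variable c in the page
        'url'    => $loc[1],
    ];
}

/**
 * Decrypt the cipher text and return it as lowercase hex, like toHex() does.
 * ASSUMPTION: the plaintext is used as-is, so padding removal is disabled.
 */
function computeCookieValue(array $challenge): ?string
{
    $plain = openssl_decrypt(
        hex2bin($challenge['cipher']),
        'aes-128-cbc',
        hex2bin($challenge['key']),
        OPENSSL_RAW_DATA | OPENSSL_ZERO_PADDING,
        hex2bin($challenge['iv'])
    );
    return $plain === false ? null : bin2hex($plain);
}

/** Plain cURL GET, optionally sending a Cookie header. Returns the body or false. */
function fetchPage(string $url, ?string $cookie = null)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follows HTTP 3xx only, never JavaScript
    if ($cookie !== null) {
        curl_setopt($ch, CURLOPT_COOKIE, $cookie);
    }
    return curl_exec($ch);
}

/** Fetch a URL; if the interstitial page comes back, retry with the computed cookie. */
function fetchThroughChallenge(string $url)
{
    $html = fetchPage($url);
    $challenge = is_string($html) ? parseChallenge($html) : null;
    if ($challenge === null) {
        return $html; // normal page, or the request failed
    }
    $value = computeCookieValue($challenge);
    return $value === null ? $html : fetchPage($challenge['url'], 'RCPC=' . $value);
}
```

With these definitions, `echo fetchThroughChallenge("https://codeforces.com/problemset");` would take the place of the original `curl_exec` call. If the site changes its script or the assumptions about slowAES are wrong, the second request will simply return the interstitial page again, in which case a headless browser remains the more reliable route.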