抓取网页并检索 JavaScript 变量

发布于 2024-12-26 17:47:19 字数 297 浏览 1 评论 0原文

我需要抓取一个嵌入了内嵌 javascript 代码的 javascript 数组的网页，例如：

<script>
    var videos = new Array();
    videos[0] = 'http://myvideos.com/video1.mov'; 
    videos[1] = ....
    ....
</script>

处理此问题并最终得到这些视频 url 的 PHP 数组的最简单方法是什么？

编辑：所有视频的扩展名为 .mov。

原文

I need to scrape a web page that has a javascript array embeded in inline javascript code, such as:

<script>
    var videos = new Array();
    videos[0] = 'http://myvideos.com/video1.mov'; 
    videos[1] = ....
    ....
</script>

What's the easiest way to approach this and end up with a PHP array of these video urls?

Edit:
All videos are .mov extension.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

壹場煙雨 2025-01-02 17:47:19

这有点复杂，但它只会获得那些实际上是 videos[0] = 'http://myvideos.com/video1.mov'; 形式的

$tmp=str_replace(array("\r","\n"),'',$original,$matches);
$pattern='/\<script\>\s+var\ videos.*?((\s*videos\[\d+\]\ \=\ .http\:\/\/.*?\;\s*?)+)(.*?)\<\/script\>/';
$a=preg_match_all($pattern,$tmp,$matches);
unset($tmp);

if (!$a) die("no matches");

$pattern="/videos\[\d+\]\ \=\ /";
$matches=preg_split($pattern,$matches[1][0]);

$final=array();
while(sizeof($matches)>0) {
  $match=trim(array_shift($matches));
  if ($match=='') continue;
  $final[]=substr($match,1,-2);
}
unset($matches);

print_r($final);

链接这里的OP是简化版本：

$original=file_get_contents($url);
$pattern='/http\:\/\/.*?\.mov/';
$a=preg_match_all($pattern,$original,$matches);
if (!$a) die("no matches");
print_r($matches[0]);

This is a bit more complicated, but it will get only those links, that are really of the form videos[0] = 'http://myvideos.com/video1.mov';

$tmp=str_replace(array("\r","\n"),'',$original,$matches);
$pattern='/\<script\>\s+var\ videos.*?((\s*videos\[\d+\]\ \=\ .http\:\/\/.*?\;\s*?)+)(.*?)\<\/script\>/';
$a=preg_match_all($pattern,$tmp,$matches);
unset($tmp);

if (!$a) die("no matches");

$pattern="/videos\[\d+\]\ \=\ /";
$matches=preg_split($pattern,$matches[1][0]);

$final=array();
while(sizeof($matches)>0) {
  $match=trim(array_shift($matches));
  if ($match=='') continue;
  $final[]=substr($match,1,-2);
}
unset($matches);

print_r($final);

After feedback from the OP here is the simplified version:

$original=file_get_contents($url);
$pattern='/http\:\/\/.*?\.mov/';
$a=preg_match_all($pattern,$original,$matches);
if (!$a) die("no matches");
print_r($matches[0]);

回复收藏 0 原文

扶醉桌前 2025-01-02 17:47:19

您可以通过使用 file_get_contents 读取页面来抓取此内容，然后使用正则表达式检索 url。
这是我知道的最简单的方法，特别是如果您知道视频的文件扩展名。
示例：

<?php
$file = file_get_contents('http://google.com');
$pattern = '/http:\/\/([a-zA-Z0-9\-\.]+\.[fr|com]+)/i';
preg_match_all($pattern, $file, $matches);
var_dump($matches);

You can scrape this by reading the page with a file_get_contents then retrieve the urls with a regex.
This is the simplest way i know, especially if you know the file extensions for your videos.
Exemple:

<?php
$file = file_get_contents('http://google.com');
$pattern = '/http:\/\/([a-zA-Z0-9\-\.]+\.[fr|com]+)/i';
preg_match_all($pattern, $file, $matches);
var_dump($matches);

回复收藏 0 原文

~没有更多了~