How fast is simplexml_load_file()?

Posted 2024-08-06 18:31:57


I'm fetching lots of user data via last.fm's API for my mashup. I do this every week as I have to collect listening data.

I fetch the data through their REST API as XML; more specifically, with simplexml_load_file().

The script is taking ridiculously long. For about 2,300 users, it takes 30 minutes just to fetch the names of artists. I have to fix it now, otherwise my hosting company will shut me down. I've ruled out all other options; it is the XML that is slowing the script down.

I now have to figure out whether last.fm has a slow API (or is limiting calls without telling us), or whether PHP's SimpleXML is actually rather slow.

One thing I realised is that each XML request fetches a lot more data than I need, but I can't limit it through the API (i.e. ask for info on only 3 bands rather than 70). Even so, the "big" XML files only get to about 20 kB. Could that be what's slowing the script down: having to load 20 kB into an object for each of the 2,300 users?

It doesn't seem like that could be it... I just need confirmation that it is probably last.fm's slow API. Or is it?

Any other help you can provide?
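A minimal sketch of how the fetch and the parse could be timed separately, to see which side the 30 minutes actually goes to ($userUrls is a placeholder for the list of per-user API request URLs, not part of the original script):

$fetchTime = 0.0;
$parseTime = 0.0;

foreach ($userUrls as $url) {
    $t0 = microtime(true);
    $raw = file_get_contents($url);      // network: download the XML
    $t1 = microtime(true);
    $xml = simplexml_load_string($raw);  // CPU: parse the XML
    $t2 = microtime(true);

    $fetchTime += $t1 - $t0;
    $parseTime += $t2 - $t1;
}

printf("fetch: %.1f s, parse: %.1f s\n", $fetchTime, $parseTime);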


Comments (5)

旧城空念 2024-08-13 18:31:57


I don't think SimpleXML itself is that slow; it takes some time because it is a parser, but I suspect the 2,300 curl/file_get_contents requests are taking far more of the time. Also, why not fetch the data yourself and just use simplexml_load_string? Do you really need to put those files on the server's disk?

At least loading from memory should speed things up a bit. Also, what kind of processing are you doing on the loaded XML? Are you sure your processing is as efficient as it could be?
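A minimal sketch of that approach with cURL, keeping everything in memory ($url stands in for one API request URL; it is not taken from the question):

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$raw = curl_exec($ch);
curl_close($ch);

if ($raw !== false) {
    $xml = simplexml_load_string($raw);  // parse in memory, no temp file
    if ($xml !== false) {
        // ... read the artist names out of $xml here ...
    }
}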

违心° 2024-08-13 18:31:57


20 kB * 2,300 users is ~45 MB. If you're downloading at ~25 kB/s, it will take 30 minutes just to download the data, let alone parse it.

梦里南柯 2024-08-13 18:31:57


Make sure the XML that you download from last.fm is gzipped. You'd probably have to include the correct HTTP header to tell the server you support gzip. It will speed up the download but use more CPU on your server for the decompression.

Also consider using asynchronous downloads to free up server resources. It won't necessarily speed the process up, but it should make the server administrators happy.

If the XML itself is big, use a SAX parser instead of a DOM parser.
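A sketch of both suggestions using cURL (setting CURLOPT_ENCODING to an empty string advertises gzip/deflate and decompresses transparently; $urls is an assumed batch of request URLs):

$mh = curl_multi_init();
$handles = array();

foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_ENCODING, '');   // accept gzip/deflate, auto-decompress
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// drive all transfers in parallel until they finish
do {
    curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh);   // wait for activity on any handle
    }
} while ($running > 0);

foreach ($handles as $ch) {
    $xml = simplexml_load_string(curl_multi_getcontent($ch));
    // ... process $xml ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);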

怪异←思 2024-08-13 18:31:57


I think there's a limit of 1 API call per second. I'm not sure this policy is enforced in code, but it might have something to do with it. You can ask the Last.fm staff in #audioscrobbler on irc.last.fm if you believe this to be the case.
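If such a limit exists and is enforced, it alone would account for most of the runtime: 2,300 calls at one per second is already over 38 minutes. A pacing sketch that stays within an assumed one-call-per-second limit ($userUrls is a placeholder):

foreach ($userUrls as $url) {
    $start = microtime(true);

    $xml = simplexml_load_string(file_get_contents($url));
    // ... process $xml ...

    // sleep away the rest of the one-second slot, if any
    $elapsed = microtime(true) - $start;
    if ($elapsed < 1.0) {
        usleep((int) ((1.0 - $elapsed) * 1000000));
    }
}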

世态炎凉 2024-08-13 18:31:57


As suggested, fetch the data yourself and parse it with simplexml_load_string rather than relying on simplexml_load_file; it works out about twice as fast. Here's some code:

function simplexml_load_file2($url, $timeout = 30)
{
    // pull host, path and query out of the URL (plain HTTP on port 80 only)
    $url_parts = parse_url($url);
    if(!$url_parts || !array_key_exists('host', $url_parts)) return false;

    $fp = fsockopen($url_parts['host'], 80, $errno, $errstr, $timeout);
    if($fp)
    {
        $path = array_key_exists('path', $url_parts) ? $url_parts['path'] : '/';
        if(array_key_exists('query', $url_parts))
        {
            $path .= '?' . $url_parts['query'];
        }

        // make the request; HTTP/1.0 keeps the server from replying with a
        // chunked transfer-encoding, which this simple reader does not decode
        $out  = "GET $path HTTP/1.0\r\n";
        $out .= "Host: " . $url_parts['host'] . "\r\n";
        $out .= "Connection: Close\r\n\r\n";

        fwrite($fp, $out);

        // read the whole response
        $resp = "";
        while(!feof($fp))
        {
            $resp .= fgets($fp, 128);
        }
        fclose($fp);

        // split the headers from the body
        $parts = explode("\r\n\r\n", $resp);
        $headers = array_shift($parts);

        // only hand the body to SimpleXML on a 200 OK
        $status_regex = "/HTTP\/1\.\d\s(\d+)/";
        if(preg_match($status_regex, $headers, $matches) && $matches[1] == 200)
        {
            $xml = join("\r\n\r\n", $parts);
            return @simplexml_load_string($xml);
        }
    }

    return false;
}
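A usage sketch (the request URL and its parameters are placeholders, not the exact last.fm call from the question):

$xml = simplexml_load_file2('http://ws.audioscrobbler.com/2.0/?method=user.gettopartists&user=SOME_USER&api_key=YOUR_KEY');
if ($xml !== false) {
    // ... read the artist names out of $xml here ...
}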