PHP: How do I load a file from a different server as a string?

Posted 2024-07-23 07:49:55

I am trying to load an XML file from a different domain name as a string. All I want is an array of the text inside the <title></title> tags of the XML file, so I am thinking that, since I am using PHP 4, the easiest way would be to run a regex on it to get them. Can someone explain how to load the XML as a string? Thanks!

Comments (4)

羅雙樹 2024-07-30 07:49:55

You could use cURL like the example below. I should add that regex-based XML parsing is generally not a good idea, and you may be better off using a real parser, especially if it gets any more complicated.

You may also want to add some regex modifiers to make it work across multiple lines etc., but I assume the question is more about fetching the content into a string.

<?php

$curl = curl_init('http://www.example.com');

// make curl_exec() return the content rather than printing it immediately
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

$result = curl_exec($curl);

if ($result !== false) {
    // lazy (.*?) so the match stops at the first </title>
    if (preg_match('|<title>(.*?)</title>|i', $result, $matches)) {
        echo "Title is '{$matches[1]}'";
    } else {
        // did not find the title
    }
} else {
    // request failed
    die(curl_error($curl));
}

curl_close($curl);
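Since the question asks for an array of every title, here is a minimal sketch (not part of the original answer) that runs preg_match_all() over a string such as $result. The helper name extractTitlesRegex() is hypothetical. It adds the s modifier so a title may span several lines, the kind of modifier the answer alludes to, and a lazy quantifier so each match ends at the nearest closing tag. It still does not cope with attributes on the tag or with CDATA sections, which is part of why a real parser is safer.

```php
<?php
// Minimal sketch: collect every <title>...</title> value into an array.
// Assumes the document is already in a string (e.g. the $result of curl_exec).
// "i" ignores case, "s" lets "." match newlines, and "*?" is lazy so each
// match stops at the nearest </title> instead of the last one in the string.
function extractTitlesRegex($text)
{
    $titles = array();
    if (preg_match_all('|<title>(.*?)</title>|is', $text, $matches)) {
        // $matches[1] holds the first capture group of every match.
        $titles = $matches[1];
    }
    return $titles;
}

$sample = "<rss><channel><title>Feed</title>\n<item><title>First\npost</title></item>"
        . "<item><TITLE>Second</TITLE></item></channel></rss>";
print_r(extractTitlesRegex($sample));
```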
樱花细雨 2024-07-30 07:49:55

First use

file_get_contents('http://www.example.com/');

to fetch the file into a variable (this requires allow_url_fopen to be enabled), then parse the XML. The manual page for xml_parse is

http://php.net/manual/en/function.xml-parse.php

and there are examples in the comments.
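A minimal sketch of the xml_parse() approach, assuming the document has already been fetched into a string with file_get_contents(). TitleCollector and extractTitlesSax() are hypothetical names; the element and character-data handlers gather the text of every title element. The sketch targets PHP 5 and later.

```php
<?php
// Event-based parsing with PHP's xml_parser_* functions.
// TitleCollector is a hypothetical helper class, not part of PHP.
class TitleCollector
{
    var $titles = array();  // collected title strings
    var $inTitle = false;   // true while inside a <title> element
    var $buffer = '';       // character data for the current title

    function startElement($parser, $name, $attrs)
    {
        // Case folding is on by default, so element names arrive upper-cased.
        if ($name === 'TITLE') {
            $this->inTitle = true;
            $this->buffer = '';
        }
    }

    function endElement($parser, $name)
    {
        if ($name === 'TITLE') {
            $this->titles[] = $this->buffer;
            $this->inTitle = false;
        }
    }

    function characterData($parser, $data)
    {
        // Text (e.g. around entities) may arrive in several chunks, so append.
        if ($this->inTitle) {
            $this->buffer .= $data;
        }
    }
}

// Returns an array of titles, or false if the XML is malformed.
function extractTitlesSax($xml)
{
    $collector = new TitleCollector();
    $parser = xml_parser_create();
    xml_set_element_handler($parser, array($collector, 'startElement'), array($collector, 'endElement'));
    xml_set_character_data_handler($parser, array($collector, 'characterData'));
    $ok = xml_parse($parser, $xml, true);
    // On PHP versions before 8 you may also call xml_parser_free($parser) here.
    return $ok ? $collector->titles : false;
}

// Fetching over HTTP this way assumes allow_url_fopen is enabled:
// $titles = extractTitlesSax(file_get_contents('http://www.example.com/'));
```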

孤蝉 2024-07-30 07:49:55

If you're loading well-formed XML, skip the character-based parsing and use the DOM functions (note that the DOMDocument class requires PHP 5):

$d = new DOMDocument;
$d->load("http://url/file.xml");
$titles = $d->getElementsByTagName('title');
// getElementsByTagName() always returns a DOMNodeList, so check its length
if ($titles->length > 0) {
    echo $titles->item(0)->nodeValue;
}

If you can't use DOMDocument::load() because of how PHP is set up, then use cURL to grab the file and do:

$d = new DOMDocument;
$d->loadXML($grabbedfile);
...
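For the cURL fallback, here is a minimal sketch that collects every title rather than just the first, assuming the fetched body is already in a string. extractTitlesDom() is a hypothetical helper; libxml_use_internal_errors() keeps parse warnings out of the output so that malformed input simply yields false.

```php
<?php
// Parse an XML string (e.g. the body fetched with cURL) and return the text
// of every <title> element as an array, or false if it cannot be parsed.
// extractTitlesDom is a hypothetical helper name. Requires the DOM extension.
function extractTitlesDom($xml)
{
    // A failed curl_exec() returns false, and loadXML() rejects empty input.
    if (!is_string($xml) || $xml === '') {
        return false;
    }

    $doc = new DOMDocument();
    // Collect libxml errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return false;
    }

    $titles = array();
    // The node list may be empty; foreach then simply does nothing.
    foreach ($doc->getElementsByTagName('title') as $node) {
        $titles[] = $node->nodeValue;
    }
    return $titles;
}
```

Usage would look like $titles = extractTitlesDom($grabbedfile); where $grabbedfile is the string returned by curl_exec().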
半窗疏影 2024-07-30 07:49:55

I have this function as a snippet:

function getHTML($url) {
    if (empty($url)) return false;
    $options = array(
        CURLOPT_URL            => $url,     // URL of the page
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // don't return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // handle all encodings
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 3,        // stop after 3 redirects
    );

    $ch      = curl_init($url);
    curl_setopt_array($ch, $options);
    $content = curl_exec($ch);
    curl_close($ch);

    //Ending all that cURL mess...

    // curl_exec() returns false if the request failed
    if ($content === false) return false;

    // Replace linebreaks, tabs, NUL bytes and vertical tabs with spaces,
    // then collapse repeated whitespace, for easier regexing
    $content = str_replace(array("\n", "\r", "\t", "\0", "\x0B"), ' ', $content);
    $content = preg_replace('/\s\s+/', ' ', $content);
    return $content;
}

That returns the HTML on a single line, with no linebreaks, tabs, or runs of multiple spaces.

So now you do this preg_match:

$html = getHTML($url);
preg_match('|<title>(.*)</title>|iUsm', $html, $matches);

and $matches[1] will have the info you need.
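The flattening step can be tried offline. This sketch pulls it into a hypothetical normalizeWhitespace() function, using the NUL ("\0") and vertical tab ("\x0B") escapes, and then runs the answer's preg_match() on a sample string instead of a fetched page.

```php
<?php
// Whitespace flattening split out of getHTML() so it can run without network access.
// normalizeWhitespace is a hypothetical helper name.
function normalizeWhitespace($content)
{
    // Turn newlines, carriage returns, tabs, NUL bytes and vertical tabs into spaces,
    // then collapse every run of two or more whitespace characters into one space.
    $content = str_replace(array("\n", "\r", "\t", "\0", "\x0B"), ' ', $content);
    return preg_replace('/\s\s+/', ' ', $content);
}

$html = normalizeWhitespace("<html>\n<head>\n\t<title>\n  My page\n</title>\n</head>");
if (preg_match('|<title>(.*)</title>|iUsm', $html, $matches)) {
    // With the U (ungreedy) modifier, .* stops at the first </title>.
    echo trim($matches[1]); // My page
}
```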
