How to extract all form information from HTML using PHP

Posted on 2024-12-21 19:32:49

I need a way to extract all form information from a web page with a PHP script.
So far I have:

    $url = "http://somewebpage.com/";

The information I need is a list of all the forms on the web page, together with their options/attributes.
A sample output would be as follows:

Form1: Form name: "login", action: "login.php", method: "GET"

  1. Input type: "text", name: "usrname"
  2. Input type: "password", name: "pass"

Form2: Form name: "login2", action: "login2.php", method: "POST"

  1. Input type: "text", name: "usr"
  2. Input type: "password", name: "pwd"

I use the following method to put the HTML contents of the web page into a variable:

    // cURL
    $browser_id = "some crazy browser";
    $curl_handle = curl_init();
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => $browser_id
    );
    curl_setopt_array($curl_handle, $options);
    $server_output = curl_exec($curl_handle);
    curl_close($curl_handle);

Then I use this to strip the header info and keep only the HTML, because otherwise DOM always gives me errors.

    $server_output2 = substr($server_output, stripos($server_output, "<html"));

Then, to find the forms, I use DOM:

    $dom = new DomDocument;
    $dom->preserveWhiteSpace = FALSE;
    $dom->loadHTML($server_output2);
    $params = $dom->getElementsByTagName('form'); // find all <form> elements
    $forms = array();
    $k = 0;
    foreach ($params as $param) {
        $forms[$k][0] = $param->getAttribute('name');
        $forms[$k][1] = $param->getAttribute('action');
        $forms[$k][2] = $param->getAttribute('method');
        $k++;
    }

However, my problem is that I often get errors from DOM about unclosed tags and similar issues, and I don't want to see them. How can I make it work?
Also, my current code only outputs the form information, not the inputs inside each form, which I also need. How can I get those? Thank you for your help.
You can view my project, Remote Attack Vector (this is what I need it for), at http://sourceforge.net/projects/rav/files/
Or check out my website: http://tamasiweb.hu

Comments (1)

吹梦到西洲 2024-12-28 19:32:49

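Well, download this PHP library:

http://sourceforge.net/projects/snoopy/

Class usage:

    include "Snoopy.class.php"; // adjust the path to wherever Snoopy was unpacked

    $uri = "http://anysite.com/form";

    $snoopy = new Snoopy;

    if ($snoopy->fetchform($uri)) {
        // results holds the form markup found on the page
        echo $snoopy->results;
    } else {
        echo "error fetching document: " . $snoopy->error . "\n";
    }

Hope that helps.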
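
For comparison, here is a minimal sketch that stays with DOMDocument, as in the question, rather than switching libraries. It assumes the parser warnings can simply be collected and discarded via `libxml_use_internal_errors()`, and it reads the inputs by calling `getElementsByTagName('input')` on each form element. The helper name `extract_forms` and the sample markup are made up for illustration; in the question's code the HTML would come from `$server_output2`.

```php
<?php
// Hypothetical helper (not from the original post): returns one array per <form>.
function extract_forms(string $html): array
{
    // Collect libxml parse errors internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $forms = [];
    foreach ($dom->getElementsByTagName('form') as $form) {
        $inputs = [];
        // Only <input> is read here; <select>, <textarea> and <button> are ignored.
        foreach ($form->getElementsByTagName('input') as $input) {
            $inputs[] = [
                'type' => $input->getAttribute('type'),
                'name' => $input->getAttribute('name'),
            ];
        }
        $forms[] = [
            'name'   => $form->getAttribute('name'),
            'action' => $form->getAttribute('action'),
            'method' => $form->getAttribute('method'),
            'inputs' => $inputs,
        ];
    }
    return $forms;
}

// Sample markup with a deliberately unclosed <p>, standing in for $server_output2.
$html = '<html><body>
<form name="login" action="login.php" method="GET">
<p><input type="text" name="usrname">
<input type="password" name="pass">
</form>
</body></html>';

foreach (extract_forms($html) as $i => $form) {
    printf(
        "Form%d: Form name: \"%s\", action: \"%s\", method: \"%s\"\n",
        $i + 1, $form['name'], $form['action'], $form['method']
    );
    foreach ($form['inputs'] as $j => $input) {
        printf("  %d. Input type: \"%s\", name: \"%s\"\n", $j + 1, $input['type'], $input['name']);
    }
}
```

If `CURLOPT_HEADER` is left off, `curl_exec()` returns only the response body, so the `substr()` step that strips the headers should no longer be needed.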
