How to extract all form information from HTML using PHP
I need a way to extract all form information from a webpage via a PHP script.
So I have:
$url = "http://somewebpage.com/";
The info I need is:
A list of all the forms on the webpage and their options/attributes.
A sample output would be as follows:
Form1: Form name: "login", action: "login.php", method: "GET"
- Input type: "text", name: "usrname"
- Input type: "password", name: "pass"
Form2: Form name: "login2", action: "login2.php", method: "POST"
- Input type: "text", name: "usr"
- Input type: "password", name: "pwd"
I use the following method to put the HTML contents of the webpage into a variable:
// cURL
$browser_id = "some crazy browser";
$curl_handle = curl_init();
$options = array
(
    CURLOPT_URL => $url,
    CURLOPT_HEADER => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_USERAGENT => $browser_id
);
curl_setopt_array($curl_handle,$options);
$server_output = curl_exec($curl_handle);
curl_close($curl_handle);
Then I use this to remove the header info and keep just the HTML, because otherwise DOM always gives me errors:
$server_output2 = substr($server_output, stripos($server_output, "<html"));
Then, to find the forms, I use DOM:
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($server_output2);
$params = $dom->getElementsByTagName('form'); // Find all <form> elements
$forms = array();
$k = 0;
foreach ($params as $param) {
    // $param is the current form element, so item($k) is not needed
    $forms[$k][0] = $param->getAttribute('name');
    $forms[$k][1] = $param->getAttribute('action');
    $forms[$k][2] = $param->getAttribute('method');
    $k++;
}
However, my problem is that I often get errors from DOM about unclosed tags or other markup issues, and I don't want to see these messages. How can I suppress them and still parse the page?
Also, my current code only outputs the form info, not the inputs inside each form, which I also want to know. How can I get those as well? Thank you for your help.
You can view my project Remote Attack Vector (this is what I need it for) at http://sourceforge.net/projects/rav/files/
Or check out my website: http://tamasiweb.hu
Comments (1)
Well, download this PHP library:
http://sourceforge.net/projects/snoopy/
Class usage:
Hope that helps.
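For the two things asked in the question (silencing the DOM warnings and listing the fields of each form), here is a minimal sketch that uses only PHP's built-in DOM extension, without any extra library. The function name `extract_forms` and the array keys are hypothetical and chosen for illustration; they are not from the original post. It assumes the HTML is already in a string. Setting `CURLOPT_HEADER` to `false` in the cURL options should make the `substr()` step unnecessary, since the response headers would then not be part of the output.

```php
<?php
// Hypothetical helper (not from the original post):
// returns one array entry per <form> found in $html.
function extract_forms(string $html): array
{
    if (trim($html) === '') {
        return array(); // loadHTML() rejects empty input
    }

    // Collect libxml parse errors internally instead of emitting PHP warnings
    // about unclosed tags and similar markup problems.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    $forms = array();
    foreach ($dom->getElementsByTagName('form') as $form) {
        $fields = array();
        // Fields are grouped by tag name here, not kept in document order.
        foreach (array('input', 'select', 'textarea') as $tag) {
            foreach ($form->getElementsByTagName($tag) as $field) {
                $fields[] = array(
                    'tag'  => $tag,
                    'type' => $field->getAttribute('type'),
                    'name' => $field->getAttribute('name'),
                );
            }
        }
        $forms[] = array(
            'name'   => $form->getAttribute('name'),
            'action' => $form->getAttribute('action'),
            // A missing method attribute is treated as GET, the HTML default.
            'method' => strtoupper($form->getAttribute('method') ?: 'GET'),
            'fields' => $fields,
        );
    }
    return $forms;
}

// Usage with a small inline page that contains an unclosed <p> tag.
$html = '<html><body><form name="login" action="login.php" method="get">'
      . '<input type="text" name="usrname"><p><input type="password" name="pass">'
      . '</form></body></html>';

foreach (extract_forms($html) as $i => $form) {
    printf("Form%d: name: \"%s\", action: \"%s\", method: \"%s\"\n",
        $i + 1, $form['name'], $form['action'], $form['method']);
    foreach ($form['fields'] as $field) {
        printf("- %s type: \"%s\", name: \"%s\"\n",
            $field['tag'], $field['type'], $field['name']);
    }
}
```

One caveat: on badly broken pages the parser may repair the markup in a way that moves a field outside its form, so the field list is only as reliable as the recovered tree.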