Scraping email addresses
fff.html is an email with email addresses in it. Some are wrapped in href mailto links and some are not. I want to scrape them all and output them in the following format:
[email protected],[email protected],[email protected]
I have a simple scraper to get the ones that are href-linked, but something is weird:
<?php
$url = "fff.html";
$raw = file_get_contents($url);
$newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
$content = str_replace($newlines, "", html_entity_decode($raw));
$start = strpos($content,'<a href="mailto:');
$end = strpos($content,'"',$start) + 8;
$mail = substr($content,$start,$end-$start);
print "$mail<br />";
?>
I should get extra points for the original use of lorem ipsum
Comments (1)
The problem is what happens when the HTML page contains more than one email address: substr will only return the first instance. Here is a script that will parse all email addresses. You may need to tweak it for your use. It outputs the results in the CSV form you requested.
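The script the answer refers to is not included in this copy of the page. As a stand-in, here is a minimal sketch of one way to collect every address, linked or not, using preg_match_all. It is not the answerer's original code: the function name extract_emails and the regular expression are assumptions made for illustration, and the pattern is deliberately simple rather than a full RFC 5322 matcher.

```php
<?php
// Hypothetical helper (not from the original post): return every email
// address found in an HTML string, whether or not it is inside a mailto: link.
function extract_emails(string $html): array
{
    // Decode entities first so addresses written as "name&#64;host" still match.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Simple illustrative pattern: local part, "@", then a dotted host name.
    $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/';
    preg_match_all($pattern, $text, $matches);

    // Lower-case and de-duplicate: a mailto link usually repeats the
    // address once in the href and once in the link text.
    $emails = array_map('strtolower', $matches[0]);
    return array_values(array_unique($emails));
}

// Usage, assuming fff.html sits next to this script as in the question.
$file = 'fff.html';
if (is_readable($file)) {
    echo implode(',', extract_emails(file_get_contents($file))), "\n";
}
```

Because the pattern scans the whole decoded document, it picks up addresses inside mailto links and plain-text ones in a single pass, and implode(',') produces the comma-separated form asked for. As for the "weird" result in the question: strpos($content, '"', $start) finds the quote that opens the href attribute, 8 characters after $start, so $end - $start is always 16 and substr returns only the literal prefix '<a href="mailto:' rather than the address.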