
content-management

What is the best way to fetch website data (content)?

Posted on 2024-08-08 20:37:40

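I need to scrape data (content) from some websites. These sites provide listings, and I need to fetch those listings and filter them based on their content.

Is there any software that can do this? A PHP script? If not, where can I start programming this functionality myself?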


Comments (3)

最后的乘客 2024-08-15 20:37:40

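Use file_get_contents(), which returns the entire file as a string, then parse that string to extract the content.

The other option is cURL or wget, which fetch the entire file; you then process it with AWK and SED, or Perl, and so on.

Which to pick depends on how often you need to scrape the target page. For occasional use, go with PHP, though you will need to trigger the script from a browser, and keep in mind that regular expressions in PHP can be time-consuming.

If you want to scrape files on a regular basis, you can run a Bash script with cURL/wget + sed and awk in the background, with no intervention needed.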
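As a rough illustration of the file_get_contents() approach for the listing-and-filter case in the question, here is a minimal sketch. It assumes the listing is a flat set of <li> elements; the function names, the example URL and the keyword are made up for illustration and are not from the answer.

```php
<?php
// Sketch only: the <li> markup, the URL and the keyword are assumptions.

/**
 * Return the plain text of every <li> element found in an HTML string.
 * Works for flat lists; nested <li> elements would need a real HTML parser.
 */
function extractListItems(string $html): array
{
    // Non-greedy match of each <li>...</li>; "s" lets "." span newlines,
    // "i" makes the tag name case-insensitive.
    preg_match_all('/<li\b[^>]*>(.*?)<\/li>/si', $html, $matches);

    $items = [];
    foreach ($matches[1] as $inner) {
        // Drop nested tags, decode entities, trim surrounding whitespace.
        $items[] = trim(html_entity_decode(strip_tags($inner)));
    }
    return $items;
}

/**
 * Keep only the items that contain the keyword (case-insensitive).
 */
function filterByKeyword(array $items, string $keyword): array
{
    return array_values(array_filter(
        $items,
        fn(string $item): bool => stripos($item, $keyword) !== false
    ));
}

// Usage with a live page (hypothetical URL):
// $html = file_get_contents('http://example.com/list');
// if ($html !== false) {
//     print_r(filterByKeyword(extractListItems($html), 'php'));
// }
```

Keeping the fetch separate from the parsing makes it easy to try the parsing on a saved copy of the page before pointing it at the live site.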


自我难过 2024-08-15 20:37:40

If it's PHP, this may help: http://www.thefutureoftheweb.com/blog/web-scrape-with-php-tutorial

// get the HTML
$html = file_get_contents("http://www.thefutureoftheweb.com/blog/");
if ($html === false) {
    exit("Could not fetch the page\n");
}

// capture link, title, date and content of each post;
// the "s" modifier lets "." match across line breaks
preg_match_all(
    '/<li>.*?<h1><a href="(.*?)">(.*?)<\/a><\/h1>.*?<span class="date">(.*?)<\/span>.*?<div class="section">(.*?)<\/div>.*?<\/li>/s',
    $html,
    $posts, // will contain the blog posts
    PREG_SET_ORDER // formats data into an array of posts
);

foreach ($posts as $post) {
    $link = $post[1];
    $title = $post[2];
    $date = $post[3];
    $content = $post[4];

    // do something with data
}

Of course, you'll need to customise the regular expression to suit your requirements.

You can also find plenty of other examples: http://www.google.com/search?source=ig&hl=en&rlz=&=&q=php+web+scraper&aq=f&oq=&aqi=
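To give an idea of what customising the pattern can look like, here is a small sketch that adapts it to a different, made-up listing markup and uses named capture groups so the fields do not depend on their position. The markup in $html and the parseEntries() name are assumptions for illustration only.

```php
<?php
// Sketch only: the markup in $html is a made-up example, not a real site.

/**
 * Extract link, title and date from each list entry using named groups.
 */
function parseEntries(string $html): array
{
    $pattern = '/<li>\s*<a href="(?<link>[^"]*)">(?<title>.*?)<\/a>\s*'
             . '<span class="date">(?<date>.*?)<\/span>\s*<\/li>/s';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $entries = [];
    foreach ($matches as $m) {
        $entries[] = [
            'link'  => $m['link'],
            'title' => trim($m['title']),
            'date'  => trim($m['date']),
        ];
    }
    return $entries;
}

$html = <<<HTML
<ul>
  <li><a href="/post/1">First post</a> <span class="date">2010-01-05</span></li>
  <li><a href="/post/2">Second post</a> <span class="date">2010-02-11</span></li>
</ul>
HTML;

// Print one line per entry: date, title and link.
foreach (parseEntries($html) as $entry) {
    echo $entry['date'], ' ', $entry['title'], ' -> ', $entry['link'], "\n";
}
```

A pattern like this is tightly coupled to the page's markup, so it will likely need adjusting whenever the site changes its HTML.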


难得心□动 2024-08-15 20:37:40

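There is no magic solution, because the content of every page is different.
Since you mention PHP, I'll give you a few pointers for that language.

You can fetch a web page with curl.
Once you have the content, you can parse it with regular expressions.

Depending on what you want to do, you will have to develop the application yourself.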
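The two steps above (fetch with curl, then parse with a regular expression) could be sketched roughly as follows. The function names, the timeout and the example URL are assumptions for illustration, and the link pattern only handles double-quoted href attributes.

```php
<?php
// Sketch only: function names, timeout and the usage URL are placeholders.

/**
 * Fetch a page with the cURL extension; returns the body, or null on failure.
 */
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $body = curl_exec($ch); // false on failure

    return is_string($body) ? $body : null;
}

/**
 * Pull every href/text pair out of the anchors in an HTML string.
 */
function extractLinks(string $html): array
{
    preg_match_all(
        '/<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)<\/a>/si',
        $html,
        $matches,
        PREG_SET_ORDER
    );

    $links = [];
    foreach ($matches as $m) {
        $links[] = ['href' => $m[1], 'text' => trim(strip_tags($m[2]))];
    }
    return $links;
}

// Usage (hypothetical URL):
// $html = fetchPage('http://example.com/list');
// if ($html !== null) {
//     print_r(extractLinks($html));
// }
```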
