Parsing a website
I want to write a program that takes a website address as user input. The program then goes to that website, downloads the page, parses it, and writes a new HTML file based on what it found.
Specifically, the program will pick out certain links from the page and put them in the output HTML file, discarding everything else.
For now it only needs to work with websites that don't require a login, but later I want it to work with sites where you have to log in, so it will have to be able to handle cookies.
Later on I'll also want the program to follow certain links and download information from those other sites.
What are the best programming languages or tools for this?
Comments (2)
Beautiful Soup (Python) comes highly recommended, though I have no experience with it personally.
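For a rough idea of what the link extraction might look like with Beautiful Soup, here is a minimal sketch. It assumes the third-party `beautifulsoup4` package is installed; the function name `extract_links` and the sample markup are made up for illustration and do not come from the question.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def extract_links(page, base_url):
    """Return an absolute URL for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(page, "html.parser")  # parser bundled with Python
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


# Hypothetical sample page, used only to show the call.
sample = (
    '<p><a href="/docs">Docs</a> '
    '<a href="http://example.org/x">X</a> '
    '<a name="top">no href</a></p>'
)
links = extract_links(sample, "http://example.com/index.html")
print(links)
```

Relative links are resolved against the page address with `urljoin`, and anchors without an `href` are skipped.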
Python.
It's fairly easy to write a simple crawler using Python's standard library, and you can also find existing Python crawler libraries on the web.
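As a sketch of the standard-library route, something like the following could cover the steps described in the question: download a page, keep only its links, and write them to a new HTML file. All names here (`LinkCollector`, `make_opener`, `fetch`, `render_links`, `save_links`, and the output file `links.html`) are illustrative assumptions. The opener is built with a cookie jar so that cookies set by a site are sent back on later requests, which is one possible starting point for login support; an actual login flow is not shown.

```python
import html
import http.cookiejar
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def make_opener():
    """Build an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch(opener, url):
    """Download a page and decode it (UTF-8 is assumed if no charset is declared)."""
    with opener.open(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_links(page, base_url):
    """Parse the page text and return the absolute URLs of its links."""
    parser = LinkCollector(base_url)
    parser.feed(page)
    return parser.links


def render_links(links):
    """Produce a minimal HTML document that lists only the given links."""
    items = "\n".join(
        '<li><a href="{0}">{0}</a></li>'.format(html.escape(u, quote=True))
        for u in links
    )
    return "<!DOCTYPE html>\n<html><body><ul>\n" + items + "\n</ul></body></html>\n"


def save_links(url, out_path="links.html"):
    """Fetch url, keep only its links, and write them to out_path."""
    page = fetch(make_opener(), url)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(render_links(extract_links(page, url)))
```

Calling `save_links(input("Website address: "))` would tie it to user input. To explore further pages later, the list returned by `extract_links` could be fed back into `fetch` using the same opener, so cookies carry over between requests.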