Using machine learning to understand website structure in Python
I have been working on crawling webpages and extracting the elements of each website.
For example, given a website, the crawler should return the following sections: header, menu, footer, content, etc.
I was thinking that it would be great if I could use machine learning to train the code to learn how to classify websites.
I tried looking at Python machine learning libraries (e.g. PyBrain), but the examples are very complex.
Can anyone suggest a library and a tutorial for getting started with machine learning in Python, with some simple examples?
Thanks!
Comments (1)
MLPy may be a simpler start for you.
Here is a link to the documentation on classification. By the way, if you don't know what the classes should look like, you may need to cluster your pages rather than classify them.
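To make the classify-versus-cluster distinction concrete, here is a minimal sketch that uses only the Python standard library. It does not use MLPy's API; it swaps in a hand-written nearest-centroid classifier and a basic k-means. The two features per page block (relative vertical position and link density), the sample values, and every function name are assumptions made up for illustration, not anything from the question.

```python
import math
import random

# Hypothetical features for each page block:
# (relative vertical position in the page, fraction of the text that is links).
# The values and labels below are invented sample data.
TRAINING = [
    ((0.02, 0.30), "header"),
    ((0.05, 0.20), "header"),
    ((0.10, 0.90), "menu"),
    ((0.12, 0.85), "menu"),
    ((0.50, 0.05), "content"),
    ((0.55, 0.10), "content"),
    ((0.97, 0.60), "footer"),
    ((0.95, 0.55), "footer"),
]


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train_centroids(samples):
    """Classification (labels known): compute one centroid per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: mean(pts) for label, pts in by_label.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def kmeans(points, k, iterations=20, seed=0):
    """Clustering (labels unknown): assign each point a cluster id in 0..k-1."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct sample points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(centers[i], p))
            groups[nearest].append(p)
        # Keep the old center if a cluster ended up empty.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return [min(range(k), key=lambda i: math.dist(centers[i], p)) for p in points]


centroids = train_centroids(TRAINING)
predicted = classify(centroids, (0.04, 0.25))  # a block near the top of the page

# Same feature vectors, but with the labels thrown away.
cluster_ids = kmeans([features for features, _ in TRAINING], k=4)
```

With labelled blocks you can call `classify` on new blocks directly. Without labels, `kmeans` only tells you which blocks look alike, and you still have to inspect each cluster to decide what it represents. Real pages would need richer features than these two, and `math.dist` requires Python 3.8 or later.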