Web scraping: how do I get the right scraper implementation from a text link?
I'm building a Java web media-scraping application for extracting content from a variety of popular websites: YouTube, Facebook, RapidShare, and so on.
The application will include a search capability to find content URLs, but should also allow the user to paste a URL into the application if they already know where the media is. Youtube Downloader already does this for a variety of video sites.
When the program is supplied with a URL, it decides which kind of scraper to use to get the content; for example, a YouTube watch link returns a YoutubeScraper, a Facebook fan page link returns a FacebookScraper, and so on.
Should I use the factory pattern to do this?
My idea is that the factory has one public method. It takes a String argument representing a link, and returns a suitable implementation of the Scraper interface. I guess the Factory would hold a list of Scraper implementations, and would match the link against each Scraper until it finds a suitable one. If there is no suitable one, it throws an Exception instead.
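The idea described above could be sketched roughly as follows. This is only a minimal illustration, not a definitive design: the `canHandle` and `name` methods, the `ScraperFactory` and `forLink` names, the substring checks, and the choice of `IllegalArgumentException` are all assumptions, since the question names only the `Scraper` interface and the two scraper classes.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed shape of the Scraper interface; the question does not show its methods.
interface Scraper {
    // Returns true if this scraper knows how to handle the given link.
    boolean canHandle(String link);

    // Short label, used here only to tell implementations apart.
    String name();
}

final class YoutubeScraper implements Scraper {
    // Naive substring check, purely illustrative.
    public boolean canHandle(String link) {
        return link.contains("youtube.com/watch");
    }

    public String name() {
        return "youtube";
    }
}

final class FacebookScraper implements Scraper {
    // Naive substring check, purely illustrative.
    public boolean canHandle(String link) {
        return link.contains("facebook.com/");
    }

    public String name() {
        return "facebook";
    }
}

final class ScraperFactory {
    private final List<Scraper> scrapers;

    ScraperFactory(List<Scraper> scrapers) {
        // Defensive copy so later changes to the caller's list have no effect.
        this.scrapers = new ArrayList<>(scrapers);
    }

    // The single public entry point: tries each scraper in order and
    // returns the first one that accepts the link.
    public Scraper forLink(String link) {
        for (Scraper scraper : scrapers) {
            if (scraper.canHandle(link)) {
                return scraper;
            }
        }
        throw new IllegalArgumentException("No scraper available for link: " + link);
    }
}
```

With this shape, supporting a new site means writing one more `Scraper` implementation and adding it to the list passed to the factory. Note that the order of the list matters if two scrapers could accept the same link.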
Comments (2)
Sounds like a good idea. You most likely want a singleton with a create(URL url) method. I would recommend you use TDD to do this to get your requirements clearer in your mind.
A factory that returns the right scraper will be fine. To generalize the approach, I recommend using a map to hold the implementations.
You can then check the URL against the keys of the map and instantiate the right class for that content.
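One way the map-based suggestion might look is sketched below. The answer's own code sample is not available, so everything here is an assumption made for illustration: keying the map by host name, storing a `Supplier` per key, and the names `ScraperRegistry`, `register`, `create`, and `site`.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry; none of these names come from the answer.
final class ScraperRegistry {

    // Stand-in for the Scraper interface from the question; its method is assumed.
    interface Scraper {
        String site();
    }

    // Key: a host name such as "youtube.com".
    // Value: a supplier that instantiates the scraper for that host.
    private final Map<String, Supplier<Scraper>> byHost = new LinkedHashMap<>();

    void register(String host, Supplier<Scraper> supplier) {
        byHost.put(host.toLowerCase(Locale.ROOT), supplier);
    }

    // Checks the link's host against the map keys and instantiates the match.
    Scraper create(String link) {
        String host;
        try {
            host = URI.create(link).getHost();
        } catch (IllegalArgumentException e) {
            host = null; // the link is not a syntactically valid URI
        }
        if (host != null) {
            host = host.toLowerCase(Locale.ROOT);
            for (Map.Entry<String, Supplier<Scraper>> entry : byHost.entrySet()) {
                String key = entry.getKey();
                // Accept the registered host itself or any subdomain of it.
                if (host.equals(key) || host.endsWith("." + key)) {
                    return entry.getValue().get();
                }
            }
        }
        throw new IllegalArgumentException("No scraper registered for: " + link);
    }
}
```

A caller would register each site once, for example `registry.register("youtube.com", YoutubeScraper::new)` if `YoutubeScraper` implements the interface and has a no-argument constructor, and then call `registry.create(link)` for every pasted link. Matching on the host alone cannot tell a watch link from other pages on the same site, so a finer-grained key (such as a regular expression) may be needed in practice.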