We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
You need a crawler. It's pretty trivial to build one yourself (for small crawls!)
For a small site (few thousand pages) you can probably do all this in memory.
* Use the usual Java URLConnection or Commons HttpClient (v4) for making the requests.

Note on finding your link: links can appear on a site in absolute form, in local (relative) form, or relative to some base href. You'll need to account for this when looking for yours. The easiest approach is to translate all links to absolute form, taking care to resolve against the current page's base href, if it has one.
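The absolute-form translation described above can be sketched with `java.net.URI` (a minimal example; the page URLs and base hrefs are hypothetical):

```java
import java.net.URI;

public class LinkResolver {
    // Resolve a link found on a page to absolute form.
    // If the page declares a <base href>, resolve against that;
    // otherwise resolve against the page's own URL.
    static String toAbsolute(String pageUrl, String baseHref, String link) {
        URI base = URI.create(baseHref != null ? baseHref : pageUrl);
        return base.resolve(link).toString();
    }

    public static void main(String[] args) {
        // Relative link, no base href: resolved against the page URL.
        System.out.println(toAbsolute("http://example.com/a/page.html", null, "other.html"));
        // prints http://example.com/a/other.html

        // Relative link with a base href: resolved against the base.
        System.out.println(toAbsolute("http://example.com/a/page.html",
                "http://example.com/b/", "other.html"));
        // prints http://example.com/b/other.html

        // Already-absolute link: comes back unchanged.
        System.out.println(toAbsolute("http://example.com/a/page.html", null,
                "http://stackoverflow.com/"));
        // prints http://stackoverflow.com/
    }
}
```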
Simples.
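The whole in-memory crawl could look roughly like this. It is a sketch, not production code: the fetch function is injected so the loop can be exercised without network access (real code would plug in `URLConnection` or HttpClient v4 there), and the `href` regex is a stand-in for a proper HTML parser:

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniCrawler {
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    // Breadth-first crawl starting at 'start', staying on-site,
    // returning true as soon as any page links to 'target'.
    // 'fetch' maps a URL to its HTML (null = fetch failed).
    static boolean siteLinksTo(String start, String target,
                               Function<String, String> fetch) {
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            if (!seen.add(url)) continue;   // already visited
            String html = fetch.apply(url);
            if (html == null) continue;
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                // Translate every link to absolute form before comparing.
                String link = URI.create(url).resolve(m.group(1)).toString();
                if (link.startsWith(target)) return true;
                if (link.startsWith(start)) frontier.add(link); // stay on-site
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A fake two-page site standing in for real HTTP fetches.
        java.util.Map<String, String> pages = java.util.Map.of(
                "http://example.com/", "<a href=\"page2.html\">next</a>",
                "http://example.com/page2.html",
                "<a href=\"http://stackoverflow.com/questions/1\">my link</a>");
        System.out.println(siteLinksTo("http://example.com/",
                "http://stackoverflow.com/", pages::get));
        // prints true
    }
}
```

The injected fetch function is what lets the "few thousand pages in memory" claim hold: the frontier, the visited set, and the page bodies all live in ordinary collections.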
You can use the search page:
Example:
http://search.botw.org/search?q=stackoverflow.com
Instead of crawling the entire site, you can just verify whether you get a good result there.
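Building that search URL programmatically is just a matter of URL-encoding the domain you are checking for (a trivial sketch; the query parameter comes from the example URL above):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchUrl {
    // Build the search-engine query URL for a given domain.
    static String queryFor(String domain) {
        return "http://search.botw.org/search?q="
                + URLEncoder.encode(domain, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(queryFor("stackoverflow.com"));
        // prints http://search.botw.org/search?q=stackoverflow.com
    }
}
```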
You will have to crawl the site, parse each page for anchors, and test whether they match what you are looking for... assuming the site isn't using JavaScript to create links; few do.

A standard XML parser will work; SAX is probably the easiest to learn.
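A SAX-based anchor extractor might look like this (a sketch that assumes the pages are well-formed XHTML, since a strict XML parser will reject typical tag-soup HTML):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class AnchorExtractor {
    // Collect the href values of all <a> elements in a well-formed XHTML page.
    static List<String> anchors(String xhtml) throws Exception {
        List<String> hrefs = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String local,
                                             String qName, Attributes attrs) {
                        // Anchors without href (e.g. named anchors) are skipped.
                        if ("a".equals(qName) && attrs.getValue("href") != null) {
                            hrefs.add(attrs.getValue("href"));
                        }
                    }
                });
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<a href=\"http://stackoverflow.com/\">my link</a>"
                + "</body></html>";
        System.out.println(anchors(page));
        // prints [http://stackoverflow.com/]
    }
}
```

Each extracted href would then be resolved to absolute form and compared against the link you are looking for.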