Previous/next web page link heuristics?
I'm looking for a list of heuristics that, given an HTML document and/or the set of URLs found on a web page, will identify which of those URLs are previous/next links for that page. Assume the base URL is also given. I don't need to know whether a link is specifically the next or the previous one, just that it is one of the two.
I've got a short list going already:
- Same domain and path as the base URL, but different query parameters.
  - base: abc.com/story
  - next/previous: abc.com/story?p=2
  - or
  - base: abc.com/story.html?p=5
  - next/previous: abc.com/story.html?p=3
- URL is the same as the base URL except for a numerical path element.
  - base: abc.com/story
  - next/previous: abc.com/story/2
- Several links close to each other in the DOM/HTML.
  - I know this could also match something like a header/footer, so I would have to account for that somehow... any ideas?
- Links whose text is a number or a word such as "Next", "Previous", "First", "Last", "Back", "Forward", etc.
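To make the list concrete, here is a rough Python sketch of the two URL heuristics and the link-text heuristic, using only the standard library. All function and class names (`url_looks_paginated`, `text_looks_like_nav`, `LinkCollector`, `candidate_prev_next_links`) are hypothetical, the word list is just the one from the bullet above plus "prev", and it assumes URLs carry a scheme (e.g. `http://abc.com/story`). The DOM-proximity heuristic is not covered.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, parse_qsl

# Assumed word list: the words from the question, plus the common short form "prev".
NAV_WORDS = {"next", "previous", "prev", "first", "last", "back", "forward"}


def _segments(path):
    """Split a URL path into its non-empty elements."""
    return [s for s in path.split("/") if s]


def url_looks_paginated(base_url, candidate_url):
    """True if candidate differs from base only by query params or one numeric path element."""
    base, cand = urlsplit(base_url), urlsplit(candidate_url)
    if base.netloc.lower() != cand.netloc.lower():
        return False
    # Heuristic 1: same domain and path, different query parameters.
    if base.path.rstrip("/") == cand.path.rstrip("/"):
        return sorted(parse_qsl(base.query)) != sorted(parse_qsl(cand.query))
    # Heuristic 2: same path except for a numeric element (changed, appended, or dropped).
    b, c = _segments(base.path), _segments(cand.path)
    if len(b) == len(c):
        diffs = [(x, y) for x, y in zip(b, c) if x != y]
        return len(diffs) == 1 and diffs[0][0].isdigit() and diffs[0][1].isdigit()
    shorter, longer = sorted((b, c), key=len)
    return (len(longer) == len(shorter) + 1
            and longer[:-1] == shorter
            and longer[-1].isdigit())


def text_looks_like_nav(text):
    """True for short link text that is a number or contains a navigation word."""
    words = re.findall(r"[a-z]+|\d+", text.lower())
    return len(words) <= 2 and any(w.isdigit() or w in NAV_WORDS for w in words)


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def candidate_prev_next_links(base_url, html):
    """Return de-duplicated absolute URLs that match any of the heuristics above."""
    parser = LinkCollector()
    parser.feed(html)
    found = []
    for href, text in parser.links:
        url = urljoin(base_url, href)  # resolve relative links against the base URL
        if url_looks_paginated(base_url, url) or text_looks_like_nav(text):
            if url not in found:
                found.append(url)
    return found
```

Calling `candidate_prev_next_links("http://abc.com/story", html)` returns the matching links as absolute URLs. The heuristics are OR-ed together here to favor coverage; giving each one a weight and thresholding the sum would probably trade some of that coverage for precision.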
I know this can never be perfect, but I'd like as much coverage and as many heuristics as I can get, in the hope of a good mix of quantity and quality. Thanks.