Parsing URLs for a crawler

Posted 2024-09-18 07:22:30, 217 characters, 11 views, 0 comments

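I am writing a small crawler that extracts links from some 5 to 10 sites. While collecting the links, I get some URLs like this:

../tets/index.html

If it were /test/index.html, we could prepend the base URL to get http://www.example.com/test/index.html

What can I do with this kind of URL?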



夜清冷一曲。 2024-09-25 07:22:30

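URLs like this are relative URLs. ".." means "parent directory", while "." simply means "this directory", just as in bash.
For example, if you are viewing the page http://www.someserver/test/foo/bar.html and it contains a URL like "../baz/foobar.html", that URL will actually point to http://www.someserver/test/baz/foobar.html, I think. Just test it.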
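The resolution rule described above can be sketched in PHP. This is a minimal, simplified take on the RFC 3986 reference-resolution steps, not a complete implementation: the function names `resolve_url` and `remove_dot_segments` are hypothetical helpers made up for this example, empty path segments ("//") are collapsed, and user info and the base URL's own query string are ignored.

```php
<?php
// Hypothetical helper: collapse "." and ".." segments in an absolute path.
function remove_dot_segments(string $path): string
{
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out);                 // ".." goes up one directory
        } elseif ($segment !== '.' && $segment !== '') {
            $out[] = $segment;               // keep ordinary segments
        }
    }
    $resolved = '/' . implode('/', $out);
    // Keep a trailing slash when the path ended in "/", "/." or "/..".
    if ($out && preg_match('#/(\.\.?)?$#', $path)) {
        $resolved .= '/';
    }
    return $resolved;
}

// Hypothetical helper: resolve a link found on a page against that page's URL.
function resolve_url(string $base, string $relative): string
{
    // A link that already has a scheme is absolute: return it unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $relative)) {
        return $relative;
    }
    $b = parse_url($base);
    // Protocol-relative link ("//host/path"): reuse only the base scheme.
    if (strpos($relative, '//') === 0) {
        return $b['scheme'] . ':' . $relative;
    }
    $origin = $b['scheme'] . '://' . $b['host']
        . (isset($b['port']) ? ':' . $b['port'] : '');
    // Split off "?query" / "#fragment" so they are not treated as path segments.
    $cut = strcspn($relative, '?#');
    $relPath = substr($relative, 0, $cut);
    $suffix = substr($relative, $cut);
    $basePath = $b['path'] ?? '/';
    if ($relPath === '') {
        $path = $basePath;                   // "?page=2" or "#top": same document
    } elseif ($relPath[0] === '/') {
        $path = $relPath;                    // root-relative link
    } else {
        // Drop everything after the last "/" of the base path, then append.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $relPath;
    }
    return $origin . remove_dot_segments($path) . $suffix;
}

echo resolve_url('http://www.someserver/test/foo/bar.html', '../baz/foobar.html'), "\n";
// prints http://www.someserver/test/baz/foobar.html
```

Links with a scheme such as mailto: or javascript: are returned unchanged, so a crawler would probably want to filter those out before fetching.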


一绘本一梦想 2024-09-25 07:22:30

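Use dirname() to get the base directory, strip the leading ".." with substr(), and append the rest. Like this:

<?php
// Relative link found on the page; this assumes exactly one leading "..".
$url = "../tets/index.html";
// Directory of the page being crawled (no trailing slash).
$currentURL = "http://example.com/somedir/anotherdir";
// dirname() drops the last path segment; substr() strips the "..".
echo dirname($currentURL) . substr($url, 2);
?>

This outputs:

http://example.com/somedir/tets/index.html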


初与友歌 2024-09-25 07:22:30

Take a look at the Wikipedia page on URL normalization.
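As a rough illustration of why normalization matters for a crawler, the sketch below reduces a few equivalent spellings of an absolute URL to one string, so that a "visited" set does not store the same page twice. It covers only a small subset of common normalization steps (lowercase scheme and host, drop the default port, drop the fragment, use "/" for an empty path); `normalize_url` is a hypothetical name chosen for this example, and the input is assumed to be an already-resolved absolute http or https URL.

```php
<?php
// Hypothetical helper: map equivalent absolute URLs to one canonical string.
function normalize_url(string $url): string
{
    $p = parse_url($url);
    // Scheme and host are case-insensitive, so lowercase both.
    $scheme = strtolower($p['scheme']);
    $host = strtolower($p['host']);
    // Omit the port when it is the default for the scheme.
    $defaultPorts = ['http' => 80, 'https' => 443];
    $port = '';
    if (isset($p['port']) && ($defaultPorts[$scheme] ?? null) !== $p['port']) {
        $port = ':' . $p['port'];
    }
    // An empty path is equivalent to "/".
    $path = $p['path'] ?? '/';
    $query = isset($p['query']) ? '?' . $p['query'] : '';
    // The fragment is never sent to the server, so it is dropped here.
    return $scheme . '://' . $host . $port . $path . $query;
}

echo normalize_url('HTTP://Example.COM:80/tets/index.html#top'), "\n";
// prints http://example.com/tets/index.html
```

The path and query are left untouched because their case can be significant on the server.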


About the author

层林尽染
