Python: getting URLs from a page

Posted 2025-01-18 03:17:01 · 352 characters · 0 views · 0 comments

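This code parses a page and extracts its URLs to generate a sitemap. Along with the URLs, it also picks up fragments of JS code. How can I filter the results to exclude the JS?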

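# Fragment from inside a coroutine method; needs `import asyncio` and `import re`.
if (resp.status == 200 and
        'text/html' in resp.headers.get('content-type', '')):
    data = (await resp.read()).decode('utf-8', 'replace')
    # The regex scans the raw text, so it also matches href=... inside <script> blocks.
    urls = re.findall(r'(?i)href=["\']?([^\s"\'<>]+)', data)
    asyncio.Task(self.addurls([(u, url) for u in urls]))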
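
One possible way to keep JS out of the results is to stop matching `href=` against the raw text and walk the HTML with a parser instead. The sketch below uses the standard library's `html.parser`, which treats the contents of `<script>` and `<style>` as plain data, so strings inside JS are never reported as tags. The names `LinkExtractor` and `extract_links` are hypothetical helpers, not part of the original code, and the choice to collect only `<a>` tags and to skip `javascript:`, `mailto:` and fragment-only links is an assumption about what a sitemap should contain.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags only (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Text inside <script>...</script> never reaches this method,
        # because HTMLParser handles script content as raw data.
        if tag != 'a':
            return
        for name, value in attrs:
            if name != 'href' or not value:
                continue
            value = value.strip()
            # Assumption: pseudo-links are not wanted in a sitemap.
            if value.lower().startswith(('javascript:', 'mailto:', '#')):
                continue
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(self.base_url, value))
            self.links.append(absolute)


def extract_links(html, base_url):
    """Return absolute URLs found in <a href=...> tags of html."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

In the snippet above, the `re.findall(...)` line could then become something like `urls = extract_links(data, url)`, assuming `url` holds the address of the page being parsed.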
