How to construct a relative URL in Python given two absolute URLs

Posted on 2024-12-05 04:57:01

Is there a built-in function that produces a relative URL such as ../images.html, given a base URL such as http://www.example.com/faq/index.html and a target URL such as http://www.example.com/images.html?

I checked the urlparse module. What I want is the counterpart of the urljoin() function.
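
For reference, this is the forward direction that urljoin() already covers, using the URLs from the question. The snippet assumes Python 3, where the urlparse module is available as urllib.parse; the variable names are only illustrative.

```python
from urllib.parse import urljoin

base = 'http://www.example.com/faq/index.html'
relative = '../images.html'

# urljoin resolves a relative reference against a base URL.
resolved = urljoin(base, relative)
print(resolved)
```

The function being asked for would go the other way: take `base` and `resolved` and return `relative`.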

Comments (3)

野却迷人 2024-12-12 04:57:01

You could use urlparse.urlparse to find the paths, and the posixpath version of os.path.relpath to find the relative path.

(Warning: this works on Linux, but may not work on Windows):

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2
import posixpath

def relurl(target, base):
    base = urlparse(base)
    target = urlparse(target)
    if base.netloc != target.netloc:
        raise ValueError('target and base netlocs do not match')
    # Prefix both paths with '.' so they are relative to the same directory
    base_dir = '.' + posixpath.dirname(base.path)
    target = '.' + target.path
    return posixpath.relpath(target, start=base_dir)

# Each entry is (target, base, expected relative URL)
tests = [
    ('http://www.example.com/images.html', 'http://www.example.com/faq/index.html', '../images.html'),
    ('http://google.com', 'http://google.com', '.'),
    ('http://google.com', 'http://google.com/', '.'),
    ('http://google.com/', 'http://google.com', '.'),
    ('http://google.com/', 'http://google.com/', '.'),
    ('http://google.com/index.html', 'http://google.com/', 'index.html'),
    ('http://google.com/index.html', 'http://google.com/index.html', 'index.html'),
    ]

for target, base, answer in tests:
    try:
        result = relurl(target, base)
    except ValueError as err:
        print('{t!r},{b!r} --> {e}'.format(t=target, b=base, e=err))
    else:
        if result == answer:
            print('{t!r},{b!r} --> PASS'.format(t=target, b=base))
        else:
            print('{t!r},{b!r} --> {r!r} != {a!r}'.format(
                t=target, b=base, r=result, a=answer))
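
One further way to sanity-check relurl is a round trip through urljoin(): joining the base with the computed relative URL should give back the target. The sketch below assumes Python 3 imports. The helper name round_trips and the sample URLs other than the one from the question are made up for illustration.

```python
import posixpath
from urllib.parse import urljoin, urlparse

def relurl(target, base):
    # Same logic as the function above, written with Python 3 imports.
    base_parts = urlparse(base)
    target_parts = urlparse(target)
    if base_parts.netloc != target_parts.netloc:
        raise ValueError('target and base netlocs do not match')
    base_dir = '.' + posixpath.dirname(base_parts.path)
    return posixpath.relpath('.' + target_parts.path, start=base_dir)

def round_trips(target, base):
    # Hypothetical helper: True when urljoin() undoes relurl().
    return urljoin(base, relurl(target, base)) == target

# (target, base) pairs; only the first comes from the question.
pairs = [
    ('http://www.example.com/images.html', 'http://www.example.com/faq/index.html'),
    ('http://www.example.com/a/b/c.html', 'http://www.example.com/a/x/y/index.html'),
    ('http://www.example.com/index.html', 'http://www.example.com/'),
]

for target, base in pairs:
    print(relurl(target, base), round_trips(target, base))

# A different host cannot be expressed as a relative path, so relurl raises.
try:
    relurl('http://www.example.com/a.html', 'http://other.example.com/b.html')
except ValueError as err:
    print('ValueError:', err)
```

Note that relurl looks only at the netloc and the path, so the query string and fragment of the target are not carried over.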
你的心境我的脸 2024-12-12 04:57:01

The first solution that comes to mind is:

>>> import os.path
>>> os.path.relpath('/images.html', os.path.dirname('/faq/index.html'))
'../images.html'

Of course, this requires URL parsing, then a domain name comparison (!!), then path rewriting if the domains match, and finally re-adding the query and fragment.

Edit: a more complete version

try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit  # Python 2
import posixpath

def relative_url(destination, source):
    u_dest = urlsplit(destination)
    u_src = urlsplit(source)

    # Rebuild each URL from its scheme and netloc only, for comparison
    _uc1 = urlunsplit(u_dest[:2] + ('', '', ''))
    _uc2 = urlunsplit(u_src[:2] + ('', '', ''))

    if _uc1 != _uc2:
        ## This is a different domain (or scheme)
        return destination

    _relpath = posixpath.relpath(u_dest.path, posixpath.dirname(u_src.path))

    # Re-attach the destination's query and fragment to the relative path
    return urlunsplit(('', '', _relpath, u_dest.query, u_dest.fragment))

Then

>>> relative_url('http://www.example.com/images.html', 'http://www.example.com/faq/index.html')
'../images.html'
>>> relative_url('http://www.example.com/images.html?my=query&string=here#fragment', 'http://www.example.com/faq/index.html')
'../images.html?my=query&string=here#fragment'
>>> relative_url('http://www.example.com/images.html', 'http://www2.example.com/faq/index.html')
'http://www.example.com/images.html'
>>> relative_url('https://www.example.com/images.html', 'http://www.example.com/faq/index.html')
'https://www.example.com/images.html'

Edit: now using the posixpath implementation of os.path to make it work under Windows too.
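
The difference this edit refers to can be seen by calling the two path implementations directly. Both posixpath and ntpath can be imported on any platform, so this runs the same everywhere. It is a small illustration using the paths from the question, and the variable names are illustrative only.

```python
import ntpath
import posixpath

target_path = '/images.html'
base_dir = posixpath.dirname('/faq/index.html')  # '/faq'

# posixpath always uses '/' as the separator, which matches URL paths.
posix_result = posixpath.relpath(target_path, base_dir)

# ntpath is what os.path refers to on Windows; it joins with backslashes.
windows_result = ntpath.relpath(target_path, base_dir)

print(posix_result)
print(windows_result)
```

Since os.path is ntpath on Windows, os.path.relpath there returns a backslash-separated path, which is not a usable URL path. Calling posixpath explicitly avoids that.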

魔法唧唧 2024-12-12 04:57:01
try:
  from urllib.parse import urlparse  # Python 3
except ImportError:
  from urlparse import urlparse  # Python 2

def makeRelativeUrl(sourceUrl, targetUrl):
  '''
  :param sourceUrl: a string
  :param targetUrl: a string
  :return: the path to targetUrl relative to sourceUrl, or targetUrl itself if it is at a different net location
  '''
  parsedSource = urlparse(sourceUrl)
  parsedTarget = urlparse(targetUrl)

  # different net location: a relative path cannot express this
  if parsedSource.netloc != parsedTarget.netloc:
    return targetUrl

  # the last segment of the source path is the document itself, so drop it
  sourceDirs = (parsedSource.path or '/').split('/')[:-1]
  targetParts = (parsedTarget.path or '/').split('/')

  # count the leading directory segments that both paths share
  common = 0
  for sourceItem, targetItem in zip(sourceDirs, targetParts[:-1]):
    if sourceItem != targetItem:
      break
    common += 1

  # one '../' per remaining source directory, then the rest of the target path
  upCount = len(sourceDirs) - common
  return (upCount * '../' + '/'.join(targetParts[common:])) or './'


if __name__ == '__main__':
  # "tests" :p
  url1 = 'http://coolwebsite.com/questions/bobobo/bo/bo/1663807/how-can-i-iterate-through-two-lists-in-parallel-in-python'
  url2 = 'http://coolwebsite.com/questions/126524/iterate-a-list-with-indexes-in-python'

  print(url1)
  print(url2)
  print('second relative to first:')
  print(makeRelativeUrl(url1, url2))

  url1 = 'http://coolwebsite.com/questions/1663807/how-can-i-iterate-through-two-lists-in-parallel-in-python'
  url2 = 'http://coolwebsite.com/questions/1663807/bananas'

  print(url1)
  print(url2)
  print('second relative to first:')
  print(makeRelativeUrl(url1, url2))

  url1 = 'http://coolwebsite.com/questions/1663807/fruits'
  url2 = 'http://coolwebsite.com/questions/1663807/fruits/berries/bananas'

  print(url1)
  print(url2)
  print('second relative to first:')
  print(makeRelativeUrl(url1, url2))

Run 'tests' to see if it works :P
