如何用Python制作独特的短网址?

发布于 2024-08-06 09:21:37 字数 266 浏览 16 评论 0原文

如何在Python中创建唯一的URL https://i.sstatic.net/turb6.jpg 或 http://tumblr.com/xzh3bi25y 当使用 python 中的 uuid 时,我得到一个非常大的 uuid。我想要更短的 URL。

How can I make unique URL in Python a la https://i.sstatic.net/turb6.jpg or http://tumblr.com/xzh3bi25y
When using uuid from python I get a very large one. I want something shorter for URLs.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(12

丿*梦醉红颜 2024-08-13 09:21:37

编辑:在这里,我为您编写了一个模块。使用它。 http://code.activestate.com/recipes/576918/


从 1 开始计数保证简短、唯一的 URL。 /1、/2、/3 ...等。

在字母表中添加大写和小写字母将给出与您的问题中类似的 URL。而且您只是以 62 为基数而不是 10 为基数进行计数。

现在唯一的问题是 URL 是连续出现的。要解决这个问题,请阅读我对此问题的回答:

将增量整数范围映射到最大以 26 为基数的六位数字,但不可预测

基本上,该方法是简单地交换增量值中的位,以给出随机性的外观,同时保持确定性并保证你没有任何碰撞。

Edit: Here, I wrote a module for you. Use it. http://code.activestate.com/recipes/576918/


Counting up from 1 will guarantee short, unique URLS. /1, /2, /3 ... etc.

Adding uppercase and lowercase letters to your alphabet will give URLs like those in your question. And you're just counting in base-62 instead of base-10.

Now the only problem is that the URLs come consecutively. To fix that, read my answer to this question here:

Map incrementing integer range to six-digit base 26 max, but unpredictably

Basically the approach is to simply swap bits around in the incrementing value to give the appearance of randomness while maintaining determinism and guaranteeing that you don't have any collisions.

奈何桥上唱咆哮 2024-08-13 09:21:37

我不确定大多数 URL 缩短器是否使用随机字符串。我的印象是他们将 URL 写入数据库,然后使用新记录的整数 ID 作为短 URL,编码基数为 36 或 62(字母+数字)。

将 int 转换为任意基数的字符串的 Python 代码位于此处

I'm not sure most URL shorteners use a random string. My impression is they write the URL to a database, then use the integer ID of the new record as the short URL, encoded base 36 or 62 (letters+digits).

Python code to convert an int to a string in arbitrary bases is here.

遇到 2024-08-13 09:21:37

Python 的 short_url 非常棒。

这是一个示例:

import short_url

id = 20  # your object id
domain = 'mytiny.domain' 

shortened_url = "http://{}/{}".format(
                                     domain,
                                     short_url.encode_url(id)
                               )

并解码代码:

decoded_id = short_url.decode_url(param)

就是这样:)

希望这会有所帮助。

Python's short_url is awesome.

Here is an example:

import short_url

id = 20  # your object id
domain = 'mytiny.domain' 

shortened_url = "http://{}/{}".format(
                                     domain,
                                     short_url.encode_url(id)
                               )

And to decode the code:

decoded_id = short_url.decode_url(param)

That's it :)

Hope this will help.

幻梦 2024-08-13 09:21:37

Hashids 是一个很棒的工具。

编辑:

以下是如何使用 Hashids 通过 Python 生成唯一的短 URL:

from hashids import Hashids

pk = 123 # Your object's id
domain = 'imgur.com' # Your domain

hashids = Hashids(salt='this is my salt', min_length=6)
link_id = hashids.encode(pk)
url = 'http://{domain}/{link_id}'.format(domain=domain, link_id=link_id)

Hashids is an awesome tool for this.

Edit:

Here's how to use Hashids to generate a unique short URL with Python:

from hashids import Hashids

pk = 123 # Your object's id
domain = 'imgur.com' # Your domain

hashids = Hashids(salt='this is my salt', min_length=6)
link_id = hashids.encode(pk)
url = 'http://{domain}/{link_id}'.format(domain=domain, link_id=link_id)
心如荒岛 2024-08-13 09:21:37

该模块将执行您想要的操作,保证字符串全局唯一(它是 UUID):

http ://pypi.python.org/pypi/shortuuid/0.1

如果您需要更短的内容,您应该能够将其截断到所需的长度,并且仍然得到可以合理避免冲突的内容。

This module will do what you want, guaranteeing that the string is globally unique (it is a UUID):

http://pypi.python.org/pypi/shortuuid/0.1

If you need something shorter, you should be able to truncate it to the desired length and still get something that will reasonably probably avoid clashes.

吝吻 2024-08-13 09:21:37

这个答案来得很晚,但当我计划创建一个 URL 缩短器项目时,我偶然发现了这个问题。现在我已经实现了一个功能齐全的 URL 缩短器(源代码位于 amitt0​​01/pygmy),我在这里添加一个答案对于其他人。

任何 URL 缩短器背后的基本原理都是从长 URL 中获取 int,然后使用 base62(base32 等)编码将此 int 转换为更具可读性的短 URL。

这个 int 是如何生成的?

大多数 URL 缩短器使用一些自动增量数据存储来将 URL 添加到数据存储,并使用自动增量 id 来获取 int 的 Base62 编码。

字符串程序中的 Base62 编码示例:

# Base-62 hash

import string
import time

_BASE = 62


class HashDigest:
    """Base base 62 hash library."""

    def __init__(self):
        self.base = string.ascii_letters + string.digits
        self.short_str = ''

    def encode(self, j):
        """Returns the repeated div mod of the number.
        :param j: int
        :return: list
        """
        if j == 0:
            return [j]
        r = []
        dividend = j
        while dividend > 0:
            dividend, remainder = divmod(dividend, _BASE)
            r.append(remainder)
        r = list(reversed(r))
        return r

    def shorten(self, i):
        """
        :param i:
        :return: str
        """
        self.short_str = ""
        encoded_list = self.encode(i)
        for val in encoded_list:
            self.short_str += self.base[val]
        return self.short_str

这只是显示 Base62 编码的部分代码。查看完整的base62编码/解码代码 core/hashdigest.py

此答案中的所有链接从我创建的项目中缩短

This answer comes pretty late but I stumbled upon this question when I was planning to create an URL shortener project. Now that I have implemented a fully functional URL shortener(source code at amitt001/pygmy) I am adding an answer here for others.

The basic principle behind any URL shortener is to get an int from long URL then use base62(base32, etc) encoding to convert this int to a more readable short URL.

How is this int generated?

Most of the URL shortener uses some auto-incrementing datastore to add URL to datastore and use the autoincrement id to get base62 encoding of int.

The sample base62 encoding from string program:

# Base-62 hash

import string
import time

_BASE = 62


class HashDigest:
    """Base base 62 hash library."""

    def __init__(self):
        self.base = string.ascii_letters + string.digits
        self.short_str = ''

    def encode(self, j):
        """Returns the repeated div mod of the number.
        :param j: int
        :return: list
        """
        if j == 0:
            return [j]
        r = []
        dividend = j
        while dividend > 0:
            dividend, remainder = divmod(dividend, _BASE)
            r.append(remainder)
        r = list(reversed(r))
        return r

    def shorten(self, i):
        """
        :param i:
        :return: str
        """
        self.short_str = ""
        encoded_list = self.encode(i)
        for val in encoded_list:
            self.short_str += self.base[val]
        return self.short_str

This is just the partial code showing base62 encoding. Check out the complete base62 encoding/decoding code at core/hashdigest.py

All the link in this answer are shortened from the project I created

最佳男配角 2024-08-13 09:21:37

UUID之所以长,是因为它们包含大量信息,这样可以保证它们是全局唯一的。

如果你想要更短的东西,那么你需要做一些事情,比如生成一个随机字符串,检查它是否在已经生成的字符串的范围内,然后重复直到得到一个未使用的字符串。您还需要注意此处的并发性(如果在插入字符串集之前由单独的进程生成相同的字符串怎么办?)。

如果您需要在 Python 中生成随机字符串的帮助,请参阅这个其他问题 可能有帮助。

The reason UUIDs are long is because they contain lots of information so that they can be guaranteed to be globally unique.

If you want something shorter, then you'll need to do something like generate a random string, checking whether it is in the universe of already generated strings, and repeating until you get an unused string. You'll also need to watch out for concurrency here (what if the same string gets generated by a separate process before you inserted into the set of strings?).

If you need some help generating random strings in Python, this other question might help.

べ繥欢鉨o。 2024-08-13 09:21:37

这是Python并不重要,但你只需要一个映射到你想要的长度的哈希函数。例如,可以使用 MD5,然后仅获取前 n 个字符。不过,在这种情况下,您必须注意碰撞,因此您可能需要选择在碰撞检测方面更强大的东西(例如使用素数在哈希字符串的空间中循环)。

It doesn't really matter that this is Python, but you just need a hash function that maps to the length you want. For example, maybe use MD5 and then take just the first n characters. You'll have to watch out for collisions in that case, though, so you might want to pick something a little more robust in terms of collision detection (like using primes to cycle through the space of hash strings).

寄与心 2024-08-13 09:21:37

我不知道你是否可以使用这个,但我们在 Zope 中生成内容对象,这些对象根据当前时间字符串获取唯一的数字 ID,以毫秒为单位(例如,1254298969501)

也许你可以猜到其余的。使用此处描述的配方:
如何将整数转换为Python 中最短的 url 安全字符串?,我们动态编码和解码真实 id,无需存储。例如,13 位整数在基数 62 中减少为 7 个字母数字字符。

为了完成实施,我们注册了一个短(xxx.yy)域名,它对“未找到”URL 进行解码并执行 301 重定向,

如果我重新开始,我会减去“重新开始”时间(以毫秒为单位) )从编码之前的数字id,然后在解码时重新添加它。或者在生成对象时。任何。这样就更短了..

I don't know if you can use this, but we generate content objects in Zope that get unique numeric ids based on current time strings, in millis (eg, 1254298969501)

Maybe you can guess the rest. Using the recipe described here:
How to convert an integer to the shortest url-safe string in Python?, we encode and decode the real id on the fly, with no need for storage. A 13-digit integer is reduced to 7 alphanumeric chars in base 62, for example.

To complete the implementation, we registered a short (xxx.yy) domain name, that decodes and does a 301 redirect for "not found" URLs,

If I was starting over, I would subtract the "starting-over" time (in millis) from the numeric id prior to encoding, then re-add it when decoding. Or else when generating the objects. Whatever. That would be way shorter..

蒲公英的约定 2024-08-13 09:21:37

可以生成N个随机字符串:

import string
import random

def short_random_string(N:int) -> str:

    return ''.join(random.SystemRandom().choice(
        string.ascii_letters + \
        string.digits) for _ in range(N)
    )

所以,

print (short_random_string(10) )
#'G1ZRbouk2U'

全部小写

print (short_random_string(10).lower() )
#'pljh6kp328'

You can generate a N random string:

import string
import random

def short_random_string(N:int) -> str:

    return ''.join(random.SystemRandom().choice(
        string.ascii_letters + \
        string.digits) for _ in range(N)
    )

so,

print (short_random_string(10) )
#'G1ZRbouk2U'

all lowercase

print (short_random_string(10).lower() )
#'pljh6kp328'
自在安然 2024-08-13 09:21:37

试试这个 http://code.google.com/p/tiny4py/ ...仍在开发中,但非常有用!

Try this http://code.google.com/p/tiny4py/ ... It's still under development, but very useful!!

深者入戏 2024-08-13 09:21:37

我的目标:生成指定固定长度的唯一标识符,由字符0-9az组成。例如:

zcgst5od
9x2zgn0l
qa44sp0z
61vv1nl5
umpprkbt
ylg4lmcy
dec0lu1t
38mhd8i5
rx00yf0e
kc2qdc07

这是我的解决方案。 (改编自 此答案,作者:kmkaplan.)

import random

class IDGenerator(object):
    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

    def __init__(self, length=8):
        self._alphabet_length = len(self.ALPHABET)
        self._id_length = length

    def _encode_int(self, n):
        # Adapted from:
        #   Source: https://stackoverflow.com/a/561809/1497596
        #   Author: https://stackoverflow.com/users/50902/kmkaplan

        encoded = ''
        while n > 0:
            n, r = divmod(n, self._alphabet_length)
            encoded = self.ALPHABET[r] + encoded
        return encoded

    def generate_id(self):
        """Generate an ID without leading zeros.

        For example, for an ID that is eight characters in length, the
        returned values will range from '10000000' to 'zzzzzzzz'.
        """

        start = self._alphabet_length**(self._id_length - 1)
        end = self._alphabet_length**self._id_length - 1
        return self._encode_int(random.randint(start, end))

if __name__ == "__main__":
    # Sample usage: Generate ten IDs each eight characters in length.
    idgen = IDGenerator(8)

    for i in range(10):
        print idgen.generate_id()

My Goal: Generate a unique identifier of a specified fixed length consisting of the characters 0-9 and a-z. For example:

zcgst5od
9x2zgn0l
qa44sp0z
61vv1nl5
umpprkbt
ylg4lmcy
dec0lu1t
38mhd8i5
rx00yf0e
kc2qdc07

Here's my solution. (Adapted from this answer by kmkaplan.)

import random

class IDGenerator(object):
    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

    def __init__(self, length=8):
        self._alphabet_length = len(self.ALPHABET)
        self._id_length = length

    def _encode_int(self, n):
        # Adapted from:
        #   Source: https://stackoverflow.com/a/561809/1497596
        #   Author: https://stackoverflow.com/users/50902/kmkaplan

        encoded = ''
        while n > 0:
            n, r = divmod(n, self._alphabet_length)
            encoded = self.ALPHABET[r] + encoded
        return encoded

    def generate_id(self):
        """Generate an ID without leading zeros.

        For example, for an ID that is eight characters in length, the
        returned values will range from '10000000' to 'zzzzzzzz'.
        """

        start = self._alphabet_length**(self._id_length - 1)
        end = self._alphabet_length**self._id_length - 1
        return self._encode_int(random.randint(start, end))

if __name__ == "__main__":
    # Sample usage: Generate ten IDs each eight characters in length.
    idgen = IDGenerator(8)

    for i in range(10):
        print idgen.generate_id()
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文