Compressing a series of 1s and 0s into the shortest possible ASCII string

Posted 2024-11-06 04:11:37

How could you convert a series of 1s and 0s into the shortest possible form consisting of URL-safe ASCII characters?

For example:

s = '00100101000101111010101'
compress(s)

Resulting in something like:

Ysi8aaU

And obviously:

decompress(compress(s)) == s

(I ask this question purely out of curiosity.)

_畞蕅 2024-11-13 04:11:37

Here's the solution I came up with (+ far too many comments):

# A set of 64 URL-safe characters, which allows a maximum chunk length of 6
# bits, because int('111111', 2) == 63 (indices 0 through 63).
charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'

def encode(bin_string):
    # An empty input has no chunks, so encode it as the empty string.
    if not bin_string:
        return ''
    # Split the string of 1s and 0s into chunks of 6.
    chunks = [bin_string[i:i+6] for i in range(0, len(bin_string), 6)]
    # Store the length of the last chunk so that we can append it as the last
    # character; the decoder uses it to know how far to pad the last chunk.
    last_chunk_length = len(chunks[-1])
    # Convert each chunk from binary into an integer.
    decimals = [int(chunk, 2) for chunk in chunks]
    # Add the length of our last chunk to our list of integers.
    decimals.append(last_chunk_length)
    # Produce an ASCII string by using each integer as an index into our charset.
    ascii_string = ''.join(charset[i] for i in decimals)

    return ascii_string

def decode(ascii_string):
    # The empty string is the encoding of the empty input.
    if not ascii_string:
        return ''
    # Convert each character to an integer using its index in the charset.
    decimals = [charset.index(char) for char in ascii_string]
    # Take the last integer, which is the final chunk length, and the second to
    # last integer, which is the final chunk, and keep them for later to be
    # padded appropriately and appended.
    last_chunk_length, last_decimal = decimals.pop(), decimals.pop()
    # Take each integer, convert it to a binary string (removing the 0b from
    # the beginning), and pad it to 6 digits long.
    bin_string = ''.join(bin(decimal)[2:].zfill(6) for decimal in decimals)
    # Add the last integer converted to binary, padded to the appropriate length.
    bin_string += bin(last_decimal)[2:].zfill(last_chunk_length)

    return bin_string

So:

>>> bin_string = '000111000010101010101000101001110'
>>> encode(bin_string)
'hcQOPgd'
>>> decode(encode(bin_string))
'000111000010101010101000101001110'
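The output length of this scheme follows directly from the chunking: one character per 6-bit chunk, plus one trailing character for the length of the last chunk. A minimal sketch of that arithmetic, assuming at least one input bit (`encoded_length` is a hypothetical helper, not part of the answer's code):

```python
import math

def encoded_length(n_bits: int) -> int:
    """Number of characters the chunk-of-6 scheme emits for n_bits input bits.

    Hypothetical helper for illustration; assumes n_bits >= 1.
    """
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    # One character per (possibly partial) 6-bit chunk, plus one character
    # that records the length of the last chunk.
    return math.ceil(n_bits / 6) + 1

# The 33-bit example above needs 6 chunks plus 1 length character.
print(encoded_length(33))  # 7
```

That matches the 7 characters of 'hcQOPgd'. The extra length character is a fixed cost of one character regardless of input size.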

And here it is in CoffeeScript:

class Urlify
    constructor: ->
        @charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'

    encode: (bits) ->
        chunks = (bits[i...i+6] for i in [0...bits.length] by 6)
        last_chunk_length = chunks[chunks.length-1].length
        decimals = (parseInt(chunk, 2) for chunk in chunks)
        decimals.push(last_chunk_length)
        encoded = (@charset[i] for i in decimals).join('')

        return encoded

    decode: (encoded) ->
        decimals = (@charset.indexOf(char) for char in encoded)
        [last_chunk_length, last_decimal] = [decimals.pop(), decimals.pop()]
        decoded = (('00000'+d.toString(2)).slice(-6) for d in decimals).join('')
        last_chunk = ('00000'+last_decimal.toString(2)).slice(-last_chunk_length)
        decoded += last_chunk

        return decoded

软糯酥胸 2024-11-13 04:11:37

As one of the comments mentioned, using base64 would probably be the way to go. However, you don't want to feed the string of '0' and '1' characters in directly without converting it first.
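To see why the conversion matters, here is a small sketch (assuming Python 3) of what happens when the text of the bit string is base64-encoded as is. Each '0' or '1' character occupies a full byte, so the result comes out longer than the input rather than shorter:

```python
import base64

bits = '0110110'

# Encoding the characters themselves spends a whole byte on every bit,
# and base64 then expands every 3 bytes into 4 characters.
as_text = base64.urlsafe_b64encode(bits.encode('ascii')).decode('ascii')

# 7 input characters become 12 output characters (including '=' padding).
print(len(bits), len(as_text))
```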

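One option is to convert the string to an int first and then encode its decimal representation. Note that the int conversion drops any leading zeros:

import base64

s = '0110110'
n = int(s, 2)

# urlsafe_b64encode needs bytes in Python 3; strip the '=' padding afterwards.
result = base64.urlsafe_b64encode(str(n).encode('ascii')).rstrip(b'=').decode('ascii')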

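The other option would be to use the struct module to pack the value into a binary format and encode that instead. (The code below is adapted from http://www.fuyun.org/2009/10/how-to-convert-an-integer-to-base64-in-python/ and only handles values that fit in 64 bits.)

import base64
import struct

def encode(n):
  # Pack as an unsigned 64-bit little-endian integer and drop trailing zero bytes.
  data = struct.pack('<Q', n).rstrip(b'\x00')
  if len(data) == 0:
    data = b'\x00'
  # Strip the '=' padding; decode() restores it.
  s = base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')
  return s

def decode(s):
  # Restore the padding that encode() stripped.
  data = base64.urlsafe_b64decode(s + '=' * (-len(s) % 4))
  # Re-pad to 8 bytes before unpacking.
  n = struct.unpack('<Q', data + b'\x00' * (8 - len(data)))
  return n[0]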
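Both integer-based options lose leading zeros, so they do not satisfy decompress(compress(s)) == s for an input such as '0010'. One way around that, sketched below under the assumption of Python 3, is to prepend a sentinel '1' bit before converting to an int and to drop it again when decoding. The names compress and decompress are taken from the question; the sentinel trick is an illustration, not something either answer proposes.

```python
import base64

def compress(bits: str) -> str:
    # Prepend a sentinel '1' so leading zeros survive the int conversion.
    n = int('1' + bits, 2)
    # Use the fewest whole bytes that hold the value (arbitrary length).
    raw = n.to_bytes((n.bit_length() + 7) // 8, 'big')
    return base64.urlsafe_b64encode(raw).rstrip(b'=').decode('ascii')

def decompress(text: str) -> str:
    # Restore the '=' padding that compress() stripped.
    padded = text + '=' * (-len(text) % 4)
    n = int.from_bytes(base64.urlsafe_b64decode(padded), 'big')
    # bin(n) looks like '0b1...'; drop the '0b' prefix and the sentinel bit.
    return bin(n)[3:]

s = '00100101000101111010101'
assert decompress(compress(s)) == s
```

For the 23-bit string from the question, the sentinel brings the total to 24 bits, which is exactly 3 bytes, so the result is 4 characters long.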
