What is the most efficient way to encode an arbitrary GUID into readable ASCII (33-127)?

Posted on 2024-09-01 10:02:49

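The standard string representation of a GUID takes 36 characters (32 hex digits plus 4 hyphens). That is very nice, but also really wasteful. I am wondering how to encode it in the shortest possible way using all the ASCII characters in the range 33-127. The naive implementation produces 22 characters, simply because 128 bits / 6 bits per character is 21.33, which rounds up to 22.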

Huffman encoding is my second-best idea; the only question is how to choose the codes. The encoding must be lossless, of course.
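
To make the arithmetic concrete, here is a small Python sketch (not part of the original question; the helper name `min_chars` is made up for illustration). It computes the minimum number of characters needed to cover all 2^128 GUID values for a few alphabet sizes, using exact integer arithmetic rather than floating-point logarithms.

```python
GUID_BITS = 128


def min_chars(alphabet_size: int, bits: int = GUID_BITS) -> int:
    """Smallest n such that alphabet_size ** n >= 2 ** bits."""
    target = 2 ** bits
    capacity = 1
    n = 0
    while capacity < target:
        capacity *= alphabet_size
        n += 1
    return n


if __name__ == "__main__":
    # 16 = hex digits, 64 = Base64, 85 = base 85, 95 = every code in 33..127
    for size in (16, 64, 85, 95):
        print(size, min_chars(size))
```

With 64 symbols the result is 22, matching the naive figure in the question; with 85 or 95 symbols it drops to 20.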


纸短情长 2024-09-08 10:02:49

This is an old question, but I had to solve it in order for a system I was working on to be backward compatible.

The exact requirement was for a client-generated identifier that would be written to the database and stored in a 20-character unique column. It never got shown to the user and was not indexed in any way.

Since I couldn't eliminate the requirement, I really wanted to use a Guid (which is statistically unique) and if I could encode it losslessly into 20 characters, then it would be a good solution given the constraints.

Ascii-85 allows you to encode 4 bytes of binary data into 5 bytes of Ascii data. So a 16 byte guid will just fit into 20 Ascii characters using this encoding scheme. A Guid can have 2^128 (about 3.4028e+38) discrete values whereas 20 characters of Ascii-85 can have 85^20 (about 3.8760e+38) discrete values.
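
As a quick sanity check of these capacity figures, the following Python sketch (an illustration added here, not the author's code; `fits` is a hypothetical helper) compares the values exactly. It also checks the 64-bit case, because the implementation in this answer encodes the Guid as two 64-bit halves of 10 characters each.

```python
GUID_VALUES = 2 ** 128  # number of distinct 128-bit values


def fits(alphabet_size: int, length: int, bits: int) -> bool:
    """True if `length` symbols can represent every `bits`-bit value."""
    return alphabet_size ** length >= 2 ** bits


if __name__ == "__main__":
    print(GUID_VALUES)        # 340282366920938463463374607431768211456
    print(fits(85, 20, 128))  # True: 20 characters are enough for a Guid
    print(fits(85, 19, 128))  # False: 19 characters are not
    print(fits(85, 10, 64))   # True: 10 characters cover one 64-bit half
    print(fits(85, 9, 64))    # False
```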

When writing to the database I had some concerns about truncation, so no whitespace characters are included in the encoding. I also had issues with collation, which I addressed by excluding lowercase characters from the encoding. Also, it would only ever be passed through a parameterized command, so any special SQL characters would be handled automatically.

I've included the C# code to perform Ascii-85 encoding and decoding in case it helps anyone out there. Obviously, depending on your usage you might need to choose a different character set, as my constraints made me choose some unusual characters like 'ß' and 'Ø' - but that's the easy part:

using System;
using System.Collections.Generic;
using System.Text;

/// <summary>
/// This code implements an encoding scheme that uses 85 printable characters 
/// to encode the same volume of information as contained in a Guid.
/// 
/// Ascii-85 can represent 4 binary bytes as 5 Ascii bytes. So a 16 byte Guid can be 
/// represented in 20 characters. A Guid can have 2^128 (about 3.4028e+38) discrete 
/// values whereas 20 characters of Ascii-85 can have 85^20 (about 3.8760e+38) 
/// discrete values.
/// 
/// Lower-case characters are not included in this encoding to avoid collation 
/// issues. 
/// This is a departure from standard Ascii-85 which does include lower case 
/// characters.
/// In addition, no whitespace characters are included as these may be truncated in 
/// the database depending on the storage mechanism - ie VARCHAR vs CHAR.
/// </summary>
internal static class Ascii85
{
    /// <summary>
    /// 85 printable characters with no lower case ones, so database 
    /// collation can't bite us. No ' ' character either so database can't 
    /// truncate it!
    /// Unfortunately, these limitations mean resorting to some strange 
    /// non-ASCII characters like 'Ø' but we won't ever have to type these, 
    /// so it's ok.
    /// </summary>
    private static readonly char[] kEncodeMap = new[]
    { 
        '0','1','2','3','4','5','6','7','8','9',  // 10
        'A','B','C','D','E','F','G','H','I','J',  // 20
        'K','L','M','N','O','P','Q','R','S','T',  // 30
        'U','V','W','X','Y','Z','|','}','~','{',  // 40
        '!','"','#','$','%','&','\'','(',')','`', // 50
        '*','+',',','-','.','/','[','\\',']','^', // 60
        ':',';','<','=','>','?','@','_','¼','½',  // 70
        '¾','ß','Ç','Ð','€','«','»','¿','•','Ø',  // 80
        '£','†','‡','§','¥'                       // 85
    };

    /// <summary>
    /// A reverse mapping of the <see cref="kEncodeMap"/> array for decoding 
    /// purposes.
    /// </summary>
    private static readonly IDictionary<char, byte> kDecodeMap;

    /// <summary>
    /// Initialises the <see cref="kDecodeMap"/>.
    /// </summary>
    static Ascii85()
    {
        kDecodeMap = new Dictionary<char, byte>();

        for (byte i = 0; i < kEncodeMap.Length; i++)
        {
            kDecodeMap.Add(kEncodeMap[i], i);
        }
    }

    /// <summary>
    /// Decodes an Ascii-85 encoded Guid.
    /// </summary>
    /// <param name="ascii85Encoding">The Guid encoded using Ascii-85.</param>
    /// <returns>A Guid decoded from the parameter.</returns>
    public static Guid Decode(string ascii85Encoding)
    {
        // Ascii-85 can encode 4 bytes of binary data into 5 bytes of Ascii.
        // Since a Guid is 16 bytes long, the Ascii-85 encoding should be 20
        // characters long.
        if (ascii85Encoding.Length != 20)
        {
            throw new ArgumentException(
                "An encoded Guid should be 20 characters long.",
                "ascii85Encoding");
        }

        // We only support upper case characters. The invariant culture is
        // used so the result does not depend on the current locale.
        ascii85Encoding = ascii85Encoding.ToUpperInvariant();

        // Split the string in half and decode each substring separately.
        var higher = ascii85Encoding.Substring(0, 10).AsciiDecode();
        var lower = ascii85Encoding.Substring(10, 10).AsciiDecode();

        // Convert the decoded substrings into an array of 16-bytes.
        var byteArray = new[]
        {
            (byte)((higher & 0xFF00000000000000) >> 56),
            (byte)((higher & 0x00FF000000000000) >> 48),
            (byte)((higher & 0x0000FF0000000000) >> 40),
            (byte)((higher & 0x000000FF00000000) >> 32),
            (byte)((higher & 0x00000000FF000000) >> 24),
            (byte)((higher & 0x0000000000FF0000) >> 16),
            (byte)((higher & 0x000000000000FF00) >> 8),
            (byte)((higher & 0x00000000000000FF)),
            (byte)((lower  & 0xFF00000000000000) >> 56),
            (byte)((lower  & 0x00FF000000000000) >> 48),
            (byte)((lower  & 0x0000FF0000000000) >> 40),
            (byte)((lower  & 0x000000FF00000000) >> 32),
            (byte)((lower  & 0x00000000FF000000) >> 24),
            (byte)((lower  & 0x0000000000FF0000) >> 16),
            (byte)((lower  & 0x000000000000FF00) >> 8),
            (byte)((lower  & 0x00000000000000FF)),
        };

        return new Guid(byteArray);
    }

    /// <summary>
    /// Encodes binary data into a plaintext Ascii-85 format string.
    /// </summary>
    /// <param name="guid">The Guid to encode.</param>
    /// <returns>Ascii-85 encoded string</returns>
    public static string Encode(Guid guid)
    {
        // Convert the 128-bit Guid into two 64-bit parts.
        var byteArray = guid.ToByteArray();
        var higher =
            ((UInt64)byteArray[0] << 56) | ((UInt64)byteArray[1] << 48) |
            ((UInt64)byteArray[2] << 40) | ((UInt64)byteArray[3] << 32) |
            ((UInt64)byteArray[4] << 24) | ((UInt64)byteArray[5] << 16) |
            ((UInt64)byteArray[6] << 8)  | byteArray[7];

        var lower =
            ((UInt64)byteArray[ 8] << 56) | ((UInt64)byteArray[ 9] << 48) |
            ((UInt64)byteArray[10] << 40) | ((UInt64)byteArray[11] << 32) |
            ((UInt64)byteArray[12] << 24) | ((UInt64)byteArray[13] << 16) |
            ((UInt64)byteArray[14] << 8)  | byteArray[15];

        var encodedStringBuilder = new StringBuilder();

        // Encode each part into an ascii-85 encoded string.
        encodedStringBuilder.AsciiEncode(higher);
        encodedStringBuilder.AsciiEncode(lower);

        return encodedStringBuilder.ToString();
    }

    /// <summary>
    /// Encodes the given integer using Ascii-85.
    /// </summary>
    /// <param name="encodedStringBuilder">The <see cref="StringBuilder"/> to 
    /// append the results to.</param>
    /// <param name="part">The integer to encode.</param>
    private static void AsciiEncode(
        this StringBuilder encodedStringBuilder, UInt64 part)
    {
        // Nb, the most significant digits in our encoded character will 
        // be the right-most characters.
        var charCount = (UInt32)kEncodeMap.Length;

        // Ascii-85 can encode 4 bytes of binary data into 5 bytes of Ascii.
        // Since a UInt64 is 8 bytes long, the Ascii-85 encoding should be 
        // 10 characters long.
        for (var i = 0; i < 10; i++)
        {
            // Get the remainder when dividing by the base.
            var remainder = part % charCount;

            // Divide by the base.
            part /= charCount;

            // Add the appropriate character for the current value (0-84).
            encodedStringBuilder.Append(kEncodeMap[remainder]);
        }
    }

    /// <summary>
    /// Decodes the given string from Ascii-85 to an integer.
    /// </summary>
    /// <param name="ascii85EncodedString">Decodes a 10 character Ascii-85 
    /// encoded string.</param>
    /// <returns>The integer representation of the parameter.</returns>
    private static UInt64 AsciiDecode(this string ascii85EncodedString)
    {
        if (ascii85EncodedString.Length != 10)
        {
            throw new ArgumentException(
                "An Ascii-85 encoded Uint64 should be 10 characters long.",
                "ascii85EncodedString");
        }

        // Nb, the most significant digits in our encoded character 
        // will be the right-most characters.
        var charCount = (UInt32)kEncodeMap.Length;
        UInt64 result = 0;

        // Starting with the right-most (most-significant) character, 
        // iterate through the encoded string and decode.
        for (var i = ascii85EncodedString.Length - 1; i >= 0; i--)
        {
            // Multiply the current decoded value by the base.
            result *= charCount;

            // Add the integer value for that encoded character.
            result += kDecodeMap[ascii85EncodedString[i]];
        }

        return result;
    }
}

Also, here are the unit tests. They aren't as thorough as I'd like, and I don't like the non-determinism of where Guid.NewGuid() is used, but they should get you started:

using System;
using JetBrains.Annotations; // assumed source of [UsedImplicitly]; drop the attribute if unavailable
using Microsoft.VisualStudio.TestTools.UnitTesting;

/// <summary>
/// Tests to verify that the Ascii-85 encoding is functioning as expected.
/// </summary>
[TestClass]
[UsedImplicitly]
public class Ascii85Tests
{
    [TestMethod]
    [Description("Ensure that the Ascii-85 encoding is correct.")]
    [UsedImplicitly]
    public void CanEncodeAndDecodeAGuidUsingAscii85()
    {
        var guidStrings = new[]
        {
            "00000000-0000-0000-0000-000000000000",
            "00000000-0000-0000-0000-0000000000FF",
            "00000000-0000-0000-0000-00000000FF00",
            "00000000-0000-0000-0000-000000FF0000",
            "00000000-0000-0000-0000-0000FF000000",
            "00000000-0000-0000-0000-00FF00000000",
            "00000000-0000-0000-0000-FF0000000000",
            "00000000-0000-0000-00FF-000000000000",
            "00000000-0000-0000-FF00-000000000000",
            "00000000-0000-00FF-0000-000000000000",
            "00000000-0000-FF00-0000-000000000000",
            "00000000-00FF-0000-0000-000000000000",
            "00000000-FF00-0000-0000-000000000000",
            "000000FF-0000-0000-0000-000000000000",
            "0000FF00-0000-0000-0000-000000000000",
            "00FF0000-0000-0000-0000-000000000000",
            "FF000000-0000-0000-0000-000000000000",
            "FF000000-0000-0000-0000-00000000FFFF",
            "00000000-0000-0000-0000-0000FFFF0000",
            "00000000-0000-0000-0000-FFFF00000000",
            "00000000-0000-0000-FFFF-000000000000",
            "00000000-0000-FFFF-0000-000000000000",
            "00000000-FFFF-0000-0000-000000000000",
            "0000FFFF-0000-0000-0000-000000000000",
            "FFFF0000-0000-0000-0000-000000000000",
            "00000000-0000-0000-0000-0000FFFFFFFF",
            "00000000-0000-0000-FFFF-FFFF00000000",
            "00000000-FFFF-FFFF-0000-000000000000",
            "FFFFFFFF-0000-0000-0000-000000000000",
            "00000000-0000-0000-FFFF-FFFFFFFFFFFF",
            "FFFFFFFF-FFFF-FFFF-0000-000000000000",
            "FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF",
            "1000000F-100F-100F-100F-10000000000F"
        };

        foreach (var guidString in guidStrings)
        {
            var guid = new Guid(guidString);
            var encoded = Ascii85.Encode(guid);

            Assert.AreEqual(
                20,
                encoded.Length,
                "A guid encoding should be exactly 20 characters long.");

            var decoded = Ascii85.Decode(encoded);

            Assert.AreEqual(
                guid,
                decoded,
                "The guids are different after being encoded and decoded.");
        }
    }

    [TestMethod]
    [Description(
        "The Ascii-85 encoding is not susceptible to changes in character case.")]
    [UsedImplicitly]
    public void Ascii85IsCaseInsensitive()
    {
        const int kCount = 50;

        for (var i = 0; i < kCount; i++)
        {
            var guid = Guid.NewGuid();

            // The encoding should be all upper case. A reliance 
            // on mixed case will make the generated string 
            // vulnerable to sql collation.
            var encoded = Ascii85.Encode(guid);

            Assert.AreEqual(
                encoded,
                encoded.ToUpperInvariant(),
                "The Ascii-85 encoding should produce only uppercase characters.");
        }
    }
}

I hope this saves somebody some trouble.
Also, if you find any bugs then let me know ;-)


叫嚣ゝ 2024-09-08 10:02:49

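Use base 85.
See section 4.1, "Why 85?", of A Compact Representation of IPv6 Addresses (RFC 1924).

An IPv6 address, like a GUID, is 128 bits long (eight 16-bit pieces), so the same reasoning applies: 20 base-85 digits are enough.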
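
For a quick experiment, Python's standard library ships a base-85 codec. The sketch below is an illustration added here, not part of the original answer, and the function names are made up. It relies on `base64.b85encode`, which works on 4-byte groups rather than treating the 128 bits as one big integer the way RFC 1924 does, and its alphabet includes lowercase letters. The 20-character result is the same either way.

```python
import base64
import uuid


def guid_to_b85(guid: uuid.UUID) -> str:
    """Encode the 16 raw bytes of a UUID as 20 base-85 characters."""
    return base64.b85encode(guid.bytes).decode("ascii")


def b85_to_guid(text: str) -> uuid.UUID:
    """Inverse of guid_to_b85."""
    return uuid.UUID(bytes=base64.b85decode(text))


if __name__ == "__main__":
    sample = uuid.UUID("12345678-1234-5678-1234-567812345678")
    encoded = guid_to_b85(sample)
    print(encoded, len(encoded))
    print(b85_to_guid(encoded) == sample)
```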


简单气质女生网名 2024-09-08 10:02:49

You have 95 characters available -- so, more than 6 bits, but not quite as many as 7 (about 6.57 actually). You could use 128/log2(95) = about 19.48 characters, to encode into 20 characters. If saving 2 characters in the encoded form is worth the loss of readability to you, something like (pseudocode):

char encoded[21];
uint128 guid;    // 128-bit unsigned integer (pseudocode type; a C long long is only 64 bits)

for(int i=0; i<20; ++i) {
  encoded[i] = chr(guid % 95 + 33);   // least significant digit first
  guid /= 95;
}
encoded[20] = chr(0);

which is basically the generic "encode a number in some base" code, except that there's no need to reverse the "digits" since the order is arbitrary anyway (and little-endian is more direct and natural). Getting the guid back from the encoded string is, in a very similar way, the polynomial computation in base 95 (after subtracting 33 from each digit, of course). Because the least significant digit was written first, the digits have to be consumed from last to first:

guid = 0;

for(int i=19; i>=0; --i) {
  guid *= 95;
  guid += ord(encoded[i]) - 33;
}

essentially using Horner's approach to polynomial evaluation.
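
A runnable version of the same idea, as a minimal Python sketch (added for illustration; the names `encode95` and `decode95` are hypothetical). Python integers are arbitrary precision, so no 128-bit type is needed. The offset of 33 follows the range given in the question; note that code 127 is DEL, so an offset of 32 would be the choice for strictly printable output.

```python
import uuid

BASE = 95
OFFSET = 33   # codes 33..127, as in the question (127 is DEL)
LENGTH = 20


def encode95(guid: uuid.UUID) -> str:
    """Write the 128-bit value as 20 base-95 digits, least significant first."""
    n = guid.int
    chars = []
    for _ in range(LENGTH):
        n, digit = divmod(n, BASE)
        chars.append(chr(digit + OFFSET))
    return "".join(chars)


def decode95(text: str) -> uuid.UUID:
    """Horner evaluation, consuming digits from last (most significant) to first."""
    if len(text) != LENGTH:
        raise ValueError("expected exactly 20 characters")
    n = 0
    for ch in reversed(text):
        n = n * BASE + (ord(ch) - OFFSET)
    return uuid.UUID(int=n)


if __name__ == "__main__":
    sample = uuid.UUID("12345678-1234-5678-1234-567812345678")
    print(decode95(encode95(sample)) == sample)
```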


一念一轮回 2024-09-08 10:02:49

Just go with Base64.


[浮城] 2024-09-08 10:02:49

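Using the full range from 33 (what's wrong with space, incidentally?) to 127 gives you 95 possible characters. Expressing the 2^128 possible values of a guid in base 95 will use 20 characters. This is the best you can do (modulo things like dropping nybbles that will be constant). Save yourself the trouble - use base 64.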
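
For comparison, a Base64 round trip takes only a few lines with Python's standard library. This is a sketch added for illustration, with made-up function names; it uses the URL-safe alphabet and strips the two padding characters, which gives 22 characters for the 16 bytes of a GUID.

```python
import base64
import uuid


def guid_to_b64(guid: uuid.UUID) -> str:
    """16 bytes -> 24 Base64 characters, minus the '==' padding -> 22."""
    return base64.urlsafe_b64encode(guid.bytes).rstrip(b"=").decode("ascii")


def b64_to_guid(text: str) -> uuid.UUID:
    """Restore the stripped padding before decoding."""
    padded = text + "=" * (-len(text) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    sample = uuid.UUID("12345678-1234-5678-1234-567812345678")
    encoded = guid_to_b64(sample)
    print(encoded, len(encoded))
    print(b64_to_guid(encoded) == sample)
```

The cost relative to base 85 or base 95 is two extra characters, in exchange for an encoding that virtually every platform supports out of the box.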


近箐 2024-09-08 10:02:49

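Assuming that all of your GUIDs are generated by the same algorithm, you can save 4 bits by not encoding the algorithm (version) nibble, before applying any other encoding :-|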
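
A minimal Python sketch of that idea follows. It is an illustration, not part of the original answer, and it assumes the RFC 4122 layout as exposed by Python's `uuid` module, where the version nibble occupies bits 76-79 of the 128-bit integer. The helper names are hypothetical, and the version has to be known out of band when decoding.

```python
import uuid

LOW_BITS = 76                    # number of bits below the version nibble
LOW_MASK = (1 << LOW_BITS) - 1


def strip_version(guid: uuid.UUID) -> tuple[int, int]:
    """Return (version, 124-bit integer with the version nibble removed)."""
    n = guid.int
    version = (n >> LOW_BITS) & 0xF
    packed = ((n >> (LOW_BITS + 4)) << LOW_BITS) | (n & LOW_MASK)
    return version, packed


def restore_version(version: int, packed: int) -> uuid.UUID:
    """Re-insert the version nibble to rebuild the original 128-bit value."""
    high = packed >> LOW_BITS
    n = (high << (LOW_BITS + 4)) | (version << LOW_BITS) | (packed & LOW_MASK)
    return uuid.UUID(int=n)


if __name__ == "__main__":
    sample = uuid.UUID("12345678-1234-4678-9234-567812345678")
    version, packed = strip_version(sample)
    print(version, packed.bit_length() <= 124)
    print(restore_version(version, packed) == sample)
```

Whether the saving matters depends on the alphabet: 124 bits fit in 19 base-95 characters, since 95^19 > 2^124, but still need 20 characters in base 85.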


惜醉颜 2024-09-08 10:02:49

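An arbitrary GUID? The "naive" algorithm will produce optimal results. The only way to compress a GUID further is to exploit patterns in the data, which your "arbitrary" constraint rules out.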


那些过往 2024-09-08 10:02:49

I agree with the Base64 approach. It cuts a 32-hex-digit UUID down to 22 Base64 characters.

Here are simple Hex <-> Base64 converting functions for PHP:

function hex_to_base64($hex){
  $return = '';
  foreach(str_split($hex, 2) as $pair){
    $return .= chr(hexdec($pair));
  }
  return preg_replace("/=+$/", "", base64_encode($return)); // remove the trailing = sign, not needed for decoding in PHP.
}

function base64_to_hex($base64) {
  $return = '';
  foreach (str_split(base64_decode($base64), 1) as $char) {
      $return .= str_pad(dechex(ord($char)), 2, "0", STR_PAD_LEFT);
  }
  return $return;
}
