What is the most efficient way to encode an arbitrary GUID into readable ASCII (33-127)?

Posted 2024-09-01 10:02:49

The standard string representation of a GUID takes 36 characters (32 hex digits plus 4 hyphens). That is very nice, but also really wasteful. I am wondering how to encode it in the shortest possible way using all the ASCII characters in the range 33-127. The naive implementation produces 22 characters, simply because 128 bits / 6 bits per character = 21.33, which rounds up to 22.
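The character counts in this question follow from a counting bound: n characters drawn from an alphabet of k symbols can form k^n distinct strings, so a lossless fixed-length encoding of a 128-bit GUID needs the smallest n with k^n ≥ 2^128. A small Python sketch (Python is my choice here for its exact big integers; the function name is just illustrative) computes that minimum without any floating-point rounding:

```python
def min_chars(alphabet_size: int, bits: int = 128) -> int:
    """Smallest n such that alphabet_size**n >= 2**bits, using exact integers."""
    n, capacity = 0, 1
    while capacity < 2 ** bits:
        capacity *= alphabet_size  # one more character multiplies the capacity by k
        n += 1
    return n
```

With k = 16 (hex) this gives 32 and with k = 64 (6 bits per character) it gives 22, while any alphabet of 85 to 95 symbols, the most the 33-127 range offers, gives 20.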

Huffman encoding is my second-best idea; the only question is how to choose the codes....

The encoding must be lossless, of course.

纸短情长 2024-09-08 10:02:49

This is an old question, but I had to solve it in order for a system I was working on to be backward compatible.

The exact requirement was for a client-generated identifier that would be written to the database and stored in a 20-character unique column. It never got shown to the user and was not indexed in any way.

Since I couldn't eliminate the requirement, I really wanted to use a Guid (which is statistically unique) and if I could encode it losslessly into 20 characters, then it would be a good solution given the constraints.

Ascii-85 allows you to encode 4 bytes of binary data into 5 bytes of Ascii data, because 85^5 = 4,437,053,125 is at least 2^32 = 4,294,967,296. So a 16-byte guid will just fit into 20 Ascii characters using this encoding scheme. A Guid can have 2^128 ≈ 3.4028236692093846346e+38 discrete values, whereas 20 characters of Ascii-85 can have 85^20 ≈ 3.8759531084514355873e+38 discrete values.
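Both capacity claims can be checked with exact integer arithmetic. This short Python check is my own, not part of the answer; it also confirms the per-half bound that the C# code relies on, since that code encodes each 8-byte half of the Guid as 10 base-85 digits:

```python
# 4 bytes -> 5 characters: 85 is the smallest base where this works.
assert 85 ** 5 >= 2 ** 32   # 4,437,053,125 >= 4,294,967,296
assert 84 ** 5 < 2 ** 32    # 4,182,119,424 falls short

# Each 64-bit half of the Guid fits in 10 base-85 digits, so 20 in total.
assert 85 ** 10 >= 2 ** 64
assert 85 ** 20 >= 2 ** 128

print(f"{2 ** 128:.4e}")    # 3.4028e+38 possible Guids
print(f"{85 ** 20:.4e}")    # 3.8760e+38 possible 20-character strings
```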

When writing to the database I had some concerns about truncation, so no whitespace characters are included in the encoding. I also had issues with collation, which I addressed by excluding lowercase characters from the encoding. Also, it would only ever be passed through a parameterized command, so any special SQL characters would be escaped automatically.

I've included the C# code to perform Ascii-85 encoding and decoding in case it helps anyone out there. Obviously, depending on your usage you might need to choose a different character set, as my constraints made me choose some unusual non-ASCII characters like 'ß' and 'Ø', but that's the easy part:

using System;
using System.Collections.Generic;
using System.Text;

/// <summary>
/// This code implements an encoding scheme that uses 85 printable characters 
/// (not all of them 7-bit ASCII) to encode the same volume of information as 
/// contained in a Guid.
/// 
/// Ascii-85 can represent 4 binary bytes as 5 Ascii bytes. So a 16 byte Guid can be 
/// represented in 20 Ascii bytes. A Guid can have 
/// 2^128 ≈ 3.4028236692093846346e+38 discrete values whereas 20 characters of 
/// Ascii-85 can have 85^20 ≈ 3.8759531084514355873e+38 discrete values.
/// 
/// Lower-case characters are not included in this encoding to avoid collation 
/// issues. 
/// This is a departure from standard Ascii-85 which does include lower case 
/// characters. The output is also not interchangeable with standard Ascii-85, 
/// because each 8-byte half is written least-significant digit first.
/// In addition, no whitespace characters are included as these may be truncated in 
/// the database depending on the storage mechanism, e.g. CHAR vs VARCHAR.
/// </summary>
internal static class Ascii85
{
    /// <summary>
    /// 85 printable characters with no lower case ones, so database 
    /// collation can't bite us. No ' ' character either so database can't 
    /// truncate it!
    /// Unfortunately, these limitations mean resorting to some strange 
    /// non-ASCII characters like 'ß' but we won't ever have to type these, so it's ok.
    /// </summary>
    private static readonly char[] kEncodeMap = new[]
    { 
        '0','1','2','3','4','5','6','7','8','9',  // 10
        'A','B','C','D','E','F','G','H','I','J',  // 20
        'K','L','M','N','O','P','Q','R','S','T',  // 30
        'U','V','W','X','Y','Z','|','}','~','{',  // 40
        '!','"','#','$','%','&','\'','(',')','`', // 50
        '*','+',',','-','.','/','[','\\',']','^', // 60
        ':',';','<','=','>','?','@','_','¼','½', // 70
        '¾','ß','Ç','Ð','€','«','»','¿','•','Ø', // 80
        '£','†','‡','§','¥'                      // 85
    };

    /// <summary>
    /// A reverse mapping of the <see cref="kEncodeMap"/> array for decoding
    /// purposes.
    /// </summary>
    private static readonly IDictionary<char, byte> kDecodeMap;

    /// <summary>
    /// Initialises the <see cref="kDecodeMap"/>.
    /// </summary>
    static Ascii85()
    {
        kDecodeMap = new Dictionary<char, byte>();

        for (byte i = 0; i < kEncodeMap.Length; i++)
        {
            kDecodeMap.Add(kEncodeMap[i], i);
        }
    }

    /// <summary>
    /// Decodes an Ascii-85 encoded Guid.
    /// </summary>
    /// <param name="ascii85Encoding">The Guid encoded using Ascii-85.</param>
    /// <returns>A Guid decoded from the parameter.</returns>
    public static Guid Decode(string ascii85Encoding)
    {
        // Ascii-85 can encode 4 bytes of binary data into 5 bytes of Ascii.
        // Since a Guid is 16 bytes long, the Ascii-85 encoding should be 20
        // characters long.
        if (ascii85Encoding.Length != 20)
        {
            throw new ArgumentException(
                "An encoded Guid should be 20 characters long.",
                "ascii85Encoding");
        }

        // We only support upper case characters. ToUpperInvariant avoids
        // culture-specific casing rules.
        ascii85Encoding = ascii85Encoding.ToUpperInvariant();

        // Split the string in half and decode each substring separately.
        var higher = ascii85Encoding.Substring(0, 10).AsciiDecode();
        var lower = ascii85Encoding.Substring(10, 10).AsciiDecode();

        // Convert the decoded substrings into an array of 16 bytes.
        var byteArray = new[]
        {
            (byte)((higher & 0xFF00000000000000) >> 56),
            (byte)((higher & 0x00FF000000000000) >> 48),
            (byte)((higher & 0x0000FF0000000000) >> 40),
            (byte)((higher & 0x000000FF00000000) >> 32),
            (byte)((higher & 0x00000000FF000000) >> 24),
            (byte)((higher & 0x0000000000FF0000) >> 16),
            (byte)((higher & 0x000000000000FF00) >> 8),
            (byte)((higher & 0x00000000000000FF)),
            (byte)((lower & 0xFF00000000000000) >> 56),
            (byte)((lower & 0x00FF000000000000) >> 48),
            (byte)((lower & 0x0000FF0000000000) >> 40),
            (byte)((lower & 0x000000FF00000000) >> 32),
            (byte)((lower & 0x00000000FF000000) >> 24),
            (byte)((lower & 0x0000000000FF0000) >> 16),
            (byte)((lower & 0x000000000000FF00) >> 8),
            (byte)((lower & 0x00000000000000FF)),
        };

        return new Guid(byteArray);
    }

    /// <summary>
    /// Encodes a Guid into a 20-character Ascii-85 format string.
    /// </summary>
    /// <param name="guid">The Guid to encode.</param>
    /// <returns>Ascii-85 encoded string</returns>
    public static string Encode(Guid guid)
    {
        // Convert the 128-bit Guid into two 64-bit parts.
        var byteArray = guid.ToByteArray();
        var higher =
            ((UInt64)byteArray[0] << 56) | ((UInt64)byteArray[1] << 48) |
            ((UInt64)byteArray[2] << 40) | ((UInt64)byteArray[3] << 32) |
            ((UInt64)byteArray[4] << 24) | ((UInt64)byteArray[5] << 16) |
            ((UInt64)byteArray[6] << 8) | byteArray[7];
        var lower =
            ((UInt64)byteArray[ 8] << 56) | ((UInt64)byteArray[ 9] << 48) |
            ((UInt64)byteArray[10] << 40) | ((UInt64)byteArray[11] << 32) |
            ((UInt64)byteArray[12] << 24) | ((UInt64)byteArray[13] << 16) |
            ((UInt64)byteArray[14] << 8) | byteArray[15];

        var encodedStringBuilder = new StringBuilder();

        // Encode each part into an ascii-85 encoded string.
        encodedStringBuilder.AsciiEncode(higher);
        encodedStringBuilder.AsciiEncode(lower);

        return encodedStringBuilder.ToString();
    }

    /// <summary>
    /// Encodes the given integer using Ascii-85.
    /// </summary>
    /// <param name="encodedStringBuilder">The <see cref="StringBuilder"/> to
    /// append the results to.</param>
    /// <param name="part">The integer to encode.</param>
    private static void AsciiEncode(
        this StringBuilder encodedStringBuilder, UInt64 part)
    {
        // Nb, the most significant digits in our encoded string will
        // be the right-most characters.
        var charCount = (UInt32)kEncodeMap.Length;

        // Ascii-85 can encode 4 bytes of binary data into 5 bytes of Ascii.
        // Since a UInt64 is 8 bytes long, the Ascii-85 encoding should be
        // 10 characters long.
        for (var i = 0; i < 10; i++)
        {
            // Get the remainder when dividing by the base.
            var remainder = part % charCount;

            // Divide by the base.
            part /= charCount;

            // Add the appropriate character for the current value (0-84).
            encodedStringBuilder.Append(kEncodeMap[remainder]);
        }
    }

    /// <summary>
    /// Decodes the given string from Ascii-85 to an integer.
    /// </summary>
    /// <param name="ascii85EncodedString">Decodes a 10 character Ascii-85
    /// encoded string.</param>
    /// <returns>The integer representation of the parameter.</returns>
    private static UInt64 AsciiDecode(this string ascii85EncodedString)
    {
        if (ascii85EncodedString.Length != 10)
        {
            throw new ArgumentException(
                "An Ascii-85 encoded Uint64 should be 10 characters long.",
                "ascii85EncodedString");
        }

        // Nb, the most significant digits in our encoded string
        // will be the right-most characters.
        var charCount = (UInt32)kEncodeMap.Length;
        UInt64 result = 0;

        // Starting with the right-most (most-significant) character,
        // iterate through the encoded string and decode.
        for (var i = ascii85EncodedString.Length - 1; i >= 0; i--)
        {
            // Multiply the current decoded value by the base.
            result *= charCount;

            // Add the integer value for that encoded character.
            result += kDecodeMap[ascii85EncodedString[i]];
        }

        return result;
    }
}

Also, here are the unit tests. They aren't as thorough as I'd like, and I don't like the non-determinism of where Guid.NewGuid() is used, but they should get you started:

using System;
using JetBrains.Annotations;
using Microsoft.VisualStudio.TestTools.UnitTesting;

/// <summary>
/// Tests to verify that the Ascii-85 encoding is functioning as expected.
/// </summary>
[TestClass]
[UsedImplicitly]
public class Ascii85Tests
{
    [TestMethod]
    [Description("Ensure that the Ascii-85 encoding is correct.")]
    [UsedImplicitly]
    public void CanEncodeAndDecodeAGuidUsingAscii85()
    {
        var guidStrings = new[]
        {
            "00000000-0000-0000-0000-000000000000",
            "00000000-0000-0000-0000-0000000000FF",
            "00000000-0000-0000-0000-00000000FF00",
            "00000000-0000-0000-0000-000000FF0000",
            "00000000-0000-0000-0000-0000FF000000",
            "00000000-0000-0000-0000-00FF00000000",
            "00000000-0000-0000-0000-FF0000000000",
            "00000000-0000-0000-00FF-000000000000",
            "00000000-0000-0000-FF00-000000000000",
            "00000000-0000-00FF-0000-000000000000",
            "00000000-0000-FF00-0000-000000000000",
            "00000000-00FF-0000-0000-000000000000",
            "00000000-FF00-0000-0000-000000000000",
            "000000FF-0000-0000-0000-000000000000",
            "0000FF00-0000-0000-0000-000000000000",
            "00FF0000-0000-0000-0000-000000000000",
            "FF000000-0000-0000-0000-000000000000",
            "FF000000-0000-0000-0000-00000000FFFF",
            "00000000-0000-0000-0000-0000FFFF0000",
            "00000000-0000-0000-0000-FFFF00000000",
            "00000000-0000-0000-FFFF-000000000000",
            "00000000-0000-FFFF-0000-000000000000",
            "00000000-FFFF-0000-0000-000000000000",
            "0000FFFF-0000-0000-0000-000000000000",
            "FFFF0000-0000-0000-0000-000000000000",
            "00000000-0000-0000-0000-0000FFFFFFFF",
            "00000000-0000-0000-FFFF-FFFF00000000",
            "00000000-FFFF-FFFF-0000-000000000000",
            "FFFFFFFF-0000-0000-0000-000000000000",
            "00000000-0000-0000-FFFF-FFFFFFFFFFFF",
            "FFFFFFFF-FFFF-FFFF-0000-000000000000",
            "FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF",
            "1000000F-100F-100F-100F-10000000000F"
        };

        foreach (var guidString in guidStrings)
        {
            var guid = new Guid(guidString);
            var encoded = Ascii85.Encode(guid);

            Assert.AreEqual(
                20, 
                encoded.Length, 
                "A guid encoding should be exactly 20 characters long.");

            var decoded = Ascii85.Decode(encoded);

            Assert.AreEqual(
                guid, 
                decoded, 
                "The guids are different after being encoded and decoded.");
        }
    }

    [TestMethod]
    [Description(
        "The Ascii-85 encoding is not susceptible to changes in character case.")]
    [UsedImplicitly]
    public void Ascii85IsCaseInsensitive()
    {
        const int kCount = 50;

        for (var i = 0; i < kCount; i++)
        {
            var guid = Guid.NewGuid();

            // The encoding should be all upper case. A reliance 
            // on mixed case will make the generated string 
            // vulnerable to sql collation.
            var encoded = Ascii85.Encode(guid);

            Assert.AreEqual(
                encoded, 
                encoded.ToUpper(), 
                "The Ascii-85 encoding should produce only uppercase characters.");
        }
    }
}

I hope this saves somebody some trouble.

Also, if you find any bugs then let me know ;-)
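For checking the scheme outside .NET, here is a rough Python port of the same idea (my own sketch, not the answerer's code): the 85-character alphabet from kEncodeMap, two 64-bit halves, and 10 digits per half, least-significant digit first. Be aware of one assumption: Python's uuid.UUID.bytes is in big-endian (RFC 4122) order, whereas .NET's Guid.ToByteArray() stores the first three fields little-endian, so the strings produced here will not match the C# output for the same GUID, although both round-trip losslessly. Also, Python's str.upper() turns 'ß' into 'SS', so the C# upper-casing step is deliberately not ported.

```python
import uuid

# Same 85 characters, in the same order, as kEncodeMap in the C# code.
ALPHABET = (
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ|}~{"
    "!\"#$%&'()`*+,-./[\\]^:;<=>?@_¼½"
    "¾ßÇÐ€«»¿•Ø£†‡§¥"
)
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}
BASE = len(ALPHABET)  # 85


def _encode_half(part: int) -> str:
    """Encode a 64-bit integer as 10 digits, least-significant digit first."""
    digits = []
    for _ in range(10):
        part, remainder = divmod(part, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(digits)


def _decode_half(text: str) -> int:
    """Horner's rule starting from the right-most (most-significant) digit."""
    result = 0
    for ch in reversed(text):
        result = result * BASE + DECODE[ch]
    if result >= 2 ** 64:  # extra check: 85**10 slightly exceeds 2**64
        raise ValueError("not a valid encoded 64-bit half")
    return result


def encode(u: uuid.UUID) -> str:
    data = u.bytes  # 16 bytes, big-endian
    return (_encode_half(int.from_bytes(data[:8], "big"))
            + _encode_half(int.from_bytes(data[8:], "big")))


def decode(text: str) -> uuid.UUID:
    if len(text) != 20:
        raise ValueError("an encoded UUID must be 20 characters long")
    higher = _decode_half(text[:10])
    lower = _decode_half(text[10:])
    return uuid.UUID(bytes=higher.to_bytes(8, "big") + lower.to_bytes(8, "big"))
```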

叫嚣ゝ 2024-09-08 10:02:49

Use Base 85.
See section 4.1, "Why 85?", of A Compact Representation of IPv6 Addresses (RFC 1924).

An IPv6 address, like a GUID, is 128 bits long: eight 16-bit pieces.
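The core of RFC 1924's argument is that 85 is the smallest alphabet for which 20 characters cover all 2^128 values. The Python sketch below is my own illustration, not code from the RFC: it checks that bound and encodes a 128-bit integer with the RFC's 85-character set (0-9, A-Z, a-z and 23 punctuation characters), most-significant digit first. Note that RFC 1924 is an April 1st RFC, and its alphabet includes lowercase letters, so it would not avoid the case-insensitive collation problem described in the Ascii-85 answer.

```python
# RFC 1924 digit set, in value order 0..84.
RFC1924_ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "!#$%&()*+-;<=>?@^_`{|}~"
)


def smallest_base(width: int, bits: int = 128) -> int:
    """Smallest alphabet size k with k**width >= 2**bits."""
    k = 2
    while k ** width < 2 ** bits:
        k += 1
    return k


def encode_128(value: int) -> str:
    """Encode a 128-bit integer as 20 base-85 digits, most-significant first."""
    if not 0 <= value < 2 ** 128:
        raise ValueError("value must fit in 128 bits")
    digits = []
    for _ in range(20):
        value, remainder = divmod(value, 85)
        digits.append(RFC1924_ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_128(text: str) -> int:
    """Horner's rule, left to right, since the first digit is the most significant."""
    value = 0
    for ch in text:
        value = value * 85 + RFC1924_ALPHABET.index(ch)
    return value
```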

简单气质女生网名 2024-09-08 10:02:49

You have 95 characters available (33 through 127), so more than 6 bits per character, but not quite 7 (log2(95) ≈ 6.57). Since 128 / log2(95) ≈ 19.48, 20 characters are enough. If saving 2 characters in the encoded form (20 instead of 22) is worth the loss of readability to you, something like this works (pseudocode):

char encoded[21];
unsigned __int128 guid;  // 128-bit number (GCC/Clang extension; long long is only 64 bits)

for (int i = 0; i < 20; ++i) {
  encoded[i] = (char)(guid % 95 + 33);  // least-significant digit first
  guid /= 95;
}
encoded[20] = '\0';

which is basically the generic "encode a number in some base" code, except that there's no need to reverse the "digits" since the order is arbitrary anyway (and little-endian is more direct and natural). Getting the guid back from the encoded string is, in a very similar way, a polynomial evaluation in base 95 (after subtracting 33 from each digit, of course). Because the first character holds the least-significant digit, the loop has to start from the last character:

unsigned __int128 guid = 0;

for (int i = 19; i >= 0; --i) {
  guid *= 95;
  guid += (unsigned char)encoded[i] - 33;
}

essentially using Horner's approach to polynomial evaluation.
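In a language with arbitrary-precision integers the same scheme needs no 128-bit type. This Python sketch makes one deliberate change, which is my assumption rather than the answerer's: it uses only the 94 characters 33-126, because 127 is the non-printable DEL control character. Since 94^20 still exceeds 2^128, the output stays at 20 characters.

```python
FIRST, BASE, WIDTH = 33, 94, 20  # '!' .. '~'; 127 (DEL) deliberately excluded


def encode(value: int) -> str:
    """Little-endian base 94: the first character holds the least-significant digit."""
    if not 0 <= value < BASE ** WIDTH:
        raise ValueError("value does not fit in 20 base-94 digits")
    chars = []
    for _ in range(WIDTH):
        value, digit = divmod(value, BASE)
        chars.append(chr(FIRST + digit))
    return "".join(chars)


def decode(text: str) -> int:
    """Horner's rule, walking from the last (most-significant) character to the first."""
    value = 0
    for ch in reversed(text):
        value = value * BASE + (ord(ch) - FIRST)
    return value
```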

一念一轮回 2024-09-08 10:02:49

Simply go Base64.
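A minimal Python sketch of the Base64 route, assuming the URL-safe alphabet (which uses '-' and '_' in place of '+' and '/') and stripping the two '=' padding characters that 16 bytes always produce, so every GUID becomes 22 characters:

```python
import base64
import uuid


def uuid_to_b64(u: uuid.UUID) -> str:
    """16 bytes -> 24 Base64 characters, minus the trailing '==' padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def b64_to_uuid(text: str) -> uuid.UUID:
    """Re-append the two padding characters that uuid_to_b64 strips."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(text + "=="))
```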

[浮城] 2024-09-08 10:02:49

Using the full range from 33 (what's wrong with space, incidentally?) to 127 gives you 95 possible characters. Expressing the 2^128 possible values of a guid in base 95 will use 20 characters. This (modulo things like dropping nybbles that will be constant) is the best you can do. Save yourself the trouble: use base 64.

近箐 2024-09-08 10:02:49

Assuming that all of your GUIDs are being generated by the same algorithm, you can save 4 bits by not encoding the version (algorithm) nibble before applying any other encoding :-|
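A sketch of this idea in Python, assuming RFC 4122 version-4 (random) GUIDs. It goes one step further than the answer by also dropping the 2 fixed variant bits, leaving 122 bits; the function names are hypothetical.

```python
import uuid

LOW62 = (1 << 62) - 1


def pack_v4(u: uuid.UUID) -> int:
    """Drop the 4 version bits and 2 variant bits of a version-4 UUID -> 122 bits."""
    if u.version != 4 or u.variant != uuid.RFC_4122:
        raise ValueError("expected an RFC 4122 version-4 UUID")
    n = u.int
    top48 = n >> 80            # time_low + time_mid
    mid12 = (n >> 64) & 0xFFF  # time_hi without the version nibble (bits 76-79)
    low62 = n & LOW62          # clock_seq + node without the variant bits (62-63)
    return (top48 << 74) | (mid12 << 62) | low62


def unpack_v4(packed: int) -> uuid.UUID:
    """Re-insert version 4 (0b0100) and the RFC 4122 variant (0b10)."""
    low62 = packed & LOW62
    mid12 = (packed >> 62) & 0xFFF
    top48 = packed >> 74
    n = (top48 << 80) | (0x4 << 76) | (mid12 << 64) | (0b10 << 62) | low62
    return uuid.UUID(int=n)
```

With 122 bits, Base64 needs 21 characters instead of 22, and a 95-character alphabet needs 19 instead of 20 (the smallest n with 95^n ≥ 2^122).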

惜醉颜 2024-09-08 10:02:49

An arbitrary GUID? The "naive" algorithm will produce optimal results. The only way to compress a GUID further is to make use of patterns in the data excluded by your "arbitrary" constraint.

那些过往 2024-09-08 10:02:49

I agree with the Base64 approach. It will cut a 32-hex-digit UUID down to 22 Base64 characters.

Here are simple Hex <-> Base64 converting functions for PHP:

function hex_to_base64($hex) {
  // Strip the hyphens of a canonical UUID, then turn the 32 hex digits into 16 raw bytes.
  $bytes = hex2bin(str_replace('-', '', $hex));
  // Remove the trailing '=' padding; PHP's base64_decode() does not need it.
  return rtrim(base64_encode($bytes), '=');
}

function base64_to_hex($base64) {
  // Decode back to the 16 raw bytes and render them as 32 lowercase hex digits.
  return bin2hex(base64_decode($base64));
}