将字节数组转换为任意基数

发布于 2024-09-10 08:12:24 字数 1223 浏览 5 评论 0原文

我有一个字节数组(任意长度),我想使用我自己的基本编码器将此数组编码为字符串。在 .NET 中是标准的 Base64 编码器,但是如果我想用 Base62Base53Base13

是否有可能创建这样的通用基础编码器?

我知道我可以用简单的方法做到这一点,即为每个字节保留固定数量的字符(在Base62的情况下,这将是5个字符),并进行直接字节->字符编码,但我会浪费空间,因为 5 个 Base62 字符能够包含超过 1 个字节,但少于 2 个字节。

我应该如何编写这样的编码器?或者已经有一些课程可以做到这一点?
请注意,我还需要通用解码器,否则这对我来说毫无用处。

资源

由于该解决方案是已知的(使用 BigInteger),我只想在这里放置一些与 BigInteger 类相关的资源,因为它在 .NET 3.5 中不可用:

C# 中的大整数
http://intx.codeplex.com/
https:/ /svn.apache.org/repos/asf/incubator/heraldry/libraries/csharp/openid/trunk/Mono/Mono.Math/BigInteger.cs
http://www.codeproject.com/KB/cs/BigInteger_Library.aspx
http://www.codeproject.com/KB/cs/biginteger.aspx

I have an array of bytes (any length), and I want to encode this array into string using my own base encoder. In .NET is standard Base64 encoder, but what if I want to encode the array in Base62, Base53 or Base13?

Is it even possible to create such universal base encoder?

I know I could do it the simple way, that is, for each byte reserve fixed number of chars (in case of Base62, that would be 5 chars), and do direct byte->chars encoding, but I would be wasting space, as 5 Base62 chars are able to contain more than 1 byte, but less than 2 bytes.

How should I write such an encoder? Or is there already some class for this?
And please note that I need universal decoder as well, otherwise this is useless to me.

Resources

As the solution is already known (use BigInteger), I would just like to put here some resources relating the BigInteger class, as it is not available in .NET 3.5:

Big integers in C#
http://intx.codeplex.com/
https://svn.apache.org/repos/asf/incubator/heraldry/libraries/csharp/openid/trunk/Mono/Mono.Math/BigInteger.cs
http://www.codeproject.com/KB/cs/BigInteger_Library.aspx
http://www.codeproject.com/KB/cs/biginteger.aspx

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(10

谜兔 2024-09-17 08:12:24

有点晚了,但是......

因为您的规范要求任意数量的位,所以您必须有一个可以使用任意数量的位的整数类型。如果您不能以 .NET 4.0 为目标,您将不得不在某个地方乞求、借用或窃取 BigInteger 实现(也许是 .NET 4.0)。

public static class GenericBaseConverter
{
    public static string ConvertToString(byte[] valueAsArray, string digits, int pad)
    {
        if (digits == null)
            throw new ArgumentNullException("digits");
        if (digits.Length < 2)
            throw new ArgumentOutOfRangeException("digits", "Expected string with at least two digits");

        BigInteger value = new BigInteger(valueAsArray);
        bool isNeg = value < 0;
        value = isNeg ? -value : value;

        StringBuilder sb = new StringBuilder(pad + (isNeg ? 1 : 0));

        do
        {
            BigInteger rem;
            value = BigInteger.DivRem(value, digits.Length, out rem);
            sb.Append(digits[(int)rem]);
        } while (value > 0);

        // pad it
        if (sb.Length < pad)
            sb.Append(digits[0], pad - sb.Length);

        // if the number is negative, add the sign.
        if (isNeg)
            sb.Append('-');

        // reverse it
        for (int i = 0, j = sb.Length - 1; i < j; i++, j--)
        {
            char t = sb[i];
            sb[i] = sb[j];
            sb[j] = t;
        }

        return sb.ToString();

    }

    public static BigInteger ConvertFromString(string s, string digits)
    {
        BigInteger result;

        switch (Parse(s, digits, out result))
        {
            case ParseCode.FormatError:
                throw new FormatException("Input string was not in the correct format.");
            case ParseCode.NullString:
                throw new ArgumentNullException("s");
            case ParseCode.NullDigits:
                throw new ArgumentNullException("digits");
            case ParseCode.InsufficientDigits:
                throw new ArgumentOutOfRangeException("digits", "Expected string with at least two digits");
            case ParseCode.Overflow:
                throw new OverflowException();
        }

        return result;
    }

    public static bool TryConvertFromString(string s, string digits, out BigInteger result)
    {
        return Parse(s, digits, out result) == ParseCode.Success;
    }

    private enum ParseCode
    {
        Success,
        NullString,
        NullDigits,
        InsufficientDigits,
        Overflow,
        FormatError,
    }

    private static ParseCode Parse(string s, string digits, out BigInteger result)
    {
        result = 0;

        if (s == null)
            return ParseCode.NullString;
        if (digits == null)
            return ParseCode.NullDigits;
        if (digits.Length < 2)
            return ParseCode.InsufficientDigits;

        // skip leading white space
        int i = 0;
        while (i < s.Length && Char.IsWhiteSpace(s[i]))
            ++i;
        if (i >= s.Length)
            return ParseCode.FormatError;

        // get the sign if it's there.
        BigInteger sign = 1;
        if (s[i] == '+')
            ++i;
        else if (s[i] == '-')
        {
            ++i;
            sign = -1;
        }

        // Make sure there's at least one digit
        if (i >= s.Length)
            return ParseCode.FormatError;


        // Parse the digits.
        while (i < s.Length)
        {
            int n = digits.IndexOf(s[i]);
            if (n < 0)
                return ParseCode.FormatError;
            BigInteger oldResult = result;
            result = unchecked((result * digits.Length) + n);
            if (result < oldResult)
                return ParseCode.Overflow;

            ++i;
        }

        // skip trailing white space
        while (i < s.Length && Char.IsWhiteSpace(s[i]))
            ++i;

        // and make sure there's nothing else.
        if (i < s.Length)
            return ParseCode.FormatError;

        if (sign < 0)
            result = -result;

        return ParseCode.Success;
    }
}

A little late to the party, but...

Because your specification calls for an arbitrary number of bits, you must have an integer type that can work with an arbitrary number of bits. If you can't target .NET 4.0 you'll have to beg, borrow, or steal a BigInteger implementation somewhere (like .NET 4.0 perhaps).

public static class GenericBaseConverter
{
    public static string ConvertToString(byte[] valueAsArray, string digits, int pad)
    {
        if (digits == null)
            throw new ArgumentNullException("digits");
        if (digits.Length < 2)
            throw new ArgumentOutOfRangeException("digits", "Expected string with at least two digits");

        BigInteger value = new BigInteger(valueAsArray);
        bool isNeg = value < 0;
        value = isNeg ? -value : value;

        StringBuilder sb = new StringBuilder(pad + (isNeg ? 1 : 0));

        do
        {
            BigInteger rem;
            value = BigInteger.DivRem(value, digits.Length, out rem);
            sb.Append(digits[(int)rem]);
        } while (value > 0);

        // pad it
        if (sb.Length < pad)
            sb.Append(digits[0], pad - sb.Length);

        // if the number is negative, add the sign.
        if (isNeg)
            sb.Append('-');

        // reverse it
        for (int i = 0, j = sb.Length - 1; i < j; i++, j--)
        {
            char t = sb[i];
            sb[i] = sb[j];
            sb[j] = t;
        }

        return sb.ToString();

    }

    public static BigInteger ConvertFromString(string s, string digits)
    {
        BigInteger result;

        switch (Parse(s, digits, out result))
        {
            case ParseCode.FormatError:
                throw new FormatException("Input string was not in the correct format.");
            case ParseCode.NullString:
                throw new ArgumentNullException("s");
            case ParseCode.NullDigits:
                throw new ArgumentNullException("digits");
            case ParseCode.InsufficientDigits:
                throw new ArgumentOutOfRangeException("digits", "Expected string with at least two digits");
            case ParseCode.Overflow:
                throw new OverflowException();
        }

        return result;
    }

    public static bool TryConvertFromString(string s, string digits, out BigInteger result)
    {
        return Parse(s, digits, out result) == ParseCode.Success;
    }

    private enum ParseCode
    {
        Success,
        NullString,
        NullDigits,
        InsufficientDigits,
        Overflow,
        FormatError,
    }

    private static ParseCode Parse(string s, string digits, out BigInteger result)
    {
        result = 0;

        if (s == null)
            return ParseCode.NullString;
        if (digits == null)
            return ParseCode.NullDigits;
        if (digits.Length < 2)
            return ParseCode.InsufficientDigits;

        // skip leading white space
        int i = 0;
        while (i < s.Length && Char.IsWhiteSpace(s[i]))
            ++i;
        if (i >= s.Length)
            return ParseCode.FormatError;

        // get the sign if it's there.
        BigInteger sign = 1;
        if (s[i] == '+')
            ++i;
        else if (s[i] == '-')
        {
            ++i;
            sign = -1;
        }

        // Make sure there's at least one digit
        if (i >= s.Length)
            return ParseCode.FormatError;


        // Parse the digits.
        while (i < s.Length)
        {
            int n = digits.IndexOf(s[i]);
            if (n < 0)
                return ParseCode.FormatError;
            BigInteger oldResult = result;
            result = unchecked((result * digits.Length) + n);
            if (result < oldResult)
                return ParseCode.Overflow;

            ++i;
        }

        // skip trailing white space
        while (i < s.Length && Char.IsWhiteSpace(s[i]))
            ++i;

        // and make sure there's nothing else.
        if (i < s.Length)
            return ParseCode.FormatError;

        if (sign < 0)
            result = -result;

        return ParseCode.Success;
    }
}
清晨说晚安 2024-09-17 08:12:24

如果性能不是问题,请使用 BigInteger类在后台。您有一个采用字节数组的 BigInteger 构造函数,然后您可以手动运行除法和模数循环来获取其他非标准基数的表示形式。

另请查看

If performance is not an issue, use the BigInteger class in the background. You have a constructor for BigInteger that takes byte array, and you can then manually run loops of division and modulus to get the representation in other non-standard bases.

Also take a look at this.

挽手叙旧 2024-09-17 08:12:24

这是我的 博客 的副本,我希望对您有所帮助(以及为什么)我转换为 Base62

我目前正在开发自己的 url 缩短器:konv.es。为了创建尽可能短的 url 字符哈希,我使用字符串的 GetHashCode() 方法,然后将结果数字转换为基数 62 ([0-9a-zA-Z])。迄今为止我发现的进行转换的最优雅的解决方案(这也是收益率回报的一个方便的花花公子的例子)是:

public static IEnumerable<char> ToBase62(int number)
    {
        do
        {
            yield return "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"[number % 62];
            number /= 62;

        } while (number > 0);
    }

额外的信用:重构作为扩展方法

Here is a copy from my blog which I hope helps how (and why) I convert to Base62

I am currently working on my own url shortener: konv.es. In order to create the shortest possible character hash of the url, I use the GetHashCode() method of the string, then convert the resulting number to base 62 ([0-9a-zA-Z]). The most elegant solution that I have found thus far to make the convertion (which is also a handy-dandy example of a yield return) is:

public static IEnumerable<char> ToBase62(int number)
    {
        do
        {
            yield return "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"[number % 62];
            number /= 62;

        } while (number > 0);
    }

Extra credit: re-factor as an extension method

森林很绿却致人迷途 2024-09-17 08:12:24

BASE64 效果很好,因为 64 是 2 的幂 (2^6),因此每个字符保存 6 位数据,3 个字节 (3 * 8 = 24 位) 可以编码为 4 个字符 (4 * 6 = 24)。编码&解码可以仅向下位移位。

对于不与 2 的幂对齐的基数(例如基数 62 或基数 53),则必须将尝试编码的消息视为一个长数,并对其执行除法和模运算。您可能最好使用 Base32 编码并浪费一些带宽。

BASE64 works well, because 64 is a power of 2 (2^6) so each character holds 6 bits of data, and 3 bytes (3 * 8 = 24 bits) can be encoded into 4 characters (4 * 6 = 24). The encoding & decoding can be down merely bit shifting bits.

For bases which do not align with a power of 2 (like your base 62 or Base 53), Then you must treat the message you are trying to encode as one long number and perform divison and modulo operations on it. You'd probably be better off using a Base32 encoding and squandering a bit of bandwidth.

通知家属抬走 2024-09-17 08:12:24

您可以从 Base32 由 Michael Giagnocavo 实施。

You can get inspiration from C# implementation of Base32 implementation by Michael Giagnocavo.

沉睡月亮 2024-09-17 08:12:24

受到 Steve Konves 的回答的启发

using System.Numerics;

const string base62Chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const string base26Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

static void Main() {
    string id = "xAQ0f58JgG";

    BigInteger i = fromBaseX(id, base62Chars);
    Console.WriteLine(i);

    string c = ToBaseX(i, base62Chars);
    Console.WriteLine(c);

    string c2 = ToBaseX(i, base26Chars);
    Console.WriteLine(c2);

    BigInteger i2 = fromBaseX(c2, base26Chars);
    Console.WriteLine(i2);
}

public static string ToBaseX(BigInteger number, string baseX)
{
    int l = baseX.Length;
    string result = "";
    while (number > 0)
    {
        BigInteger remainder = number % l;
        int index = (int)remainder;
        if (index >= l)
        {
            throw new ArgumentException($"Cannot convert {number} ToBaseX {baseX}");
        }
        result += baseX[index];
        number /= l;
    }
    return result;
}

public static BigInteger fromBaseX(string input, string baseX)
{
    int l = baseX.Length;
    BigInteger result;
    int pow = 0;
    foreach (char c in input)
    {
        int index = baseX.IndexOf(c);
        if (index < 0)
        {
            throw new ArgumentException($"Cannot convert {input} fromBaseX {baseX}");
        }
        BigInteger additions = BigInteger.Pow(l, pow) * index;
        result += additions;
        pow++;
    }
    return result;
}

Inspired by Steve Konves's answer

using System.Numerics;

const string base62Chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const string base26Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

static void Main() {
    string id = "xAQ0f58JgG";

    BigInteger i = fromBaseX(id, base62Chars);
    Console.WriteLine(i);

    string c = ToBaseX(i, base62Chars);
    Console.WriteLine(c);

    string c2 = ToBaseX(i, base26Chars);
    Console.WriteLine(c2);

    BigInteger i2 = fromBaseX(c2, base26Chars);
    Console.WriteLine(i2);
}

public static string ToBaseX(BigInteger number, string baseX)
{
    int l = baseX.Length;
    string result = "";
    while (number > 0)
    {
        BigInteger remainder = number % l;
        int index = (int)remainder;
        if (index >= l)
        {
            throw new ArgumentException(
quot;Cannot convert {number} ToBaseX {baseX}");
        }
        result += baseX[index];
        number /= l;
    }
    return result;
}

public static BigInteger fromBaseX(string input, string baseX)
{
    int l = baseX.Length;
    BigInteger result;
    int pow = 0;
    foreach (char c in input)
    {
        int index = baseX.IndexOf(c);
        if (index < 0)
        {
            throw new ArgumentException(
quot;Cannot convert {input} fromBaseX {baseX}");
        }
        BigInteger additions = BigInteger.Pow(l, pow) * index;
        result += additions;
        pow++;
    }
    return result;
}
你曾走过我的故事 2024-09-17 08:12:24

另一个值得关注的示例是 Ascii85,用于 Adob​​e PostScript 和 PDF 文档。在Ascii85中,5个字符用于编码4个字节。您可以算出此编码的效率为 (256^4)/(85^5) = 96.8%。这是实际使用的位组合的分数。

因此,无论您想要使用什么新基数来编码数据,如果您想最大限度地提高编码效率,您都需要寻找一种能够使其略高于 256 次方的幂。对于每个基地来说这可能并不容易。检查基数 53 表明,您可能得到的最好结果是使用 7 个字节来编码 5 个字节(效率为 93.6%),除非您想使用 88 个字节来编码 63 个字节。

Another example to look at is Ascii85, used in Adobe PostScript and PDF documents. In Ascii85, 5 characters are used to encode 4 bytes. You can figure out the efficiency of this coding as (256^4)/(85^5) = 96.8%. This is the fraction of bit combinations that will actually be used.

So, for whatever new base you would want to use to encode your data, you want to look for a power that will get it just above a power of 256 if you're trying to maximize coding efficiency. This might not be easy for every base. Checking base 53 shows that the best you'll probably get is using 7 bytes to encode 5 bytes (93.6% efficiency), unless you feel like using 88 bytes to encode 63 bytes.

深白境迁sunset 2024-09-17 08:12:24

I've written an article which describes a solution in Python that exactly deals with your problem. I didn't use very special features of Python in order to get a solution which can easily be implemented in other languages. You might have a look and find out if it fits your needs.

孤星 2024-09-17 08:12:24

CodeReview 上的帖子提示我创建一个 RadixEncoding 类它能够处理字节数组与基数 N 字符串的编码/解码。

该类可以在此问答线程中找到,以及有关处理 BigInteger、字节序支持和类整体性能时的一些边缘情况(及其解决方案)的文档

A post on CodeReview prompted me to create a RadixEncoding class which is able to handle encoding/decoding a byte array to/from a base-N string.

The class can be found in this Q&A thread, along with documentation on (and solutions for) a few edge cases when dealing with BigInteger, endian-ness support, and the class' overall performance

无妨# 2024-09-17 08:12:24

以下是将字节数组转换为 Base64 的示例代码片段。关于此有一篇非常好的文章,我参考了 这个

public class Test {

    private static final char[] toBase64URL = {
            'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
            'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
            'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
            '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'
    };

    public static void main(String[] args) {

        byte[] mess = "ABC123".getBytes();

        byte[] masks = { -128, 64, 32, 16, 8, 4, 2, 1 };
        StringBuilder builder = new StringBuilder();

        for(int i = 0; i < mess.length; i++) {
            for (byte m : masks) {
                if ((mess[i] & m) == m) {
                    builder.append('1');
                } else {
                    builder.append('0');
                }
            }
        }

        System.out.println(builder.toString());

        int i =0;
        StringBuilder output = new StringBuilder();
        while (i < builder.length()){
            String part = builder.substring(i, i+6);
            int index = Integer.parseInt(part, 2);
            output.append(toBase64URL[index]);
            i += 6;
        }

        System.out.println(output.toString());

    }
}

Here is the sample code snippet to convert byte array to base64. There is a very good article on this, I took reference from this.

public class Test {

    private static final char[] toBase64URL = {
            'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
            'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
            'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
            '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'
    };

    public static void main(String[] args) {

        byte[] mess = "ABC123".getBytes();

        byte[] masks = { -128, 64, 32, 16, 8, 4, 2, 1 };
        StringBuilder builder = new StringBuilder();

        for(int i = 0; i < mess.length; i++) {
            for (byte m : masks) {
                if ((mess[i] & m) == m) {
                    builder.append('1');
                } else {
                    builder.append('0');
                }
            }
        }

        System.out.println(builder.toString());

        int i =0;
        StringBuilder output = new StringBuilder();
        while (i < builder.length()){
            String part = builder.substring(i, i+6);
            int index = Integer.parseInt(part, 2);
            output.append(toBase64URL[index]);
            i += 6;
        }

        System.out.println(output.toString());

    }
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文