将字节数组转换为任意基数
我有一个字节数组(任意长度),我想使用我自己的基本编码器将此数组编码为字符串。在 .NET
中是标准的 Base64
编码器,但是如果我想用 Base62
、Base53
或Base13
?
是否有可能创建这样的通用基础编码器?
我知道我可以用简单的方法做到这一点,即为每个字节保留固定数量的字符(在Base62
的情况下,这将是5个字符),并进行直接字节->字符编码,但我会浪费空间,因为 5 个 Base62 字符能够包含超过 1 个字节,但少于 2 个字节。
我应该如何编写这样的编码器?或者已经有一些课程可以做到这一点?
请注意,我还需要通用解码器,否则这对我来说毫无用处。
资源
由于该解决方案是已知的(使用 BigInteger),我只想在这里放置一些与 BigInteger 类相关的资源,因为它在 .NET 3.5 中不可用:
C# 中的大整数
http://intx.codeplex.com/
https:/ /svn.apache.org/repos/asf/incubator/heraldry/libraries/csharp/openid/trunk/Mono/Mono.Math/BigInteger.cs
http://www.codeproject.com/KB/cs/BigInteger_Library.aspx
http://www.codeproject.com/KB/cs/biginteger.aspx
I have an array of bytes (any length), and I want to encode this array into string using my own base encoder. In .NET
is standard Base64
encoder, but what if I want to encode the array in Base62
, Base53
or Base13
?
Is it even possible to create such universal base encoder?
I know I could do it the simple way, that is, for each byte reserve fixed number of chars (in case of Base62
, that would be 5 chars), and do direct byte->chars encoding, but I would be wasting space, as 5 Base62
chars are able to contain more than 1 byte, but less than 2 bytes.
How should I write such an encoder? Or is there already some class for this?
And please note that I need universal decoder as well, otherwise this is useless to me.
Resources
As the solution is already known (use BigInteger
), I would just like to put here some resources relating the BigInteger
class, as it is not available in .NET 3.5:
Big integers in C#
http://intx.codeplex.com/
https://svn.apache.org/repos/asf/incubator/heraldry/libraries/csharp/openid/trunk/Mono/Mono.Math/BigInteger.cs
http://www.codeproject.com/KB/cs/BigInteger_Library.aspx
http://www.codeproject.com/KB/cs/biginteger.aspx
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
有点晚了,但是......
因为您的规范要求任意数量的位,所以您必须有一个可以使用任意数量的位的整数类型。如果您不能以 .NET 4.0 为目标,您将不得不在某个地方乞求、借用或窃取 BigInteger 实现(也许是 .NET 4.0)。
A little late to the party, but...
Because your specification calls for an arbitrary number of bits, you must have an integer type that can work with an arbitrary number of bits. If you can't target .NET 4.0 you'll have to beg, borrow, or steal a BigInteger implementation somewhere (like .NET 4.0 perhaps).
如果性能不是问题,请使用 BigInteger类在后台。您有一个采用字节数组的 BigInteger 构造函数,然后您可以手动运行除法和模数循环来获取其他非标准基数的表示形式。
另请查看此。
If performance is not an issue, use the BigInteger class in the background. You have a constructor for BigInteger that takes byte array, and you can then manually run loops of division and modulus to get the representation in other non-standard bases.
Also take a look at this.
这是我的 博客 的副本,我希望对您有所帮助(以及为什么)我转换为 Base62
我目前正在开发自己的 url 缩短器:konv.es。为了创建尽可能短的 url 字符哈希,我使用字符串的 GetHashCode() 方法,然后将结果数字转换为基数 62 ([0-9a-zA-Z])。迄今为止我发现的进行转换的最优雅的解决方案(这也是收益率回报的一个方便的花花公子的例子)是:
额外的信用:重构作为扩展方法
Here is a copy from my blog which I hope helps how (and why) I convert to Base62
I am currently working on my own url shortener: konv.es. In order to create the shortest possible character hash of the url, I use the GetHashCode() method of the string, then convert the resulting number to base 62 ([0-9a-zA-Z]). The most elegant solution that I have found thus far to make the convertion (which is also a handy-dandy example of a yield return) is:
Extra credit: re-factor as an extension method
BASE64 效果很好,因为 64 是 2 的幂 (2^6),因此每个字符保存 6 位数据,3 个字节 (3 * 8 = 24 位) 可以编码为 4 个字符 (4 * 6 = 24)。编码&解码可以仅向下位移位。
对于不与 2 的幂对齐的基数(例如基数 62 或基数 53),则必须将尝试编码的消息视为一个长数,并对其执行除法和模运算。您可能最好使用 Base32 编码并浪费一些带宽。
BASE64 works well, because 64 is a power of 2 (2^6) so each character holds 6 bits of data, and 3 bytes (3 * 8 = 24 bits) can be encoded into 4 characters (4 * 6 = 24). The encoding & decoding can be down merely bit shifting bits.
For bases which do not align with a power of 2 (like your base 62 or Base 53), Then you must treat the message you are trying to encode as one long number and perform divison and modulo operations on it. You'd probably be better off using a Base32 encoding and squandering a bit of bandwidth.
您可以从 Base32 由 Michael Giagnocavo 实施。
You can get inspiration from C# implementation of Base32 implementation by Michael Giagnocavo.
受到 Steve Konves 的回答的启发
Inspired by Steve Konves's answer
另一个值得关注的示例是 Ascii85,用于 Adobe PostScript 和 PDF 文档。在Ascii85中,5个字符用于编码4个字节。您可以算出此编码的效率为 (256^4)/(85^5) = 96.8%。这是实际使用的位组合的分数。
因此,无论您想要使用什么新基数来编码数据,如果您想最大限度地提高编码效率,您都需要寻找一种能够使其略高于 256 次方的幂。对于每个基地来说这可能并不容易。检查基数 53 表明,您可能得到的最好结果是使用 7 个字节来编码 5 个字节(效率为 93.6%),除非您想使用 88 个字节来编码 63 个字节。
Another example to look at is Ascii85, used in Adobe PostScript and PDF documents. In Ascii85, 5 characters are used to encode 4 bytes. You can figure out the efficiency of this coding as (256^4)/(85^5) = 96.8%. This is the fraction of bit combinations that will actually be used.
So, for whatever new base you would want to use to encode your data, you want to look for a power that will get it just above a power of 256 if you're trying to maximize coding efficiency. This might not be easy for every base. Checking base 53 shows that the best you'll probably get is using 7 bytes to encode 5 bytes (93.6% efficiency), unless you feel like using 88 bytes to encode 63 bytes.
我写了一篇文章< /a> 它描述了 Python 中的一个解决方案,可以完全解决您的问题。我没有使用 Python 的非常特殊的功能来获得可以轻松用其他语言实现的解决方案。您可能会看一下并确定它是否符合您的需求。
I've written an article which describes a solution in Python that exactly deals with your problem. I didn't use very special features of Python in order to get a solution which can easily be implemented in other languages. You might have a look and find out if it fits your needs.
CodeReview 上的帖子提示我创建一个 RadixEncoding 类它能够处理字节数组与基数 N 字符串的编码/解码。
该类可以在此问答线程中找到,以及有关处理 BigInteger、字节序支持和类整体性能时的一些边缘情况(及其解决方案)的文档
A post on CodeReview prompted me to create a RadixEncoding class which is able to handle encoding/decoding a byte array to/from a base-N string.
The class can be found in this Q&A thread, along with documentation on (and solutions for) a few edge cases when dealing with BigInteger, endian-ness support, and the class' overall performance
以下是将字节数组转换为 Base64 的示例代码片段。关于此有一篇非常好的文章,我参考了 这个。
Here is the sample code snippet to convert byte array to base64. There is a very good article on this, I took reference from this.