Base64的编码与解码 - Web API 接口参考 编辑
Base64 是一组相似的二进制到文本(binary-to-text)的编码规则,使得二进制数据在解释成 radix-64 的表现形式后能够用 ASCII 字符串的格式表示出来。Base64 这个词出自一种 MIME 数据传输编码。
Base64编码普遍应用于需要通过被设计为处理文本数据的媒介上储存和传输二进制数据而需要编码该二进制数据的场景。这样是为了保证数据的完整并且不用在传输过程中修改这些数据。Base64 也被一些应用(包括使用 MIME 的电子邮件)和在 XML 中储存复杂数据时使用。
在 JavaScript 中,有两个函数被分别用来处理解码和编码 base64 字符串:
atob()
函数能够解码通过base-64编码的字符串数据。相反地,btoa()
函数能够从二进制数据“字符串”创建一个base-64编码的ASCII字符串。
atob()
和 btoa()
均使用字符串。如果你想使用 ArrayBuffers
,请参阅后文。
编码尺寸增加
每一个Base64字符实际上代表着6比特位。因此,3字节(一字节是8比特,3字节也就是24比特)的字符串/二进制文件可以转换成4个Base64字符(4x6 = 24比特)。
这意味着Base64格式的字符串或文件的尺寸约是原始尺寸的133%(增加了大约33%)。如果编码的数据很少,增加的比例可能会更高。例如:字符串"a"
的length === 1
进行Base64编码后是"YQ=="
的length === 4
,尺寸增加了300%。
文档
| 工具
相关文章 |
Unicode 问题
由于 DOMString
是16位编码的字符串,所以如果有字符超出了8位ASCII编码的字符范围时,在大多数的浏览器中对Unicode字符串调用 window.btoa
将会造成一个 Character Out Of Range
的异常。有很多种方法可以解决这个问题:
- the first method consists in encoding JavaScript's native UTF-16 strings directly into base64 (fast, portable, clean)
- the second method consists in converting JavaScript's native UTF-16 strings to UTF-8 and then encode the latter into base64 (relatively fast, portable, clean)
- the third method consists in encoding JavaScript's native UTF-16 strings directly into base64 via binary strings (very fast, relatively portable, very compact)
- the fourth method consists in escaping the whole string (with UTF-8, see
encodeURIComponent
) and then encode it (portable, non-standard) - the fifth method is similar to the second method, but uses third party libraries
Solution #1 – JavaScript's UTF-16 => base64
A very fast and widely useable way to solve the unicode problem is by encoding JavaScript native UTF-16 strings directly into base64. Please visit the URL data:text/plain;charset=utf-16;base64,OCY5JjomOyY8Jj4mPyY=
for a demonstration (copy the data uri, open a new tab, paste the data URI into the address bar, then press enter to go to the page). This method is particularly efficient because it does not require any type of conversion, except mapping a string into an array. The following code is also useful to get an ArrayBuffer from a Base64 string and/or viceversa (see below).
"use strict";
/*\
|*|
|*| Base64 / binary data / UTF-8 strings utilities (#1)
|*|
|*| /wiki/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding
|*|
|*| Author: madmurphy
|*|
\*/
/* Array of bytes to base64 string decoding */
function b64ToUint6 (nChr) {
return nChr > 64 && nChr < 91 ?
nChr - 65
: nChr > 96 && nChr < 123 ?
nChr - 71
: nChr > 47 && nChr < 58 ?
nChr + 4
: nChr === 43 ?
62
: nChr === 47 ?
63
:
0;
}
function base64DecToArr (sBase64, nBlockSize) {
var
sB64Enc = sBase64.replace(/[^A-Za-z0-9\+\/]/g, ""), nInLen = sB64Enc.length,
nOutLen = nBlockSize ? Math.ceil((nInLen * 3 + 1 >>> 2) / nBlockSize) * nBlockSize : nInLen * 3 + 1 >>> 2, aBytes = new Uint8Array(nOutLen);
for (var nMod3, nMod4, nUint24 = 0, nOutIdx = 0, nInIdx = 0; nInIdx < nInLen; nInIdx++) {
nMod4 = nInIdx & 3;
nUint24 |= b64ToUint6(sB64Enc.charCodeAt(nInIdx)) << 18 - 6 * nMod4;
if (nMod4 === 3 || nInLen - nInIdx === 1) {
for (nMod3 = 0; nMod3 < 3 && nOutIdx < nOutLen; nMod3++, nOutIdx++) {
aBytes[nOutIdx] = nUint24 >>> (16 >>> nMod3 & 24) & 255;
}
nUint24 = 0;
}
}
return aBytes;
}
/* Base64 string to array encoding */
function uint6ToB64 (nUint6) {
return nUint6 < 26 ?
nUint6 + 65
: nUint6 < 52 ?
nUint6 + 71
: nUint6 < 62 ?
nUint6 - 4
: nUint6 === 62 ?
43
: nUint6 === 63 ?
47
:
65;
}
function base64EncArr (aBytes) {
var eqLen = (3 - (aBytes.length % 3)) % 3, sB64Enc = "";
for (var nMod3, nLen = aBytes.length, nUint24 = 0, nIdx = 0; nIdx < nLen; nIdx++) {
nMod3 = nIdx % 3;
/* Uncomment the following line in order to split the output in lines 76-character long: */
/*
if (nIdx > 0 && (nIdx * 4 / 3) % 76 === 0) { sB64Enc += "\r\n"; }
*/
nUint24 |= aBytes[nIdx] << (16 >>> nMod3 & 24);
if (nMod3 === 2 || aBytes.length - nIdx === 1) {
sB64Enc += String.fromCharCode(uint6ToB64(nUint24 >>> 18 & 63), uint6ToB64(nUint24 >>> 12 & 63), uint6ToB64(nUint24 >>> 6 & 63), uint6ToB64(nUint24 & 63));
nUint24 = 0;
}
}
return eqLen === 0 ?
sB64Enc
:
sB64Enc.substring(0, sB64Enc.length - eqLen) + (eqLen === 1 ? "=" : "==");
}
Tests
var myString = "☸☹☺☻☼☾☿";
/* Part 1: Encode `myString` to base64 using native UTF-16 */
var aUTF16CodeUnits = new Uint16Array(myString.length);
Array.prototype.forEach.call(aUTF16CodeUnits, function (el, idx, arr) { arr[idx] = myString.charCodeAt(idx); });
var sUTF16Base64 = base64EncArr(new Uint8Array(aUTF16CodeUnits.buffer));
/* Show output */
alert(sUTF16Base64); // "OCY5JjomOyY8Jj4mPyY="
/* Part 2: Decode `sUTF16Base64` to UTF-16 */
var sDecodedString = String.fromCharCode.apply(null, new Uint16Array(base64DecToArr(sUTF16Base64, 2).buffer));
/* Show output */
alert(sDecodedString); // "☸☹☺☻☼☾☿"
The produced string is fully portable, although represented as UTF-16. If you prefer UTF-8, see the next solution.
Appendix to Solution #1: Decode a Base64 string to Uint8Array or ArrayBuffer
The functions above let us also create uint8Arrays or arrayBuffers from base64-encoded strings:
var myArray = base64DecToArr("QmFzZSA2NCDigJQgTW96aWxsYSBEZXZlbG9wZXIgTmV0d29yaw=="); // "Base 64 \u2014 Mozilla Developer Network" (as UTF-8)
var myBuffer = base64DecToArr("QmFzZSA2NCDigJQgTW96aWxsYSBEZXZlbG9wZXIgTmV0d29yaw==").buffer; // "Base 64 \u2014 Mozilla Developer Network" (as UTF-8)
alert(myBuffer.byteLength);
Note: The function base64DecToArr(sBase64[, nBlockSize])
returns an uint8Array
of bytes. If your aim is to build a buffer of 16-bit / 32-bit / 64-bit raw data, use the nBlockSize
argument, which is the number of bytes which the uint8Array.buffer.bytesLength
property must result to be a multiple of (1
or omitted for ASCII, binary content, binary strings, UTF-8-encoded strings; 2
for UTF-16 strings; 4
for UTF-32 strings).For a more complete library, see StringView
– a C-like representation of strings based on typed arrays (source code available on GitHub).
Solution #2 – JavaScript's UTF-16 => UTF-8 => base64
This solution consists in converting a JavaScript's native UTF-16 string into a UTF-8 string and then encoding the latter into base64. This also grants that converting a pure ASCII string to base64 always produces the same output as the native btoa()
.
"use strict";
/*\
|*|
|*| Base64 / binary data / UTF-8 strings utilities (#2)
|*|
|*| /wiki/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding
|*|
|*| Author: madmurphy
|*|
\*/
/* Array of bytes to base64 string decoding */
function b64ToUint6 (nChr) {
return nChr > 64 && nChr < 91 ?
nChr - 65
: nChr > 96 && nChr < 123 ?
nChr - 71
: nChr > 47 && nChr < 58 ?
nChr + 4
: nChr === 43 ?
62
: nChr === 47 ?
63
:
0;
}
function base64DecToArr (sBase64, nBlockSize) {
var
sB64Enc = sBase64.replace(/[^A-Za-z0-9\+\/]/g, ""), nInLen = sB64Enc.length,
nOutLen = nBlockSize ? Math.ceil((nInLen * 3 + 1 >>> 2) / nBlockSize) * nBlockSize : nInLen * 3 + 1 >>> 2, aBytes = new Uint8Array(nOutLen);
for (var nMod3, nMod4, nUint24 = 0, nOutIdx = 0, nInIdx = 0; nInIdx < nInLen; nInIdx++) {
nMod4 = nInIdx & 3;
nUint24 |= b64ToUint6(sB64Enc.charCodeAt(nInIdx)) << 18 - 6 * nMod4;
if (nMod4 === 3 || nInLen - nInIdx === 1) {
for (nMod3 = 0; nMod3 < 3 && nOutIdx < nOutLen; nMod3++, nOutIdx++) {
aBytes[nOutIdx] = nUint24 >>> (16 >>> nMod3 & 24) & 255;
}
nUint24 = 0;
}
}
return aBytes;
}
/* Base64 string to array encoding */
function uint6ToB64 (nUint6) {
return nUint6 < 26 ?
nUint6 + 65
: nUint6 < 52 ?
nUint6 + 71
: nUint6 < 62 ?
nUint6 - 4
: nUint6 === 62 ?
43
: nUint6 === 63 ?
47
:
65;
}
function base64EncArr (aBytes) {
var eqLen = (3 - (aBytes.length % 3)) % 3, sB64Enc = "";
for (var nMod3, nLen = aBytes.length, nUint24 = 0, nIdx = 0; nIdx < nLen; nIdx++) {
nMod3 = nIdx % 3;
/* Uncomment the following line in order to split the output in lines 76-character long: */
/*
if (nIdx > 0 && (nIdx * 4 / 3) % 76 === 0) { sB64Enc += "\r\n"; }
*/
nUint24 |= aBytes[nIdx] << (16 >>> nMod3 & 24);
if (nMod3 === 2 || aBytes.length - nIdx === 1) {
sB64Enc += String.fromCharCode(uint6ToB64(nUint24 >>> 18 & 63), uint6ToB64(nUint24 >>> 12 & 63), uint6ToB64(nUint24 >>> 6 & 63), uint6ToB64(nUint24 & 63));
nUint24 = 0;
}
}
return eqLen === 0 ?
sB64Enc
:
sB64Enc.substring(0, sB64Enc.length - eqLen) + (eqLen === 1 ? "=" : "==");
}
/* UTF-8 array to DOMString and vice versa */
function UTF8ArrToStr (aBytes) {
var sView = "";
for (var nPart, nLen = aBytes.length, nIdx = 0; nIdx < nLen; nIdx++) {
nPart = aBytes[nIdx];
sView += String.fromCharCode(
nPart > 251 && nPart < 254 && nIdx + 5 < nLen ? /* six bytes */
/* (nPart - 252 << 30) may be not so safe in ECMAScript! So...: */
(nPart - 252) * 1073741824 + (aBytes[++nIdx] - 128 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
: nPart > 247 && nPart < 252 && nIdx + 4 < nLen ? /* five bytes */
(nPart - 248 << 24) + (aBytes[++nIdx] - 128 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
: nPart > 239 && nPart < 248 && nIdx + 3 < nLen ? /* four bytes */
(nPart - 240 << 18) + (aBytes[++nIdx] - 128 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
: nPart > 223 && nPart < 240 && nIdx + 2 < nLen ? /* three bytes */
(nPart - 224 << 12) + (aBytes[++nIdx] - 128 << 6) + aBytes[++nIdx] - 128
: nPart > 191 && nPart < 224 && nIdx + 1 < nLen ? /* two bytes */
(nPart - 192 << 6) + aBytes[++nIdx] - 128
: /* nPart < 127 ? */ /* one byte */
nPart
);
}
return sView;
}
function strToUTF8Arr (sDOMStr) {
var aBytes, nChr, nStrLen = sDOMStr.length, nArrLen = 0;
/* mapping... */
for (var nMapIdx = 0; nMapIdx < nStrLen; nMapIdx++) {
nChr = sDOMStr.charCodeAt(nMapIdx);
nArrLen += nChr < 0x80 ? 1 : nChr < 0x800 ? 2 : nChr < 0x10000 ? 3 : nChr < 0x200000 ? 4 : nChr < 0x4000000 ? 5 : 6;
}
aBytes = new Uint8Array(nArrLen);
/* transcription... */
for (var nIdx = 0, nChrIdx = 0; nIdx < nArrLen; nChrIdx++) {
nChr = sDOMStr.charCodeAt(nChrIdx);
if (nChr < 128) {
/* one byte */
aBytes[nIdx++] = nChr;
} else if (nChr < 0x800) {
/* two bytes */
aBytes[nIdx++] = 192 + (nChr >>> 6);
aBytes[nIdx++] = 128 + (nChr & 63);
} else if (nChr < 0x10000) {
/* three bytes */
aBytes[nIdx++] = 224 + (nChr >>> 12);
aBytes[nIdx++] = 128 + (nChr >>> 6 & 63);
aBytes[nIdx++] = 128 + (nChr & 63);
} else if (nChr < 0x200000) {
/* four bytes */
aBytes[nIdx++] = 240 + (nChr >>> 18);
aBytes[nIdx++] = 128 + (nChr >>> 12 & 63);
aBytes[nIdx++] = 128 + (nChr >>> 6 & 63);
aBytes[nIdx++] = 128 + (nChr & 63);
} else if (nChr < 0x4000000) {
/* five bytes */
aBytes[nIdx++] = 248 + (nChr >>> 24);
aBytes[nIdx++] = 128 + (nChr >>> 18 & 63);
aBytes[nIdx++] = 128 + (nChr >>> 12 & 63);
aBytes[nIdx++] = 128 + (nChr >>> 6 & 63);
aBytes[nIdx++] = 128 + (nChr & 63);
} else /* if (nChr <= 0x7fffffff) */ {
/* six bytes */
aBytes[nIdx++] = 252 + (nChr >>> 30);
aBytes[nIdx++] = 128 + (nChr >>> 24 & 63);
aBytes[nIdx++] = 128 + (nChr >>> 18 & 63);
aBytes[nIdx++] = 128 + (nChr >>> 12 & 63);
aBytes[nIdx++] = 128 + (nChr >>> 6 & 63);
aBytes[nIdx++] = 128 + (nChr & 63);
}
}
return aBytes;
}
Tests
/* Tests */
var sMyInput = "Base 64 \u2014 Mozilla Developer Network";
var aMyUTF8Input = strToUTF8Arr(sMyInput);
var sMyBase64 = base64EncArr(aMyUTF8Input);
alert(sMyBase64); // "QmFzZSA2NCDigJQgTW96aWxsYSBEZXZlbG9wZXIgTmV0d29yaw=="
var aMyUTF8Output = base64DecToArr(sMyBase64);
var sMyOutput = UTF8ArrToStr(aMyUTF8Output);
alert(sMyOutput); // "Base 64 — Mozilla Developer Network"
Solution #3 – JavaScript's UTF-16 => binary string => base64
The following is the fastest and most compact possible approach. The output is exactly the same produced by Solution #1 (UTF-16 encoded strings), but instead of rewriting atob()
and btoa()
it uses the native ones. This is made possible by the fact that instead of using typed arrays as encoding/decoding inputs this solution uses binary strings as an intermediate format. It is a “dirty” workaround in comparison to Solution #1 (binary strings are a grey area), however it works pretty well and requires only a few lines of code.
"use strict";
/*\
|*|
|*| Base64 / binary data / UTF-8 strings utilities (#3)
|*|
|*| /wiki/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding
|*|
|*| Author: madmurphy
|*|
\*/
function btoaUTF16 (sString) {
var aUTF16CodeUnits = new Uint16Array(sString.length);
Array.prototype.forEach.call(aUTF16CodeUnits, function (el, idx, arr) { arr[idx] = sString.charCodeAt(idx); });
return btoa(String.fromCharCode.apply(null, new Uint8Array(aUTF16CodeUnits.buffer)));
}
function atobUTF16 (sBase64) {
var sBinaryString = atob(sBase64), aBinaryView = new Uint8Array(sBinaryString.length);
Array.prototype.forEach.call(aBinaryView, function (el, idx, arr) { arr[idx] = sBinaryString.charCodeAt(idx); });
return String.fromCharCode.apply(null, new Uint16Array(aBinaryView.buffer));
}
Tests
var myString = "☸☹☺☻☼☾☿";
/* Part 1: Encode `myString` to base64 using native UTF-16 */
var sUTF16Base64 = btoaUTF16(myString);
/* Show output */
alert(sUTF16Base64); // "OCY5JjomOyY8Jj4mPyY="
/* Part 2: Decode `sUTF16Base64` to UTF-16 */
var sDecodedString = atobUTF16(sUTF16Base64);
/* Show output */
alert(sDecodedString); // "☸☹☺☻☼☾☿"
For a cleaner solution that uses typed arrays instead of binary strings, see solutions #1 and #2.
Solution #4 – escaping the string before encoding it
function b64EncodeUnicode(str) {
// first we use encodeURIComponent to get percent-encoded UTF-8,
// then we convert the percent encodings into raw bytes which
// can be fed into btoa.
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
function toSolidBytes(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}
b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="
To decode the Base64-encoded value back into a String:
function b64DecodeUnicode(str) {
// Going backwards: from bytestream, to percent-encoding, to original string.
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"
Unibabel implements common conversions using this strategy.
Solution #5 – rewrite the DOMs atob() and btoa() using JavaScript's TypedArrays and UTF-8
Use a TextEncoder polyfill such as TextEncoding (also includes legacy windows, mac, and ISO encodings), TextEncoderLite, combined with a Buffer and a Base64 implementation such as base64-js or TypeScript version of base64-js for both modern browsers and Node.js.
When a native TextEncoder
implementation is not available, the most light-weight solution would be to use Solution #3 because in addition to being much faster, Solution #3 also works in IE9 "out of the box." Alternatively, use TextEncoderLite with base64-js. Use the browser implementation when you can.
The following function implements such a strategy. It assumes base64-js imported as <script type="text/javascript" src="base64js.min.js"/>
. Note that TextEncoderLite only works with UTF-8.
function Base64Encode(str, encoding = 'utf-8') {
var bytes = new (typeof TextEncoder === "undefined" ? TextEncoderLite : TextEncoder)(encoding).encode(str);
return base64js.fromByteArray(bytes);
}
function Base64Decode(str, encoding = 'utf-8') {
var bytes = base64js.toByteArray(str);
return new (typeof TextDecoder === "undefined" ? TextDecoderLite : TextDecoder)(encoding).decode(bytes);
}
注意: TextEncoderLite 不能正确处理四字节 UTF-8 字符, 比如 '\uD842\uDFB7' 或缩写为 '\u{20BB7}' 。参见 issue
可使用 text-encoding 作为替代。
某些场景下,以上经由 UTF-8 转换到 Base64 的实现在空间利用上不一定高效。当处理包含大量 U+0800-U+FFFF 区域间字符的文本时, UTF-8 输出结果长于 UTF-16 的,因为这些字符在 UTF-8 下占用三个字节而 UTF-16 是两个。在处理均匀分布 UTF-16 码点的 JavaScript 字符串时应考虑采用 UTF-16 替代 UTF-8 作为 Base64 结果的中间编码格式,这将减少 40% 尺寸。
译者注:下为陈旧翻译片段
- 第一个是转义(escape)整个字符串然后编码这个它;
- 第二个是把UTF-16的
DOMString
转码为UTF-8的字符数组然后编码它。
方案 #1 – 编码之前转义(escape)字符串
function b64EncodeUnicode(str) {
return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
return String.fromCharCode('0x' + p1);
}));
}
b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
把base64转换回字符串
function b64DecodeUnicode(str) {
return decodeURIComponent(atob(str).split('').map(function(c) {
return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
}).join(''));
}
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"
Unibabel 是一个包含了一些使用这种策略的通用转换的库。
方案 #6 – 用JavaScript的 TypedArray 和 UTF-8重写DOM的 atob() 和 btoa()
使用像TextEncoding(包含了早期(legacy)的windows,mac, 和 ISO 编码),TextEncoderLite 或者 Buffer 这样的文本编码器增强(polyfill)和Base64增强,比如base64-js 或 TypeScript 版本的 base64-js (适用于长青浏览器和 Node.js)。
最简单,最轻量级的解决方法就是使用 TextEncoderLite 和 base64-js.
想要更完整的库的话,参见 StringView
– a C-like representation of strings based on typed arrays.
参见
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论