Retrieving binary file content using JavaScript, base64-encoding it, and reverse-decoding it using Python
I'm trying to download a binary file using XMLHttpRequest (using a recent WebKit) and base64-encode its contents using these simple functions:
function getBinary(file) {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", file, false); // synchronous request
    // Ask the browser to treat the response as opaque text
    xhr.overrideMimeType("text/plain; charset=x-user-defined");
    xhr.send(null);
    return xhr.responseText;
}
function base64encode(binary) {
    return btoa(unescape(encodeURIComponent(binary)));
}
var binary = getBinary('http://some.tld/sample.pdf');
var base64encoded = base64encode(binary);
As a side note, everything above is standard JavaScript, including btoa() and encodeURIComponent(): https://developer.mozilla.org/en/DOM/window.btoa
This works pretty smoothly, and I can even decode the base64 contents using JavaScript:
function base64decode(base64) {
    return decodeURIComponent(escape(atob(base64)));
}
var decodedBinary = base64decode(base64encoded);
decodedBinary === binary // true
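A successful round trip in JavaScript only shows that the two helpers are inverses of each other, not that the base64 payload holds the file's raw bytes. The sketch below reuses base64encode() from the question and adds a hypothetical bytesOf() helper (not part of the original post) to list the bytes that actually end up inside the base64 string. It assumes an environment where btoa(), atob() and unescape() are available, and it assumes that a response byte 0x89 read with charset=x-user-defined arrives as the code point U+F789.

```javascript
// Same helper as in the question.
function base64encode(binary) {
  return btoa(unescape(encodeURIComponent(binary)));
}

// Hypothetical helper: list the byte values carried by a base64 string.
function bytesOf(base64) {
  var raw = atob(base64);
  var out = [];
  for (var i = 0; i < raw.length; i++) {
    out.push(raw.charCodeAt(i));
  }
  return out;
}

// ASCII characters pass through unchanged: one character, one byte.
var asciiBytes = bytesOf(base64encode("PNG"));

// A character above U+007F is expanded to its multi-byte UTF-8 sequence,
// so a single source byte becomes three bytes in the payload.
var highBytes = bytesOf(base64encode("\uf789"));

console.log(asciiBytes, highBytes);
```

A decoder that only runs base64 decoding, without the matching UTF-8 step, would therefore see extra bytes wherever the file contains values above 0x7F.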
Now, I want to decode the base64-encoded contents using Python, which consumes a JSON string to get the base64encoded string value. Naively, this is what I do:
import urllib
import base64
# ... retrieving of base64 encoded string through JSON
# (named "encoded" so that it does not shadow the base64 module)
encoded = "77+9UE5HDQ……………oaCgA="
source_contents = urllib.unquote(base64.b64decode(encoded))
destination_file = open(destination, 'wb')
destination_file.write(source_contents)
destination_file.close()
But the resulting file is invalid; it looks like the operation is messed up by UTF-8, encoding, or something else that is still unclear to me.
If I try to decode the contents as UTF-8 before writing them to the destination file, an error is raised:
import urllib
import base64
# ... retrieving of base64 encoded string through JSON
# (named "encoded" so that it does not shadow the base64 module)
encoded = "77+9UE5HDQ……………oaCgA="
source_contents = urllib.unquote(base64.b64decode(encoded)).decode('utf-8')
destination_file = open(destination, 'wb')
destination_file.write(source_contents)
destination_file.close()
$ python test.py
// ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
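The u'\ufffd' at position 0 in that error can be checked against the visible start of the base64 string. The sketch below decodes only the first eight characters shown in the question ("77+9UE5H"); the variable names are illustrative, and it assumes atob() and escape() are available.

```javascript
// First eight characters of the base64 string quoted in the question.
var prefix = "77+9UE5H";

// atob() yields one character per decoded byte.
var raw = atob(prefix);

// Render the decoded bytes as two-digit hex values.
var hex = [];
for (var i = 0; i < raw.length; i++) {
  hex.push(("0" + raw.charCodeAt(i).toString(16)).slice(-2));
}
var hexString = hex.join(" ");

// Interpret the same bytes as UTF-8, as base64decode() in the question does.
var text = decodeURIComponent(escape(raw));

console.log(hexString); // ef bf bd 50 4e 47
console.log(text === "\ufffdPNG"); // true
```

EF BF BD is the UTF-8 encoding of U+FFFD, the replacement character, followed by the ASCII letters "PNG". A PNG file starts with the byte 0x89 followed by "PNG", which would suggest that the first byte had already been replaced in the browser, before base64 encoding, in which case no amount of decoding on the Python side can bring it back.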
As a side note, here's a screenshot of two textual representations of the same file; on the left: the original; on the right: the one created from the base64-decoded string: http://cl.ly/0U3G34110z3c132O2e2x
Is there a known trick to circumvent these encoding problems when attempting to recreate the file? How would you achieve this yourself?
Any help or hint is much appreciated :)
So I'm answering my own question, and sorry for that, but I think it might be useful for someone as lost as I was ;)
So you have to use ArrayBuffer and set the responseType property of your XMLHttpRequest object instance to arraybuffer to retrieve a native array of bytes, which can be converted to base64 using the following convenient function (found there, author may be blessed here):
So here's some working code:
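A minimal sketch of that approach, assuming a browser that supports responseType = "arraybuffer" and btoa(). The names arrayBufferToBase64 and getBinaryAsBase64 are illustrative and not necessarily those of the function the answer refers to. The request is made asynchronous here because browsers reject responseType on synchronous requests made from a page.

```javascript
// Convert an ArrayBuffer to base64 by first building a string that holds
// exactly one character (code 0-255) per byte, which is what btoa() expects.
function arrayBufferToBase64(buffer) {
  var bytes = new Uint8Array(buffer);
  var binary = "";
  for (var i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Hypothetical wrapper: download `url` as raw bytes and hand the
// base64-encoded result to `callback`.
function getBinaryAsBase64(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true); // asynchronous
  xhr.responseType = "arraybuffer"; // xhr.response will be an ArrayBuffer
  xhr.onload = function () {
    callback(arrayBufferToBase64(xhr.response));
  };
  xhr.send(null);
}

// Only issue the request where XMLHttpRequest exists (i.e. in a browser).
if (typeof XMLHttpRequest !== "undefined") {
  getBinaryAsBase64("http://some.tld/sample.pdf", function (base64encoded) {
    console.log(base64encoded);
  });
}
```

Because no text decoding happens at any point, the string should decode on the Python side with base64.b64decode() alone, without urllib.unquote() or a UTF-8 step.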
This will log a valid base64 encoded string representing the binary file contents.
Edit: For older browsers that don't have access to ArrayBuffer and where btoa() fails on some characters, here's another way to get a base64-encoded version of any binary:
Hope this helps others as it did me.
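For the older-browser fallback mentioned in the edit above, one possible shape is a hand-written encoder that masks every character code down to its low 8 bits. This is a sketch, not the answer's original code; base64EncodeBinaryString and BASE64_CHARS are illustrative names, and it assumes the input is the responseText obtained with overrideMimeType("text/plain; charset=x-user-defined"), where bytes above 0x7F arrive as code points whose low byte is the original value.

```javascript
var BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode a "binary string" to base64 without btoa() or ArrayBuffer.
// Each character code is masked with 0xff to recover the original byte.
function base64EncodeBinaryString(str) {
  var out = "";
  for (var i = 0; i < str.length; i += 3) {
    var has1 = i + 1 < str.length;
    var has2 = i + 2 < str.length;
    var b0 = str.charCodeAt(i) & 0xff;
    var b1 = has1 ? str.charCodeAt(i + 1) & 0xff : 0;
    var b2 = has2 ? str.charCodeAt(i + 2) & 0xff : 0;

    // Split the 24-bit group into four 6-bit indexes; pad with "=".
    out += BASE64_CHARS.charAt(b0 >> 2);
    out += BASE64_CHARS.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += has1 ? BASE64_CHARS.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : "=";
    out += has2 ? BASE64_CHARS.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

// The PNG signature as it would look in an x-user-defined responseText:
// the leading byte 0x89 shows up as the code point U+F789.
var encodedSignature = base64EncodeBinaryString("\uf789PNG\r\n\x1a\n");
console.log(encodedSignature); // iVBORw0KGgo=
```

The mask is what keeps the high bytes intact: the encoder never sees code points above 0xFF, so nothing gets expanded into UTF-8 along the way.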