How to decode PDF binary data into PostScript-readable text
I'm currently searching for a way to extract the code (PostScript) behind a PDF just from its binary data.
I have an input of type file that has an onChange event handler attached. The event handler looks like this:
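const handleFileChange = (e) => {
  // Take the first file selected in the <input type="file">
  const [file] = e.target.files
  const reader = new FileReader()
  reader.addEventListener("loadend", () => {
    // reader.result is an ArrayBuffer holding the raw bytes of the PDF
    const buffer = reader.result
    const view = new DataView(buffer)
    // Note: in the Encoding Standard the "ascii" label is an alias for windows-1252
    const decoder = new TextDecoder("ascii")
    // Every byte, including compressed stream data, is decoded as a character here
    const text = decoder.decode(view)
  })
  reader.readAsArrayBuffer(file)
}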
This code produces text that looks like this:
%PDF-1.3 %Äåòåë§ó ÐÄÆ 3 0 obj << /Filter /FlateDecode /Length 4573 >> stream xµ\ëŽÜ¶þϧ жVwi‚ hìM
The beginning is just fine; it's everything afterward that is the issue. Both "%Äåòåë§ó ÐÄÆ" and "µ\ëŽÜ¶þϧ жVwi‚ hìM" are illegible. I'm not sure whether I just have the wrong encoding selected or whether what I'm setting out to do is all but impossible. Am I missing something here? If so, what exactly?
I appreciate any and all help. Thank you!
(resources are also welcomed)
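A possible direction, sketched under assumptions rather than offered as a definitive answer: PDF is not PostScript, and the bytes after `stream` in the output above are not text in any encoding. The `/Filter /FlateDecode` entry means the stream data is zlib-compressed, so it has to be inflated before anything readable appears (the leading `x` in the garbled part is consistent with the zlib header byte 0x78). The helper names `bytesToLatin1`, `extractFirstStream`, and `inflate` below are hypothetical. The sketch assumes a direct integer `/Length`, no nested dictionary before the stream, and a runtime that provides `DecompressionStream` (modern browsers, Node 18+). A full PDF parser such as pdf.js handles the many cases this ignores.

```javascript
// Map each byte to the character with the same code so that
// string indexes line up exactly with byte offsets.
function bytesToLatin1(bytes) {
  let s = ""
  for (const b of bytes) s += String.fromCharCode(b)
  return s
}

// Locate the first stream object and return its raw (still compressed) bytes.
// Assumption: /Length is a plain integer, not an indirect reference like "5 0 R".
function extractFirstStream(bytes) {
  const text = bytesToLatin1(bytes)
  // The stream keyword follows the dictionary and ends with CRLF or LF
  const keyword = />>\s*stream\r?\n/.exec(text)
  if (!keyword) return null
  const dictStart = text.lastIndexOf("<<", keyword.index)
  if (dictStart === -1) return null
  const dict = text.slice(dictStart, keyword.index)
  const length = /\/Length\s+(\d+)/.exec(dict)
  if (!length) return null
  const dataStart = keyword.index + keyword[0].length
  return {
    flate: dict.includes("/FlateDecode"),
    data: bytes.subarray(dataStart, dataStart + Number(length[1])),
  }
}

// Inflate zlib-wrapped data; "deflate" here means the zlib format (RFC 1950).
async function inflate(data) {
  const stream = new Blob([data]).stream()
    .pipeThrough(new DecompressionStream("deflate"))
  return new Response(stream).text()
}
```

Inside the `loadend` handler this could be used as `const found = extractFirstStream(new Uint8Array(reader.result))`, followed by `inflate(found.data).then(console.log)` when `found.flate` is true. The result, if the stream is a page content stream, would be PDF drawing operators rather than a PostScript program.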