Reportlab and pdfrw produce an error with matplotlib imshow() in Python 3
I've recently ported some code that worked in Python 2 to Python 3, and I now get an error when using reportlab together with pdfrw and matplotlib's imshow().
Can someone reproduce this error in Python 3? I'm also not sure whether it is a reportlab issue or a pdfrw problem.
```python
import numpy as np
import matplotlib.pyplot as plt
from pdfrw import PdfReader
from pdfrw.buildxobj import pagexobj
from pdfrw.toreportlab import makerl
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

# Save a matplotlib image plot as a one-page PDF
fig = plt.figure(figsize=(5, 5))
plt.imshow(np.random.rand(10, 10))
plt.savefig('Imshow.pdf')

# Embed that PDF page into a new reportlab document as a form XObject
MyReport = canvas.Canvas('foo.pdf', pagesize=A4)
pages = PdfReader('Imshow.pdf').pages
page = pagexobj(pages[0])
MyReport.saveState()
MyReport.doForm(makerl(MyReport, page))
MyReport.restoreState()
MyReport.save()
```
The error reads:

```
UnicodeEncodeError: 'charmap' codec can't encode character '\x1f' in position 6: character maps to <undefined>
```
System: Windows 10, Python 3.9, pdfrw 0.4, reportlab 3.6.8
The problem is the way pdfrw deals with strings vs bytes.

`pdfrw.PdfReader` loads the entire source PDF using the Latin-1 encoding. All 256 possible byte values are meaningful in Latin-1, so all the binary data in the matplotlib image is loaded without complaint. But this creates junk Unicode text that reportlab then has problems re-encoding (because it doesn't use Latin-1). The solution is to find data which really should be binary and pass it to reportlab as correctly-encoded `bytes` instead of `str`. You need to hack the function `_makestr` on line 108 of `pdfrw/toreportlab.py`.

Old, original code (including the TODO tag!):
New:
Anything which can't be represented as ASCII is encoded using the original Latin-1 encoding and sent to reportlab as `bytes`. Based on my testing, this doesn't appear to affect non-ASCII strings in the plot (e.g. axis labels). I guess they take a different path through the pdfrw code, but I don't know!
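The rule above can be illustrated with a minimal sketch. This is not the actual pdfrw source: `encode_for_reportlab` is a hypothetical stand-alone helper that applies the same rule the patched `_makestr` is described as using (keep ASCII as `str`, re-encode anything else with Latin-1 into `bytes`), and the sample bytes are arbitrary. It also shows why the Latin-1 round trip matters: decoding any byte string with Latin-1 always succeeds, but re-encoding the result with a different codec does not give the original bytes back.

```python
def encode_for_reportlab(value):
    """Hypothetical helper mirroring the patched _makestr rule.

    Returns the str unchanged if it is pure ASCII; otherwise re-encodes it
    with Latin-1 so the original binary data is passed on as bytes.
    """
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        # Not ASCII: treat it as binary data that was decoded with Latin-1
        return value.encode("latin-1")
    return value


# Decoding with Latin-1 maps every byte 0-255 to exactly one character
raw = bytes([0x1F, 0x8B, 0x08, 0xFF, 0x00])
text = raw.decode("latin-1")

# Re-encoding with another codec (here UTF-8) yields different bytes
assert text.encode("utf-8") != raw

# Re-encoding with Latin-1 restores the original binary data exactly
assert encode_for_reportlab(text) == raw

# Plain ASCII content (e.g. a PDF name) stays a str
assert encode_for_reportlab("/XObject") == "/XObject"
```

The try/except keeps ordinary PDF tokens as text and only switches to `bytes` when a value contains characters outside ASCII, which is the case for binary image streams.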
pdfrw as a project seems to be dead, with no release since 2017. If anyone sees this and knows how to contribute a patch to the project, feel free (or let me know).
Aron Lockey, thanks. It works!!