Unicode handling in ReportLab

Posted 2024-08-08 03:19:50


I am trying to use ReportLab with Unicode characters, but it is not working. I tried tracing through the code until I reached the following line:

class TTFont:
    # ...
    def splitString(self, text, doc, encoding='utf-8'):
        # ...
        cur.append(n & 0xFF) # <-- here is the problem!
        # ...

(This code is in ReportLab's repository, in the file pdfbase/ttfonts.py; the line in question is line 1059.)

Why is n's value being manipulated?

In the line shown above, n contains the code point of the character being processed (e.g. 65 for 'A', 97 for 'a', or 1588 for Arabic sheen 'ش'). cur is a list that is being filled with the characters to be sent to the final output (AFAIU). Up to that line everything (apparently) works fine, but this line masks n with 0xFF. That keeps only its low 8 bits and so reduces it to the 0 to 255 (extended ASCII) range! For example, 1588 (0x634) becomes 52 (0x34).
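
To make the effect of the mask concrete, here is a small Python 3 sketch. It only illustrates the bit arithmetic on the code points mentioned above. It makes no claim about what ReportLab does with the high bits. `split_index` is a hypothetical helper written for this illustration.

```python
def split_index(n):
    """Hypothetical helper: split n into (high, low).

    low is what `n & 0xFF` keeps; high (n >> 8) is what the mask drops.
    """
    return n >> 8, n & 0xFF

# Code points from the question: 'A', 'a' and Arabic sheen (U+0634).
for ch in ['A', 'a', '\u0634']:
    n = ord(ch)
    high, low = split_index(n)
    assert (high << 8) | low == n  # nothing is lost if both parts are kept
    print(f"U+{n:04X} (n={n}): low byte {low}, high bits {high}")

# Prints:
# U+0041 (n=65): low byte 65, high bits 0
# U+0061 (n=97): low byte 97, high bits 0
# U+0634 (n=1588): low byte 52, high bits 6
```

For sheen, the mask alone yields 52, which is U+0034, the digit '4'. The original character can only be recovered if the high part (6) is kept somewhere else.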

This causes any character above U+00FF, including every Arabic letter, to lose its value. I cannot see how this statement is useful, or why it is necessary!

So my question is: why is n's value being manipulated here, and how should I go about fixing this issue?

Edit:
In response to the comment regarding my code snippet, here is an example that causes this error:

my_doctemplate.build([Paragraph(bulletText = None, encoding = 'utf8',
    caseSensitive = 1, debug = 0,
    text = '\xd8\xa3\xd8\xa8\xd8\xb1\xd8\xa7\xd8\xac',
    frags = [ParaFrag(fontName = 'DejaVuSansMono-BoldOblique',
        text = '\xd8\xa3\xd8\xa8\xd8\xb1\xd8\xa7\xd8\xac',
        sub = 0, rise = 0, greek = 0, link = None, italic = 0, strike = 0,
        fontSize = 12.0, textColor = Color(0,0,0), super = 0, underline = 0,
        bold = 0)])])
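
The `text` passed to Paragraph in the example above is a string of escaped bytes. Assume the snippet is Python 2, where a plain '...' literal is a byte string. Those bytes are then the UTF-8 encoding of five Arabic letters (أبراج). The Python 3 sketch below only decodes that data to show which code points ReportLab ultimately has to render. It does not call ReportLab.

```python
# The `text` value from the Paragraph call above, as a Python 3 bytes literal.
raw = b'\xd8\xa3\xd8\xa8\xd8\xb1\xd8\xa7\xd8\xac'

text = raw.decode('utf-8')                  # UTF-8 bytes -> str of code points
print(len(raw), len(text))                  # 10 bytes encode 5 characters
print([f'U+{ord(ch):04X}' for ch in text])  # ['U+0623', 'U+0628', 'U+0631', 'U+0627', 'U+062C']
```

Every one of these code points is above 0xFF. Under the question's assumption that n is the code point, each of them would therefore be truncated by `n & 0xFF`.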

PDFTextObject._textOut calls _formatText, which identifies the font as _dynamicFont and accordingly calls font.splitString. That is where the error described above occurs.
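
One hypothesis is worth checking against pdfbase/ttfonts.py. It is stated here as an assumption, not as a description of ReportLab's actual code. Under this hypothesis, n is not the code point itself but an index that splitString assigns to each distinct character. n >> 8 would then select a 256-glyph font subset, and n & 0xFF would give the character's position inside that subset. In that model the mask discards nothing, because the subset number is stored alongside each run of bytes.

The sketch below models the idea. `split_into_subsets` and `join_subsets` are hypothetical names, and the model ignores any special cases a real implementation may need.

```python
def split_into_subsets(text):
    """Hypothetical model, not ReportLab code.

    Assigns each distinct character a sequential index n, then groups
    consecutive characters into runs that share the same subset (n >> 8),
    storing only the low byte (n & 0xFF) of each index in the run.
    Returns (runs, assignments), where runs is a list of (subset, bytes).
    """
    assignments = {}               # code point -> assigned index n
    runs = []
    cur_set, cur = -1, []
    for code in map(ord, text):
        if code not in assignments:
            assignments[code] = len(assignments)
        n = assignments[code]
        if (n >> 8) != cur_set:    # entering a different 256-glyph subset
            if cur:
                runs.append((cur_set, bytes(cur)))
            cur_set, cur = n >> 8, []
        cur.append(n & 0xFF)       # position within the current subset
    if cur:
        runs.append((cur_set, bytes(cur)))
    return runs, assignments


def join_subsets(runs, assignments):
    """Invert split_into_subsets: rebuild n from (subset, low byte), then the character."""
    by_index = {n: code for code, n in assignments.items()}
    return ''.join(chr(by_index[(subset << 8) | low])
                   for subset, data in runs for low in data)


arabic = '\u0623\u0628\u0631\u0627\u062c'   # the five letters from the example
runs, assignments = split_into_subsets(arabic)
print(runs)                                  # [(0, b'\x00\x01\x02\x03\x04')]
assert join_subsets(runs, assignments) == arabic
```

In this model, each run needs only one byte per character, and the subset number says which 256-entry subset those bytes index into. Text with more than 256 distinct characters simply produces runs for subsets 0, 1, and so on.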
