Working with PDF text matrix (Tm) values obtained with CGPDFScanner on iPhone/iPad
I am trying to parse PDF content in order to search for and highlight text.
Using the CGPDF functions I managed to find text shown with the TJ and Tj operators and to report which page a word is on. The problem is the highlighting.
I have read other posts on this, such as "Getting text position" and "Pdf search".
I know the operators for text positioning are Tm (text matrix), TD and Td (and maybe T*), but I can't figure out how to use this information.
When I print the Tm value I get a nine-digit integer, which I assume represents a 3x3 matrix. Here is the output:
2011-03-23 10:59:07.894 PDFSearch[11035:40b] BT(I) 161361744:
2011-03-23 10:59:07.896 PDFSearch[11035:40b] TM(I) 161361104:
2011-03-23 10:59:07.897 PDFSearch[11035:40b] Tf(I) 161361616:
2011-03-23 10:59:07.899 PDFSearch[11035:40b] TJ: R
2011-03-23 10:59:07.899 PDFSearch[11035:40b] TJ: e
2011-03-23 10:59:07.901 PDFSearch[11035:40b] TJ: t
2011-03-23 10:59:07.901 PDFSearch[11035:40b] TJ: i
2011-03-23 10:59:07.903 PDFSearch[11035:40b] TJ: co
2011-03-23 10:59:07.903 PDFSearch[11035:40b] TJ: l
2011-03-23 10:59:07.905 PDFSearch[11035:40b] TJ: o
2011-03-23 10:59:07.907 PDFSearch[11035:40b] ET(I) 161361872:
Any idea how to use this to find the text position, and then draw a box on the PDF view with Quartz 2D?
Thanks :)
Comments (2)
The Tm operator has six parameters, so you need to call CGPDFScannerPopNumber six times. That gives you six float values that you can use to construct a CGAffineTransform. The e and f parameters correspond to tx and ty; the other fields have the same names.
Refer to the PDF specification for more details, specifically the chapter about text (page 250 covers the Tm operator).
Remember that the operands are popped from a stack, so f will be the first value that you get and a the last.
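The reverse pop order can be sketched in plain C without linking against Core Graphics. Everything here is a stand-in for illustration: Affine mimics the field names of CGAffineTransform, and OperandStack, push_number, pop_number and read_tm are hypothetical names, not part of any Apple API. In a real operator callback each pop would be a call to CGPDFScannerPopNumber(scanner, &value), which likewise reports failure through its return value.

```c
#include <assert.h>

#define STACK_CAPACITY 16

/* Stand-in for CGAffineTransform (same field names: a, b, c, d, tx, ty). */
typedef struct { double a, b, c, d, tx, ty; } Affine;

/* Hypothetical operand stack standing in for the scanner's own stack. */
typedef struct { double values[STACK_CAPACITY]; int count; } OperandStack;

static void push_number(OperandStack *s, double v) {
    assert(s->count < STACK_CAPACITY);
    s->values[s->count++] = v;
}

/* Same shape as CGPDFScannerPopNumber: returns 0 if nothing can be popped. */
static int pop_number(OperandStack *s, double *out) {
    if (s->count == 0) return 0;
    *out = s->values[--s->count];
    return 1;
}

/* Handler for "a b c d e f Tm". The operands were pushed in the order
 * a..f, so they come off the stack in the order f..a. */
static int read_tm(OperandStack *s, Affine *m) {
    double a, b, c, d, e, f;
    if (!pop_number(s, &f) || !pop_number(s, &e) || !pop_number(s, &d) ||
        !pop_number(s, &c) || !pop_number(s, &b) || !pop_number(s, &a)) {
        return 0; /* fewer than six operands: malformed content stream */
    }
    m->a = a;  m->b = b;
    m->c = c;  m->d = d;
    m->tx = e; m->ty = f; /* e and f are the translation part */
    return 1;
}
```

For a content stream fragment such as "1 0 0 1 72 700 Tm", pushing the six numbers in that order and calling read_tm yields tx = 72 and ty = 700, which is where the text origin sits before any later Td, TD or T* moves it.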
Check out PDFKitten, an open source project that parses the TJ, Tj, Tm and other operators to calculate the text position on screen. It's not perfect, but it's a start. Searching in PDFs can be tricky: there are many ways to make a PDF display text, and some of them don't use fonts at all.
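As a rough idea of what such position tracking involves, here is a minimal plain C sketch. It is not PDFKitten's code; it only follows the PDF specification's rules for BT, Tm and Td. All names are hypothetical, Affine is a stand-in for CGAffineTransform, and several things are assumed away: the page's current transformation matrix is identity, character spacing, word spacing and horizontal scaling are ignored, and the font size is used as the box height instead of the font's real ascent and descent.

```c
#include <assert.h>

/* Stand-in for CGAffineTransform; same fields and row-vector convention. */
typedef struct { double a, b, c, d, tx, ty; } Affine;
typedef struct { double x, y, width, height; } Box;
/* tm is the current text matrix, tlm the matrix at the start of the line. */
typedef struct { Affine tm, tlm; } TextState;

/* m1 x m2, the same order as CGAffineTransformConcat(m1, m2). */
static Affine affine_concat(Affine m1, Affine m2) {
    Affine r;
    r.a  = m1.a * m2.a + m1.b * m2.c;
    r.b  = m1.a * m2.b + m1.b * m2.d;
    r.c  = m1.c * m2.a + m1.d * m2.c;
    r.d  = m1.c * m2.b + m1.d * m2.d;
    r.tx = m1.tx * m2.a + m1.ty * m2.c + m2.tx;
    r.ty = m1.tx * m2.b + m1.ty * m2.d + m2.ty;
    return r;
}

static Affine affine_translation(double tx, double ty) {
    Affine m = { 1, 0, 0, 1, tx, ty };
    return m;
}

/* BT: both matrices start as identity. */
static void text_begin(TextState *s) {
    s->tlm = affine_translation(0, 0);
    s->tm = s->tlm;
}

/* a b c d e f Tm: replaces both matrices. */
static void text_set_matrix(TextState *s, Affine m) { s->tm = s->tlm = m; }

/* tx ty Td: moves to the next line, relative to the start of this one. */
static void text_move(TextState *s, double tx, double ty) {
    s->tlm = affine_concat(affine_translation(tx, ty), s->tlm);
    s->tm = s->tlm;
}

/* After Tj/TJ: advance the text matrix by the width of the shown string. */
static void text_advance(TextState *s, double width) {
    s->tm = affine_concat(affine_translation(width, 0), s->tm);
}

/* Glyph widths are given in thousandths of a text space unit. */
static double string_width(double glyph_units, double font_size) {
    return glyph_units * font_size / 1000.0;
}

/* Bounding box of a width x height rectangle at the current text origin. */
static Box highlight_box(const TextState *s, double width, double height) {
    const double xs[4] = { 0, width, 0, width };
    const double ys[4] = { 0, 0, height, height };
    double min_x = s->tm.tx, max_x = s->tm.tx;
    double min_y = s->tm.ty, max_y = s->tm.ty;
    for (int i = 1; i < 4; i++) {
        double x = s->tm.a * xs[i] + s->tm.c * ys[i] + s->tm.tx;
        double y = s->tm.b * xs[i] + s->tm.d * ys[i] + s->tm.ty;
        if (x < min_x) min_x = x;
        if (x > max_x) max_x = x;
        if (y < min_y) min_y = y;
        if (y > max_y) max_y = y;
    }
    Box box = { min_x, min_y, max_x - min_x, max_y - min_y };
    return box;
}
```

The box is computed before text_advance is called for the matched string, so it covers the string rather than the space after it. The result is in PDF page coordinates; it would still need to be mapped into view coordinates before being filled with a Quartz 2D call such as CGContextFillRect.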