Given a page from a PDF document, I would like to be able to find the margins of the text, using Objective-C.
I realise there are already many questions relating to CGPDF..., but I have not been able to find anything useful. I have also had a look at the PDF specification document. I am sure it must be in there somewhere, but I have not been able to find it yet.
Example
I create a Word document with left and right margins of 2.5 cm each, then print it to PDF. Given this PDF, is there some way to figure out the width of the text (i.e. the left and right page margins)?
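As a sanity check of what those 2.5 cm margins become inside the PDF: PDF page geometry is measured in PostScript points (72 points == 1 inch == 2.54 cm). Below is a minimal sketch in C, which compiles fine inside an Objective-C project since Core Graphics' CGPDF API is itself a C API; the function names are hypothetical helpers, not part of any framework.

```c
#include <assert.h>

/* PDF user space is measured in PostScript points: 72 points == 1 inch == 2.54 cm. */
static double cm_to_points(double cm) {
    return cm / 2.54 * 72.0;
}

static double points_to_cm(double points) {
    return points / 72.0 * 2.54;
}

/* Width left for the text body after removing equal left and right margins. */
static double text_width_points(double page_width_points, double margin_cm) {
    double width = page_width_points - 2.0 * cm_to_points(margin_cm);
    assert(width > 0.0 && "margins wider than the page");
    return width;
}
```

Assuming an A4 page (the question does not say which paper size was used), cm_to_points(21.0) is about 595.28 pt, each 2.5 cm margin is about 70.87 pt, and text_width_points(cm_to_points(21.0), 2.5) comes out at about 453.54 pt, i.e. 16 cm. A method that detects the text body should report roughly that width for such a page, or slightly less, since ragged-right lines rarely touch the right margin.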
Background
In case I am barking up the wrong tree: the reason I am asking this question is to be able to zoom the way iBooks zooms. If you double-tap in iBooks, it zooms you to the width of the main body. The same happens in the Mac's Preview application (pressing "Zoom to Fit").
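Once the horizontal extent of the text body is known, the iBooks-style zoom itself is simple arithmetic: scale the page so that the body's width fills the view, then scroll past the left margin. A hedged sketch in C, assuming the page is laid out at one view point per PDF point before zooming; ZoomToWidth and zoom_to_body_width are hypothetical names.

```c
#include <assert.h>

/* Hypothetical result type: how much to scale the page and how far to scroll
   horizontally so that the text body exactly fills the view's width. */
typedef struct {
    double scale;     /* view points per PDF point */
    double offset_x;  /* horizontal scroll offset, in view points */
} ZoomToWidth;

/* body_x and body_width describe the text body in PDF points, measured from the
   left edge of the displayed page; view_width is the view's width in points.
   Assumes the unzoomed page is drawn at 1 view point per PDF point. */
static ZoomToWidth zoom_to_body_width(double body_x, double body_width, double view_width) {
    assert(body_width > 0.0 && view_width > 0.0);
    ZoomToWidth z;
    z.scale = view_width / body_width;
    z.offset_x = body_x * z.scale;  /* scroll the scaled left margin out of view */
    return z;
}
```

For example, a 384 pt wide body starting 72 pt from the page's left edge, shown in a 768 pt wide view, gives a scale of 2.0 and a horizontal offset of 144 view points. How these values are applied (for instance as a scroll view's zoom scale and content offset) depends on your view setup.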
First thoughts
I first thought that maybe PDF boxes (CGPDFPage) like kCGPDFBleedBox might be able to help, but it does not look like they will help in my case.
Update
I am only concerned with the body text of the page. Images etc. that might lie outside it do not bother me.
Related posts
Fast and Lean PDF Viewer for iPhone / iPad / iOs - tips and hints?
Comments (3)
I'm not familiar with Apple's "Zoom to Fit" feature and its exact behavior (though I can imagine its most important property)...
One potential disadvantage of relying on the different *Box values (MediaBox, CropBox, TrimBox, BleedBox and the deprecated ArtBox) is that the real white space may still differ from the values they return (mostly it is bigger).
Ghostscript has a special device called bbox which returns the bounding box of each page's rendered content. Example:
returns (for a random 3-page PDF I tried this command with):
You can probably ignore the high-precision HiResBoundingBox values. This leaves you with:
These four values represent the coordinates of the lower-left and upper-right corners of a rectangle which surrounds all rendered pixels. The units are PostScript points (72 points == 1 inch).
Compare this to the *Box values as returned by pdfinfo.exe:
Update: Here is a screenshot showing the thumbnails of the 3 pages of the PDF document which I used to demonstrate the differences above:
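If you capture Ghostscript's output (for example from an invocation along the lines of `gs -q -dSAFER -dNOPAUSE -dBATCH -sDEVICE=bbox input.pdf`, which writes one `%%BoundingBox:` and one `%%HiResBoundingBox:` line per page to stderr), the integer bounding box is easy to parse and compare with a page box. A minimal C sketch, assuming the lines have already been read into strings; BoundingBox, parse_bounding_box_line and horizontal_margins are hypothetical names, and the MediaBox values have to come from elsewhere (e.g. pdfinfo or CGPDFPageGetBoxRect).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* The four integers printed per page, in PostScript points (72 points == 1 inch):
   lower-left corner (llx, lly) and upper-right corner (urx, ury). */
typedef struct {
    int llx, lly, urx, ury;
} BoundingBox;

/* Parses one "%%BoundingBox: llx lly urx ury" line. Returns false for any other
   line, including the %%HiResBoundingBox lines, which are ignored here.
   In a scanf format, "%%" matches a single '%', hence "%%%%" below. */
static bool parse_bounding_box_line(const char *line, BoundingBox *out) {
    assert(line != NULL && out != NULL);
    BoundingBox b;
    if (sscanf(line, "%%%%BoundingBox: %d %d %d %d",
               &b.llx, &b.lly, &b.urx, &b.ury) != 4)
        return false;
    *out = b;  /* only written on success */
    return true;
}

/* Left and right white margins of the rendered content relative to a page box
   given by its x range (e.g. the MediaBox reported by pdfinfo), in points. */
static void horizontal_margins(BoundingBox content, double box_llx, double box_urx,
                               double *left, double *right) {
    *left = content.llx - box_llx;
    *right = box_urx - content.urx;
}
```

For example, a made-up line "%%BoundingBox: 71 64 525 771" compared with a MediaBox spanning x = 0 to 595 gives a left margin of 71 pt and a right margin of 70 pt. Keep in mind that this box covers everything that is rendered, images included, not only the body text.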
You can render the PDF page as a bitmap, inspect its pixels and find the white margins. Take a look at this excellent implementation from Skim: http://skim-app.svn.sourceforge.net/viewvc/skim-app/trunk/NSBitmapImageRep_SKExtensions.m?revision=7036&content-type=text%2Fplain
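The core of that approach is a scan for non-background pixels. The following is not Skim's code, only a minimal C sketch of the same idea. It assumes the page has already been rendered into an 8-bit grayscale buffer on a white background (on Apple platforms, for instance, via CGBitmapContextCreate with CGColorSpaceCreateDeviceGray(), filling the context with white, then CGContextDrawPDFPage). ContentRect, find_content_rect and the threshold parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pixel rectangle [x0, x1) x [y0, y1) of non-white content; meaningless when found == false. */
typedef struct {
    bool found;
    size_t x0, y0, x1, y1;
} ContentRect;

/* Scans an 8-bit grayscale bitmap (row-major; bytes_per_row may include padding)
   and returns the smallest rectangle containing every pixel darker than threshold. */
static ContentRect find_content_rect(const uint8_t *pixels, size_t width, size_t height,
                                     size_t bytes_per_row, uint8_t threshold) {
    assert(pixels != NULL && bytes_per_row >= width);
    ContentRect r = { false, 0, 0, 0, 0 };
    for (size_t y = 0; y < height; y++) {
        const uint8_t *row = pixels + y * bytes_per_row;
        for (size_t x = 0; x < width; x++) {
            if (row[x] >= threshold)
                continue;  /* counts as white background */
            if (!r.found) {
                /* first dark pixel: rows are visited in order, so y0 is final */
                r = (ContentRect){ true, x, y, x + 1, y + 1 };
            } else {
                if (x < r.x0) r.x0 = x;
                if (x + 1 > r.x1) r.x1 = x + 1;
                r.y1 = y + 1;
            }
        }
    }
    return r;
}
```

If the page was rendered at s pixels per PDF point, the left margin is roughly r.x0 / s points and the right margin (width - r.x1) / s points. Check which direction your renderer writes the rows before turning y values into PDF coordinates, whose origin is at the bottom left. A threshold a little below 255 (say 250) tolerates near-white noise such as faint scanned backgrounds.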
According to the CGPDF documentation, you can get up to four content boxes, which define the area in which content is held, printed, cropped, trimmed and so on. Use the CGPDFPageGetBoxRect() function to get those boxes. I'm not sure of their exact purpose, so this is just my guess on which boxes you need:
In other words: you get the page boundaries and the content rectangle boundaries, and do the math on them. It shouldn't be too hard once you get the idea of what each box represents.
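To make "do the math" concrete: with the page box and a content box in hand, the margins are the insets of one rectangle within the other. A minimal C sketch; BoxRect and Margins are hypothetical stand-ins (CGRect itself uses origin.x, origin.y, size.width and size.height) so the arithmetic runs without Core Graphics, and which pair of boxes to compare, e.g. the rect from CGPDFPageGetBoxRect(page, kCGPDFMediaBox) against the one for kCGPDFArtBox or kCGPDFTrimBox, is an assumption.

```c
#include <assert.h>

/* Stand-in for CGRect; origin is the lower-left corner, as in PDF user space. */
typedef struct {
    double x, y, width, height;
} BoxRect;

typedef struct {
    double left, right, bottom, top;
} Margins;

/* Insets of an inner box (e.g. TrimBox or ArtBox) within an outer box (e.g. MediaBox), in points. */
static Margins margins_between(BoxRect outer, BoxRect inner) {
    assert(outer.width >= inner.width && outer.height >= inner.height);
    Margins m;
    m.left = inner.x - outer.x;
    m.bottom = inner.y - outer.y;
    m.right = (outer.x + outer.width) - (inner.x + inner.width);
    m.top = (outer.y + outer.height) - (inner.y + inner.height);
    return m;
}
```

For a US Letter MediaBox of 0 0 612 792 and a hypothetical ArtBox with origin (72, 90) and size 468 × 630, this gives 72 pt left, right and top margins and a 90 pt bottom margin. Note that when a PDF defines only a MediaBox, the PDF specification makes the CropBox default to the MediaBox and the other boxes default to the CropBox, so this arithmetic then yields zero margins.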