Change PDF - text repositioning
Is there any way to shift/move the text inside an existing PDF page to some other position?
For example, there is some text in the area x=100, y=100, w=100, h=100 and I want to move it to x=50, y=200, w=100, h=100.
I did a lot of research and it seems iTextSharp cannot do that. PDFSharp claims that it can be done, but I could not find any examples.
One way is to make a bitmap of the specific area of the text I want to shift, draw a white rectangle over that area and insert the bitmap at the new location. I don't want to use this solution, as I work with large PDF files with more than 1K pages where every page has to be altered.
What I found out is that I need to find a way to change the text-positioning operators (the text matrix and the text state parameters), which is not that simple.
Does anyone have any ideas?
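To make the text-matrix idea concrete: in PDF's default coordinate system the origin is at the bottom-left, so moving the box from (100, 100) to (50, 200) is a translation of dx = -50, dy = +100 (if the coordinates were measured from the top of the page, the sign of dy flips). An operator `a b c d e f Tm` sets the text matrix, and its last two operands `e` and `f` are the translation part, so adding (dx, dy) to them moves that text. The following is only a stdlib Python sketch; `shift_tm` and its parameters are hypothetical names. It assumes an already decompressed content stream and no earlier `cm` changing the coordinate system, it ignores text positioned only with `Td`/`TD`/`T*`, and a run of numbers inside a string literal could fool the regular expression.

```python
import re

_NUM = rb"([-+]?(?:\d+\.?\d*|\.\d+))"
# "a b c d e f Tm": six numeric operands, not glued to a preceding token, then the operator.
TM = re.compile(rb"(?<![\w.])" + rb"\s+".join([_NUM] * 6) + rb"\s+Tm\b")


def _fmt(v: float) -> bytes:
    """Plain decimal for a content stream (no exponent, trailing zeros trimmed)."""
    return f"{v:.4f}".rstrip("0").rstrip(".").encode()


def shift_tm(data: bytes, src, dx: float, dy: float) -> bytes:
    """Add (dx, dy) to e and f of each Tm whose origin (e, f) lies inside src = (x, y, w, h)."""
    x0, y0, w, h = src

    def repl(m: re.Match) -> bytes:
        e, f = float(m.group(5).decode()), float(m.group(6).decode())
        if not (x0 <= e <= x0 + w and y0 <= f <= y0 + h):
            return m.group(0)  # outside the region: leave the bytes untouched
        head = b" ".join(m.group(i) for i in range(1, 5))  # a b c d values copied unchanged
        return head + b" " + _fmt(e + dx) + b" " + _fmt(f + dy) + b" Tm"

    return TM.sub(repl, data)


if __name__ == "__main__":
    demo = b"BT /F1 12 Tf 1 0 0 1 120 150 Tm (Hello) Tj ET"
    print(shift_tm(demo, (100, 100, 100, 100), -50, 100))
    # b'BT /F1 12 Tf 1 0 0 1 70 250 Tm (Hello) Tj ET'
```

Only `e` and `f` are touched, so any scaling or rotation in `a b c d` is preserved; the text object is translated in user space rather than re-laid out.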
Comments (3)
I think it can be done if all the PDF files are simple (not complex) and come from the same application.
If you need this for, e.g., a website where users can upload files, then better forget it: you'll never get a solution that works perfectly with every PDF file.
PDFsharp can help - but AFAIK PDFsharp only does half of what you need. PDFsharp will give you the blocks that make up the PDF file. You have to parse the blocks to find the drawing instructions, check the positions, and relocate them.
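A rough illustration of that parse/check/relocate loop, written as a stdlib-only Python sketch rather than PDFsharp code (all names are hypothetical): it tokenizes a decompressed content stream, takes the first `Tm`/`Td` operands of each `BT ... ET` text object as its position, and, if that position falls inside the source rectangle, brackets the object with a `cm` translation and its inverse instead of rewriting every positioning operator. It assumes the user space at each `BT` is the page default (no preceding `cm`), and it does not handle inline images or text objects that start with `T*`.

```python
WS = b" \t\r\n\f\x00"
DELIMS = b"()<>[]{}/%"


def tokenize(data: bytes):
    """Yield (token, start, end) byte spans of a decompressed content stream.

    Simplified: comments, literal strings (nested, escaped), hex strings,
    dict/array delimiters, names, numbers and operators. No inline images.
    """
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c in WS:
            i += 1
        elif c == ord("%"):                       # comment: skip to end of line
            while i < n and data[i] not in b"\r\n":
                i += 1
        elif c == ord("("):                       # literal string, may nest
            start, depth = i, 0
            while i < n:
                if data[i] == ord("\\"):          # skip the escaped byte
                    i += 2
                    continue
                if data[i] == ord("("):
                    depth += 1
                elif data[i] == ord(")"):
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                i += 1
            yield data[start:i], start, i
        elif data[i:i + 2] in (b"<<", b">>"):
            yield data[i:i + 2], i, i + 2
            i += 2
        elif c == ord("<"):                       # hex string
            end = data.find(b">", i)
            end = n if end < 0 else end + 1
            yield data[i:end], i, end
            i = end
        elif c in b"[]{}":
            yield data[i:i + 1], i, i + 1
            i += 1
        else:                                     # name, number or operator
            start = i
            i += 1
            while i < n and data[i] not in WS and data[i] not in DELIMS:
                i += 1
            yield data[start:i], start, i


def _is_operator(tok: bytes) -> bool:
    head = tok[:1]
    return (head.isalpha() or head in (b"'", b'"')) and tok not in (b"true", b"false", b"null")


def _fmt(v: float) -> str:
    return f"{v:.4f}".rstrip("0").rstrip(".")


def relocate_text(data: bytes, src, dx: float, dy: float) -> bytes:
    """Translate every BT...ET object whose first Tm/Td origin lies inside src = (x, y, w, h)."""
    x0, y0, w, h = src
    spans, operands, bt_start, origin = [], [], None, None
    for tok, start, end in tokenize(data):
        if not _is_operator(tok):
            operands.append(tok)
            continue
        if tok == b"BT":
            bt_start, origin = start, None
        elif tok == b"ET" and bt_start is not None:
            if origin is not None and x0 <= origin[0] <= x0 + w and y0 <= origin[1] <= y0 + h:
                spans.append((bt_start, end))
            bt_start = None
        elif (bt_start is not None and origin is None
              and tok in (b"Tm", b"Td", b"TD") and len(operands) >= 2):
            # Tm: a b c d e f -> (e, f); Td/TD: tx ty -> (tx, ty), since Tm is identity after BT
            origin = (float(operands[-2].decode()), float(operands[-1].decode()))
        operands = []
    shift = f"1 0 0 1 {_fmt(dx)} {_fmt(dy)} cm\n".encode()
    undo = f"\n1 0 0 1 {_fmt(-dx)} {_fmt(-dy)} cm".encode()
    out, last = [], 0
    for s, e in spans:                            # splice around the original bytes
        out += [data[last:s], shift, data[s:e], undo]
        last = e
    out.append(data[last:])
    return b"".join(out)
```

Bracketing with a `cm` and its inverse rather than `q ... Q` is deliberate: text state such as the current font is part of the graphics state, so restoring it with `Q` could change how later text objects that rely on it render.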
Some applications don't even draw words, so a simple word such as "Hello" could be drawn in 3 chunks (maybe "He", "ll" and "o"). You may have to pay attention to this; maybe not if all files come from the same application.
I think the code shown here to extract text could be helpful:
http://forum.pdfsharp.net/viewtopic.php?p=4010#p4010
To relocate text you have to find it in the first place, and that still takes a lot of additional work ...
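Finding the text is the first half of the job. Below is a minimal stdlib-only Python sketch (hypothetical names; not the code from the linked forum post) that walks each `BT ... ET` object, joins the strings shown by `Tj`, `TJ`, `'` and `"` so that chunks like "He", "ll", "o" come back as "Hello", and records the object's first `Tm`/`Td` position. It assumes a decompressed stream and decodes bytes as Latin-1, which only approximates real font encodings (CID/Type0 fonts will not decode correctly); words split across separate text objects are not merged, and string literals with unescaped nested parentheses are not tokenized correctly.

```python
import re

# Simplified token pattern: comments, dictionaries and inline images are not handled.
TOKEN = re.compile(
    rb"\((?:\\.|[^\\()])*\)"       # literal string
    rb"|<[0-9A-Fa-f\s]*>"          # hex string
    rb"|[\[\]]"                    # array delimiters
    rb"|/[^\s()<>\[\]{}/%]*"       # name
    rb"|[^\s()<>\[\]{}/%]+",       # number or operator
    re.DOTALL,
)
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
           b"\n": b"", b"\r": b""}  # backslash + newline is a line continuation


def decode_literal(tok: bytes) -> bytes:
    """Undo the backslash escapes of a literal string token, parentheses included."""
    def repl(m):
        esc = m.group(1)
        if esc[0] in b"01234567":                 # octal escape such as \101
            return bytes([int(esc, 8) & 0xFF])
        return ESCAPES.get(esc, esc)              # \( \) \\ and unknown escapes -> the byte itself
    return re.sub(rb"\\([0-7]{1,3}|.)", repl, tok[1:-1], flags=re.DOTALL)


def _hex_bytes(tok: bytes) -> bytes:
    digits = re.sub(rb"\s", b"", tok[1:-1])
    if len(digits) % 2:                           # odd length: last digit is padded with 0
        digits += b"0"
    return bytes.fromhex(digits.decode())


def text_objects(data: bytes):
    """Yield (origin, text) for each BT...ET object; origin is None if no Tm/Td was found."""
    inside, origin, parts, operands = False, None, [], []
    for m in TOKEN.finditer(data):
        tok = m.group()
        is_op = ((tok[:1].isalpha() or tok in (b"'", b'"'))
                 and tok not in (b"true", b"false", b"null"))
        if not is_op:
            operands.append(tok)
            continue
        if tok == b"BT":
            inside, origin, parts = True, None, []
        elif tok == b"ET" and inside:
            yield origin, b"".join(parts).decode("latin-1")
            inside = False
        elif inside and origin is None and tok in (b"Tm", b"Td", b"TD") and len(operands) >= 2:
            origin = (float(operands[-2].decode()), float(operands[-1].decode()))
        elif inside and tok in (b"Tj", b"TJ", b"'", b'"'):
            for op in operands:                   # numbers (TJ kerning, ' spacing) are skipped
                if op.startswith(b"("):
                    parts.append(decode_literal(op))
                elif op.startswith(b"<"):
                    parts.append(_hex_bytes(op))
        operands = []


def find_text(data: bytes, needle: str):
    """Return the origins of text objects whose joined text contains needle."""
    return [origin for origin, text in text_objects(data) if needle in text]
```

Large negative numbers inside a `TJ` array often stand for word spaces; this sketch simply drops them, so a real search may need to turn them back into spaces.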
You can remove an object using Page.Contents.Elements.RemoveAt(8).
Validate the element count by checking Page.Contents.Elements.Count.
You can get the string value of each element (e.g. to do some string validation); you can fetch the data as below.
Or you could draw over the old text and create a read-only text form at the new location.