Editing hyperlinks and anchors in a PDF using iTextSharp

Posted on 2024-11-18 20:12:01

I am using the iTextSharp library with C#/.NET to split a PDF file.

Consider a PDF file named sample.pdf that contains 72 pages. Some of its pages contain hyperlinks that navigate to other pages. For example, page 4 has three hyperlinks which, when clicked, navigate to pages 24, 27 and 28 respectively. About 12 pages have hyperlinks like this.

Using iTextSharp, I have split this PDF into 72 separate files, saved as 1.pdf, 2.pdf, ... 72.pdf. Now, when a hyperlink in 4.pdf is clicked, I need the viewer to navigate to 24.pdf, 27.pdf or 28.pdf.

Please help me out here. How can I edit the hyperlinks in 4.pdf so that they navigate to the corresponding PDF files?

Thank you,
Ashok

Comments (3)

一抹淡然 2024-11-25 20:12:01

What you want is quite possible, but it will require you to work with the low-level PDF objects (PdfDictionary, PdfArray, etc.).

And whenever someone needs to work with those objects, I always refer them to the PDF Reference. In your case, you'll want to examine chapter 7 (particularly section 3) and chapter 12, sections 3 (doc-level navigation) and 5 (annotations).

Assuming you've read that, here's what you need to do:

  1. Step through the annotation array of each page (in the original doc, before breaking it up).
    1. Find all the link annotations and their destinations.
    2. Build a new destination for each link corresponding to the new file.
    3. Write that new destination into the link annotation.
  2. Write this page into a new PDF using PdfCopy (it'll copy the annotations as well as the page content).
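
Step 2 could look roughly like the following in C#. This is a minimal sketch assuming iTextSharp 5.x; the class name `PdfSplitter`, the method name, and the `N.pdf` naming (taken from the question) are illustrative and not part of the library. Any link rewriting from step 1 would have to be done on the reader's page dictionaries before `AddPage` is called.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfSplitter
{
    // Writes every page of sourcePath to its own file: 1.pdf, 2.pdf, ...
    // outputFolder is assumed to exist already.
    public static void SplitIntoSinglePages(string sourcePath, string outputFolder)
    {
        var reader = new PdfReader(sourcePath);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                string target = Path.Combine(outputFolder, page + ".pdf");
                using (var fs = new FileStream(target, FileMode.Create, FileAccess.Write))
                {
                    var doc = new Document();
                    var copy = new PdfCopy(doc, fs);
                    doc.Open();
                    // The imported page carries its annotations along with its content.
                    copy.AddPage(copy.GetImportedPage(reader, page));
                    doc.Close();
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Keeping a single `PdfReader` open for the whole loop means that changes made to its in-memory page dictionaries are what `PdfCopy` sees when it copies each page.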

Step 1.1 isn't simple. There are several different kinds of "local goto" annotation formats. You need to determine which page a given link points to. Some links might say the PDF equivalent of "next page" or "previous page", while others will include a reference to a particular page. This will be an "indirect object reference", not a page number.

To determine the page number from a page reference, you need to... ouch. Okay. The most efficient way would be to call PdfReader.GetPageOrigRef(int pageNum) for each page in the original document and cache the results in a map (reference -> pageNum).
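
A minimal C# sketch of that cache, assuming iTextSharp 5.x, where the per-page lookup is exposed as `PdfReader.GetPageOrigRef`. The class and method names are made up for illustration. The map is keyed on the object number of each page reference, which is enough to tell pages apart within a single file.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class PageLookup
{
    // Maps the object number of each page's indirect reference
    // to its 1-based page number in the original document.
    public static Dictionary<int, int> BuildPageNumberMap(PdfReader reader)
    {
        var map = new Dictionary<int, int>();
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            PRIndirectReference pageRef = reader.GetPageOrigRef(page);
            map[pageRef.Number] = page;
        }
        return map;
    }
}
```

With the map built once, resolving a link target becomes a dictionary lookup instead of a loop over all pages for every link.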

You can then build "remote goto" links by creating a remote goto PdfAction, and writing that into the link annotation's "A" (action) entry, removing anything that was there before (probably a "Dest").
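
Here is a hedged C# sketch of that rewrite, assuming iTextSharp 5.x. `LinkRewriter`, `RewriteLinks`, `TargetFileName` and the `pageMap` parameter (object number of a page reference -> page number) are hypothetical names, not library API. It only handles links whose destination is an explicit array, either in "Dest" or in the "D" entry of a GoTo action; named destinations and named actions such as "next page" are left untouched. The `PdfAction(filename, page)` constructor is, as far as I know, the one that builds a GoToR action.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class LinkRewriter
{
    // Naming rule taken from the question: page N was saved as "N.pdf".
    public static string TargetFileName(int pageNumber)
    {
        return pageNumber + ".pdf";
    }

    // Rewrites the local links on one page of the ORIGINAL document
    // so that they point at the split-out files instead.
    public static void RewriteLinks(PdfReader reader, int pageNumber,
                                    Dictionary<int, int> pageMap)
    {
        PdfDictionary page = reader.GetPageN(pageNumber);
        PdfArray annots = page.GetAsArray(PdfName.ANNOTS);
        if (annots == null) return;

        for (int i = 0; i < annots.Size; i++)
        {
            PdfDictionary annot = annots.GetAsDict(i);
            if (annot == null) continue;
            if (!PdfName.LINK.Equals(annot.GetAsName(PdfName.SUBTYPE))) continue;

            // Look for an explicit destination array: /Dest, or /D of a GoTo action.
            PdfArray dest = annot.GetAsArray(PdfName.DEST);
            if (dest == null)
            {
                PdfDictionary action = annot.GetAsDict(PdfName.A);
                if (action != null && PdfName.GOTO.Equals(action.GetAsName(PdfName.S)))
                    dest = action.GetAsArray(PdfName.D);
            }
            if (dest == null || dest.Size == 0) continue;

            // The first element of the array is the reference to the target page.
            PdfIndirectReference targetRef = dest.GetAsIndirectObject(0);
            int targetPage;
            if (targetRef == null || !pageMap.TryGetValue(targetRef.Number, out targetPage))
                continue;

            // Each split file has exactly one page, so the remote target is page 1.
            annot.Remove(PdfName.DEST);
            annot.Put(PdfName.A, new PdfAction(TargetFileName(targetPage), 1));
        }
    }
}
```

Calling this for every page before the pages are copied out means, for example, that a link on page 4 that pointed at page 24 ends up opening 24.pdf.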

I don't speak C# very well, so I'll leave the actual implementation to you.

予囚 2024-11-25 20:12:01

Alright, based on what @Mark Storer said, here's some starter code. The first method creates a sample PDF with 10 pages and some links on the first page that jump to different parts of the PDF, so we have something to work with. The second method opens the PDF created by the first method, walks through each annotation, tries to figure out which page the annotation links to, and writes the result to the TRACE window. The code is in VB but should be easy to convert to C# if needed. It targets iTextSharp 5.1.1.0.

If I get a chance I might try to take this further and actually split and re-link things but I don't have time right now.

Option Explicit On
Option Strict On

Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports System.IO

Public Class Form1
    ''//Folder that we are working in
    Private Shared ReadOnly WorkingFolder As String = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hyperlinked PDFs")
    ''//Sample PDF
    Private Shared ReadOnly BaseFile As String = Path.Combine(WorkingFolder, "Sample.pdf")

    Private Shared Sub CreateSamplePdf()
        ''//Create our output directory if it does not exist
        Directory.CreateDirectory(WorkingFolder)

        ''//Create our sample PDF
        Using Doc As New iTextSharp.text.Document(PageSize.LETTER)
            Using FS As New FileStream(BaseFile, FileMode.Create, FileAccess.Write, FileShare.Read)
                Using writer = PdfWriter.GetInstance(Doc, FS)
                    Doc.Open()

                    ''//Turn our hyperlinks blue
                    Dim BlueFont As Font = FontFactory.GetFont("Arial", 12, iTextSharp.text.Font.NORMAL, iTextSharp.text.BaseColor.BLUE)

                    ''//Create 10 pages with simple labels on them
                    For I = 1 To 10
                        Doc.NewPage()
                        Doc.Add(New Paragraph(String.Format("Page {0}", I)))
                        ''//On the first page add some links
                        If I = 1 Then

                            ''//Go to pages relative to this page
                            Doc.Add(New Paragraph(New Chunk("First Page", BlueFont).SetAction(New PdfAction(PdfAction.FIRSTPAGE))))

                            Doc.Add(New Paragraph(New Chunk("Next Page", BlueFont).SetAction(New PdfAction(PdfAction.NEXTPAGE))))

                            Doc.Add(New Paragraph(New Chunk("Prev Page", BlueFont).SetAction(New PdfAction(PdfAction.PREVPAGE)))) ''//This one does not make sense but is here for completeness

                            Doc.Add(New Paragraph(New Chunk("Last Page", BlueFont).SetAction(New PdfAction(PdfAction.LASTPAGE))))

                            ''//Go to a specific hard-coded page number
                            Doc.Add(New Paragraph(New Chunk("Go to page 5", BlueFont).SetAction(PdfAction.GotoLocalPage(5, New PdfDestination(0), writer))))
                        End If
                    Next
                    Doc.Close()
                End Using
            End Using
        End Using
    End Sub
    Private Shared Sub ListPdfLinks()

        ''//Setup some variables to be used later
        Dim R As PdfReader
        Dim PageCount As Integer
        Dim PageDictionary As PdfDictionary
        Dim Annots As PdfArray

        ''//Open our reader
        R = New PdfReader(BaseFile)
        ''//Get the page count
        PageCount = R.NumberOfPages

        ''//Loop through each page
        For I = 1 To PageCount
            ''//Get the current page
            PageDictionary = R.GetPageN(I)

            ''//Get all of the annotations for the current page
            Annots = PageDictionary.GetAsArray(PdfName.ANNOTS)

            ''//Make sure we have something
            If (Annots Is Nothing) OrElse (Annots.Size = 0) Then Continue For

            ''//Loop through each annotation
            For Each A In Annots.ArrayList

                ''//I do not completely understand this but I think this turns an Indirect Reference into an actual object, but I could be wrong
                ''//Anyway, convert the itext-specific object as a generic PDF object
                Dim AnnotationDictionary = DirectCast(PdfReader.GetPdfObject(A), PdfDictionary)

                ''//Make sure this annotation has a link
                If Not AnnotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK) Then Continue For

                ''//Get the ACTION for the current annotation, resolving an indirect reference if there is one
                Dim AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A)

                ''//Make sure this annotation has an ACTION
                If AnnotationAction Is Nothing Then Continue For

                ''//Test if it is a named action such as /FirstPage, /LastPage, etc
                If AnnotationAction.Get(PdfName.S).Equals(PdfName.NAMED) Then
                    Trace.Write("GOTO:")
                    If AnnotationAction.Get(PdfName.N).Equals(PdfName.FIRSTPAGE) Then
                        Trace.WriteLine(1)
                    ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.NEXTPAGE) Then
                        Trace.WriteLine(Math.Min(I + 1, PageCount)) ''//Any links that go past the end of the document should just go to the last page
                    ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.LASTPAGE) Then
                        Trace.WriteLine(PageCount)
                    ElseIf AnnotationAction.Get(PdfName.N).Equals(PdfName.PREVPAGE) Then
                        Trace.WriteLine(Math.Max(I - 1, 1)) ''//Any links that go before the first page should just go to the first page
                    End If


                    ''//Otherwise see if it's a GOTO page action
                ElseIf AnnotationAction.Get(PdfName.S).Equals(PdfName.GOTO) Then

                    ''//Make sure that it has a destination
                    If AnnotationAction.GetAsArray(PdfName.D) Is Nothing Then Continue For

                    ''//Once again, not completely sure if this is the best route but the ACTION has a sub DESTINATION object that is an Indirect Reference.
                    ''//The code below gets that IR, asks the PdfReader to convert it to an actual page and then loop through all of the pages
                    ''//to see which page the IR points to. Very inefficient but I could not find a way to get the page number based on the IR.

                    ''//AnnotationAction.GetAsArray(PdfName.D) gets the destination
                    ''//AnnotationAction.GetAsArray(PdfName.D).ArrayList(0) gets the indirect reference part of the destination (.ArrayList(1) has fitting options)
                    ''//DirectCast(AnnotationAction.GetAsArray(PdfName.D).ArrayList(0), PRIndirectReference) turns it into a PRIndirectReference
                    ''//The full line gets us an actual page object (actually I think it could be any type of pdf object but I have not tested that).
                    ''//BIG NOTE: This line really should have a bunch more sanity checks in place
                    Dim AnnotationReferencedPage = PdfReader.GetPdfObject(DirectCast(AnnotationAction.GetAsArray(PdfName.D).ArrayList(0), PRIndirectReference))
                    Trace.Write("GOTO:")
                    ''//Re-loop through all of the pages in the main document comparing them to this page
                    For J = 1 To PageCount
                        If AnnotationReferencedPage.Equals(R.GetPageN(J)) Then
                            Trace.WriteLine(J)
                            Exit For
                        End If
                    Next
                End If
            Next
        Next

        ''//Release the file held open by the reader
        R.Close()
    End Sub

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        CreateSamplePdf()
        ListPdfLinks()
        Me.Close()
    End Sub
End Class

醉态萌生 2024-11-25 20:12:01

The function below uses iTextSharp to:

  1. Open the PDF
  2. Page through the PDF
  3. Inspect the annotations on each page for those that are anchors

Step #4 is to insert whatever logic you want in there: update the links, log them, etc.

    // Needs using directives for System, iTextSharp.text.pdf and
    // iTextSharp.text.exceptions (where InvalidPdfException lives in iTextSharp 5.x).
    // PdfFiles is assumed to be a collection of System.IO.FileInfo defined elsewhere.

    /// <summary>Inspects PDF files for link annotations.
    /// </summary>
    public static void FindPdfDocsWithInternalLinks()
    {
        foreach (var fi in PdfFiles) {
            PdfReader reader = null;
            try {
                reader = new PdfReader(fi.FullName);
                // Walk the pages (page numbers are 1-based)
                for (var i = 1; i <= reader.NumberOfPages; i++) {
                    var pageDict = reader.GetPageN(i);
                    // GetAsArray resolves an indirect /Annots entry and returns null if there is none
                    var annotArray = pageDict.GetAsArray(PdfName.ANNOTS);
                    if (annotArray == null || annotArray.Size == 0) continue;
                    // Check every annotation on the page
                    foreach (var annot in annotArray.ArrayList) {
                        var annotDict = PdfReader.GetPdfObject(annot) as PdfDictionary;
                        if (annotDict == null) continue;
                        if (!PdfName.LINK.Equals(annotDict.GetAsName(PdfName.SUBTYPE))) continue;
                        var linkDict = annotDict.GetAsDict(PdfName.A);
                        if (linkDict == null) continue;
                        // If it makes it this far, it's a link annotation with an action,
                        // so we can grab its URI. GoTo actions have no /URI entry; skip
                        // them here rather than letting a null reference abort the file.
                        var uri = linkDict.GetAsString(PdfName.URI);
                        if (uri == null || String.IsNullOrEmpty(uri.ToString())) continue;

                        // DO WHATEVER YOU WANT HERE (update the link, log it, etc.)
                    }
                }
            }
            catch (InvalidPdfException e)
            {
                if (!fi.FullName.Contains("_vti_cnf"))
                    Console.WriteLine("\r\nInvalid PDF Exception\r\nFilename: " + fi.FullName + "\r\nException:\r\n" + e);
            }
            catch (NullReferenceException e)
            {
                if (!fi.FullName.Contains("_vti_cnf"))
                    Console.WriteLine("\r\nNull Reference Exception\r\nFilename: " + fi.Name + "\r\nException:\r\n" + e);
            }
            finally
            {
                // Release the file even when an exception was thrown
                if (reader != null) reader.Close();
            }
        }
    }

Good luck.
