PDF page count is incorrect

Posted on 2024-12-01 23:38:47

I was wondering why the VBS code at the link below does not count PDF pages correctly. It seems to undercount the pages that actually exist in each PDF by half or more.

http://docs.ongetc.com/index.php?q=content/pdf-pages-counting-using-vb-script

Here is the code in case you cannot access the link above:

' By Chanh Ong
'File: pdfpagecount.vbs
' Purpose: count pages in pdf file in folder
Const OPEN_FILE_FOR_READING = 1

Set gFso = WScript.CreateObject("Scripting.FileSystemObject")
Set gShell = WScript.CreateObject ("WSCript.shell")
Set gNetwork = Wscript.CreateObject("WScript.Network")

  directory="." 
  set base=gFso.getFolder(directory) 
  call listPDFFile(base) 

Function ReadAllTextFile(filespec)
   Const ForReading = 1
   Dim f
   Set f = gFso.OpenTextFile(filespec, ForReading)
   ReadAllTextFile = f.ReadAll
   f.Close   ' Release the file handle once the contents are read.
End Function

function countPage(sString)
  Dim regEx, Match, Matches, counter, sPattern
  sPattern = "/Type\s*/Page[^s]"  ' capture PDF page count
  counter = 0

  Set regEx = New RegExp         ' Create a regular expression.
  regEx.Pattern = sPattern    ' Set the search pattern.
  regEx.IgnoreCase = True         ' Set case insensitivity.
  regEx.Global = True         ' Set global applicability.
  set Matches = regEx.Execute(sString)   ' Execute search.
  For Each Match in Matches      ' Iterate Matches collection.
    counter = counter + 1
  Next
  if counter = 0 then
    counter = 1
  end if
  countPage = counter
End Function

sub listPDFFile(grp) 
  Set pf = gFso.CreateTextFile("pagecount.txt", True)
for each file in grp.files 
    if (".pdf" = lcase(right(file,4))) then 
      larray = ReadAllTextFile(file)
      pages = countPage(larray)
      wscript.echo "The " & file.name & " PDF file has " & pages & " pages"
      pf.WriteLine(file.name&","&pages) 
    end if
next 
  pf.Close
end sub

Thanks

Comments (2)

○闲身 2024-12-08 23:38:47

The solution offered (and accepted) works only for a limited range of PDF documents. Because PDF documents frequently compress large chunks of data, including page metadata, a crude regular-expression search for "type\s*/page[^s]" will often miss pages.

The only really reliable solution is to laboriously decompose the PDF document. I'm afraid I don't have a working VBS solution, but I have written a Delphi function that demonstrates how to do this (see http://www.angusj.com/delphitips/pdfpagecount.php).
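
One common way page objects become invisible to a plain-text search is that PDF 1.5 and later can pack them into compressed object streams, whose dictionaries are marked /Type /ObjStm. The sketch below is not a page counter; it only flags files where a regex-based count is likely to be too low. ReadFileAsAnsi and UsesObjectStreams are hypothetical helper names, and "C:\1.pdf" is a placeholder path.

```vbscript
Option Explicit

' Hypothetical helper: load the whole file as single-byte text so that
' each byte of the PDF becomes one character of the returned string.
Function ReadFileAsAnsi(sPath)
    With CreateObject("ADODB.Stream")
        .Open
        .Charset = "x-ansi"
        .LoadFromFile sPath
        ReadFileAsAnsi = .ReadText(-1)   ' -1 = read to the end of the stream
        .Close
    End With
End Function

' Hypothetical helper: True if the file declares at least one object stream.
' Page dictionaries stored inside such a stream are usually compressed,
' so a text search for "/Type /Page" cannot see them.
Function UsesObjectStreams(sPath)
    With (New RegExp)
        .Pattern = "/Type\s*/ObjStm"
        .IgnoreCase = False              ' PDF names are case-sensitive
        UsesObjectStreams = .Test(ReadFileAsAnsi(sPath))
    End With
End Function

' Placeholder path; replace with a real file before running under cscript.
If UsesObjectStreams("C:\1.pdf") Then
    WScript.Echo "Object streams found: a regex page count may be too low."
End If
```

A file that passes this check can still be miscounted for other reasons, so treat the result as a warning sign rather than a guarantee.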

稳稳的幸福 2024-12-08 23:38:47

Try this

Function getPdfPgCnt(ByVal sPath)
    Dim strTStr

    With CreateObject("Adodb.Stream")
        .Open
        .Charset = "x-ansi"
        .LoadFromFile sPath
        strTStr = .ReadText(-1)
    End With

    With (New RegExp)
        .Pattern = "Type\s+/Page[^s]"
        .IgnoreCase = True
        .Global = True
        getPdfPgCnt = .Execute(strTStr).Count
    End With

    If getPdfPgCnt = 0 Then getPdfPgCnt = 1
End Function

'Usage : getPdfPgCnt("C:\1.pdf")

Update #1~#2:

Option Explicit

Private Function getPdfPgCnt(ByVal sPath) 'Returns page count of file on passed path
    Dim strTStr

    With CreateObject("Adodb.Stream")
        .Open
        .Charset = "x-ansi"
        .LoadFromFile sPath
        strTStr = .ReadText(-1)
    End With

    With (New RegExp)
        .Pattern = "Type\s*/Page[^s]"
        .IgnoreCase = True
        .Global = True
        getPdfPgCnt = .Execute(strTStr).Count
    End With

    If getPdfPgCnt = 0 Then getPdfPgCnt = 1
End Function

'--------------------------------
Dim oFso, iFile
Set oFso = CreateObject("Scripting.FileSystemObject")

'enumerating pdf files in vbs's base directory
For Each iFile In oFso.getFolder(oFso.GetParentFolderName(WScript.ScriptFullName)).Files
    If LCase(oFso.GetExtensionName(iFile)) = "pdf" Then WScript.Echo iFile & " has "& getPdfPgCnt(iFile)&" pages."
Next
Set oFso = Nothing
'--------------------------------
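
The two versions of getPdfPgCnt above differ only in `\s+` versus `\s*`. A small sketch on a made-up dictionary string (not taken from a real PDF) suggests why that matters: `\s+` skips entries written without whitespace, such as /Type/Page. CountMatches and the sample string are illustrative assumptions.

```vbscript
Option Explicit

' Hypothetical helper: number of matches of sPattern in sText,
' using the same RegExp settings as getPdfPgCnt.
Function CountMatches(sPattern, sText)
    With (New RegExp)
        .Pattern = sPattern
        .IgnoreCase = True
        .Global = True
        CountMatches = .Execute(sText).Count
    End With
End Function

Dim sample
' One page-tree node (/Pages) and two page objects, the second written
' without any whitespace between /Type and /Page.
sample = "<< /Type /Pages /Count 2 >> " & _
         "<< /Type /Page /Parent 1 0 R >> " & _
         "<</Type/Page/Parent 1 0 R>>"

WScript.Echo CountMatches("Type\s+/Page[^s]", sample)   ' prints 1
WScript.Echo CountMatches("Type\s*/Page[^s]", sample)   ' prints 2
```

Both patterns still depend on the page dictionaries being stored as readable text in the file.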