Is it possible to read a PDF file as txt?

Posted 2024-08-31 03:28:19

I need to find a certain key in a PDF file. As far as I know, the only way to do that is to read the PDF as a text file. I want to do this in PHP without installing an add-on, framework, etc.

Thanks

Comments (4)

陈独秀 2024-09-07 03:28:19

You can certainly open a PDF file as text. A PDF file is a collection of objects. The first line is a header that gives the version. Near the end of the file, the `startxref` keyword is followed by the byte offset of the cross-reference (xref) table, which records where each object is located. The contents of individual objects, such as graphics, are often binary and compressed. The PDF 1.7 specification can be found here.
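
As a rough illustration of the layout described above, here is a minimal PHP sketch that pulls the version out of the header and the xref offset from the end of the file. The function names `pdfHeaderVersion()` and `pdfStartXrefOffset()` are made up for this example, and the code assumes the whole file has already been read into a string; it does not parse the xref table itself.

```php
<?php
// Sketch only: these function names are hypothetical, not part of PHP or any library.

/** Return the version from a "%PDF-x.y" header, or null if none is found near the start. */
function pdfHeaderVersion(string $data): ?string
{
    // Only inspect the first 1024 bytes, where the header is expected.
    if (preg_match('/%PDF-(\d+\.\d+)/', substr($data, 0, 1024), $m) === 1) {
        return $m[1];
    }
    return null;
}

/** Return the byte offset that follows the last "startxref" keyword, or null. */
function pdfStartXrefOffset(string $data): ?int
{
    // Search backwards, because the keyword sits near the end of the file.
    $pos = strrpos($data, 'startxref');
    if ($pos === false) {
        return null;
    }
    // The offset is a decimal number following the keyword.
    if (preg_match('/\Gstartxref\s+(\d+)/', $data, $m, 0, $pos) === 1) {
        return (int) $m[1];
    }
    return null;
}
```

With `$data = file_get_contents($sourcePath);`, calling `pdfHeaderVersion($data)` and `pdfStartXrefOffset($data)` would give the two values mentioned in the answer, or null when the file does not look like a PDF.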

凉栀 2024-09-07 03:28:19

I found this function, hope it helps.

http://community.livejournal.com/php/295413.html

聽兲甴掵 2024-09-07 03:28:19

You can't just open the file and read it, because it is a binary dump of the objects used to render the PDF, including encodings, fonts, text, and images. I wrote a blog post explaining how text is stored: http://pdf.jpedal.org/java-pdf-blog/bid/27187/Understanding-the-PDF-file-format-text-streams
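
To illustrate why a plain text search can miss page text, here is a small sketch. Content streams are often compressed with the Flate (zlib) filter, so the text is not visible in the raw bytes. The example simulates such a stream with `gzcompress()` rather than reading a real PDF, and it assumes the zlib extension is enabled. `containsText()` and the sample stream are invented for the illustration.

```php
<?php
// Sketch only: containsText() is a hypothetical helper, and the "stream" below
// is simulated rather than taken from a real PDF.

/** True if $needle occurs in $bytes (strict check so a match at offset 0 counts). */
function containsText(string $bytes, string $needle): bool
{
    return strpos($bytes, $needle) !== false;
}

// A tiny page content stream: the Tj operator shows the string in parentheses.
$plainStream = "BT /F1 12 Tf 72 720 Td (Invoice number 12345) Tj ET";

// Roughly what a Flate-compressed stream would hold on disk. Requires ext-zlib.
$compressedStream = gzcompress($plainStream);

// Searching the compressed bytes versus searching after decompression.
$foundInRaw          = containsText($compressedStream, 'Invoice');
$foundAfterInflate   = containsText(gzuncompress($compressedStream), 'Invoice');
```

In this sketch the search on the compressed bytes should come back empty, while the same search after `gzuncompress()` finds the word. Keys in uncompressed dictionaries, such as entries in the trailer, can still be found with a raw search.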

Bonjour°[大白 2024-09-07 03:28:19

Thank you all for your help. I owe you this piece of code:

// Proceed only if the file exists
if (file_exists($sourcePath)) {
    // Read the whole file as raw bytes
    $data = file_get_contents($sourcePath);

    // Check whether the file is encrypted ($searchFor = "/Encrypt").
    // stripos() returns 0 for a match at offset 0, so compare against false explicitly.
    if ($data !== false && stripos($data, $searchFor) !== false) {
        $counterEncrypted++;
    } else {
        $counterNotEncrypted++;
    }
} else {
    $counterNotExisting++;
}
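
The check above could also be wrapped in a function so it can be run over a list of files. This is only a sketch: `classifyPdf()` and `tallyPdfs()` are hypothetical names, the three category labels are chosen for the example, and unreadable files are counted as missing. A raw search for "/Encrypt" is a heuristic, since the same bytes could also appear in unrelated data inside the file.

```php
<?php
// Sketch only: classifyPdf() and tallyPdfs() are hypothetical names.

/** Classify one path as 'encrypted', 'not_encrypted', or 'missing'. */
function classifyPdf(string $path, string $searchFor = '/Encrypt'): string
{
    if (!is_file($path)) {
        return 'missing';
    }
    $data = file_get_contents($path);
    if ($data === false) {
        return 'missing'; // unreadable files are counted as missing in this sketch
    }
    // Strict comparison: stripos() returns false only when there is no match.
    return stripos($data, $searchFor) !== false ? 'encrypted' : 'not_encrypted';
}

/** Count how many of the given paths fall into each category. */
function tallyPdfs(array $paths): array
{
    $counts = ['encrypted' => 0, 'not_encrypted' => 0, 'missing' => 0];
    foreach ($paths as $path) {
        $counts[classifyPdf($path)]++;
    }
    return $counts;
}
```

Calling `tallyPdfs($listOfPaths)` returns the three counters as one array, which replaces the separate `$counter...` variables.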