SAS: Reading a PDF file

Posted on 2024-11-19 23:19:20

I am looking for ways to read in a PDF file with SAS. Apparently this is not basic functionality, and there is very little to be found on the internet. (It doesn't help that any Google search with "PDF" in it also returns links to PDF documents about unrelated topics.)

The only things I can find are people looking for ways to import data from a PDF into datasets. For me, that is not even necessary. I would like to be able to read the contents of the PDF file into one big character variable. If possible, it would be even better to read in the file's binary data.

Is this possible with SAS and how? (I got it to work in Access VBA, but can't find any similar ways in SAS.)

(In the end, the purpose is to convert this to base64 and put that base64-string into an XML document.)
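A minimal SAS sketch of that end goal, under stated assumptions. It reads the PDF one byte per record with `RECFM=F LRECL=1` (byte-exact, but slow for large files), base64-encodes each 3-byte group with the `$BASE64Xw.` format, and writes the encoded lines into an XML file. The output path `out.xml` (macro variable `xmlFileName`) and the element names `document` and `pdf` are hypothetical. The code also assumes that `$BASE64X` encodes the full stored length of a character variable, trailing blanks included; that is why the final 1 or 2 bytes are copied into variables of exactly that length. Please check this against your SAS release before relying on it.

```sas
%let pdfFileName = Test.pdf;   /* placeholder input file                 */
%let xmlFileName = out.xml;    /* hypothetical output path               */

/* Step 1: read the PDF one byte per record and base64-encode it.
   RECFM=F LRECL=1 makes every byte its own fixed-length record, so
   line feeds and other control bytes are kept as ordinary data. */
data b64(keep=b64_line);
   infile "&pdfFileName" recfm=f lrecl=1 end=eof;
   length byte $1 one $1 two $2 tri $3 b64_line $76;
   retain tri b64_line;          /* keep partial group and line across bytes */
   retain pos 1;                 /* next free column in b64_line              */
   input byte $char1.;           /* $CHAR keeps blank bytes ('20'x) intact    */
   k + 1;                        /* sum statement: k is retained, starts at 0 */
   substr(tri, k, 1) = byte;
   if k = 3 then do;             /* 3 bytes -> 4 base64 characters            */
      substr(b64_line, pos, 4) = put(tri, $base64x4.);
      pos = pos + 4;
      k = 0;
   end;
   else if eof then do;          /* last group holds only 1 or 2 bytes        */
      if k = 1 then do;
         one = substr(tri, 1, 1);
         substr(b64_line, pos, 4) = put(one, $base64x4.);
      end;
      else do;
         two = substr(tri, 1, 2);
         substr(b64_line, pos, 4) = put(two, $base64x4.);
      end;
      pos = pos + 4;
   end;
   /* write a line once it holds 76 characters, or at end of file */
   if pos > 76 or (eof and pos > 1) then do;
      output;
      b64_line = ' ';
      pos = 1;
   end;
run;

/* Step 2: wrap the encoded lines in a (hypothetical) XML element. */
data _null_;
   set b64 end=last;
   file "&xmlFileName";
   if _n_ = 1 then put '<?xml version="1.0" encoding="UTF-8"?>'
                     / '<document>'
                     / '<pdf encoding="base64">';
   len = lengthn(b64_line);
   put b64_line $varying76. len;   /* write exactly len characters */
   if last then put '</pdf>' / '</document>';
run;
```

Each full base64 line encodes 57 input bytes as 76 characters; only the last line may be shorter. Because 57 is a multiple of 3, concatenating the lines gives the same string as encoding the whole file at once.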

Answer by 如果没有 (2024-11-26 23:19:20)

You probably will not be able to read the entire file into one character variable, since a SAS character variable can hold at most 32,767 bytes. A simple way to read the file one line at a time, though, is something like the following:

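%let pdfFileName = Test.pdf;
%let lineSize = 2000;

data base;
   length text_line $&lineSize;                 /* one record fits in the variable */
   infile "&pdfFileName" lrecl=&lineSize truncover;  /* TRUNCOVER: short lines stay on their own record */
   input text_line $char&lineSize..;            /* whole record, embedded blanks included */
run;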

This requires that you have a rough idea of the maximum record length ahead of time, but you could write additional code to determine the maximum record size before reading the file. In this example, each line of text is read into one character variable named "text_line". From there, you could use a RETAIN statement, or a double trailing @ (@@) in the INPUT statement, to process multiple lines at a time. The SAS website has plenty of documentation on how to read and process text from various types of input files.
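A minimal sketch of that extra pre-pass, assuming the same `Test.pdf` placeholder: it reads every record once with the INFILE `LENGTH=` option and stores the longest record length in the `lineSize` macro variable. `LRECL=32767` caps each record at 32,767 bytes, the same limit as a character variable. Forcing `lineSize` to at least 1 is a defensive choice (hypothetical, not from the answer) so that a file of empty lines still yields a valid `$` length.

```sas
%let pdfFileName = Test.pdf;

/* Pass 1: find the longest record so the real read can size its variable */
data _null_;
   infile "&pdfFileName" lrecl=32767 length=reclen end=eof;
   retain maxlen 0;
   input;                                  /* load the record; sets RECLEN */
   maxlen = max(maxlen, reclen);
   if eof then call symputx('lineSize', max(maxlen, 1));
run;

%put NOTE: longest record is &lineSize bytes.;
```

Running this before the line-by-line DATA step lets you drop the hard-coded `%let lineSize = 2000;`.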
