How to handle extremely large strings in PHP when generating a PDF
I've got a report that can generate over 30,000 records if given a large enough date range. On the HTML side, a result set this large is not a problem, since I implemented a pagination system that limits the viewable results to 100 at a time.
My real problem occurs once the user presses the "Get PDF" button. When this happens, I essentially re-run the portion of the report that prints the data (the results of the report itself are stored in a 'save' table, so there's no need to re-run the data-gathering logic) and store the output in a variable called $html. Keep in mind that this variable now contains 30,000 records of data plus the HTML needed to format it correctly in the PDF. Once I've built this HTML string, I pass it to TCPDF to generate the PDF file for the user. However, instead of generating the PDF, it just dies without an error message: the "Generating PDF..." dialog disappears and the system acts as if you never asked it to do anything.
Through testing, I've found that the problem lies in the size of the $html variable being passed in. If the report is under 3K records, it works fine. If it's over that, the HTML side of the report will print but the PDF will not.
Helpful Info
- PHP 5.3
- TCPDF for PDF generation (also tried PS2PDF)
- Script Memory Limit: 500 MB
How would you guys handle this scale of data when generating a PDF of this size?
Comments (4)
Here is how I solved this issue: I noticed that some of the strings in my HTML output had slight encoding issues. I ran htmlentities on those particular strings as I queried the database for them, and that cleared up the problem.
I don't know if this is what was causing your problem, but my experience was very similar: when I tried to output a large HTML table with about 80,000 rows, TCPDF would display the page header but nothing table-related. The behaviour was the same with different sets of data and different table structures.
After many attempts I started adding my own pagination: every 15 table rows, I would break the page and start a new table on the following page. That's when I noticed that every once in a while I would get blank pages between a lot of full and correct ones. I realised there must be a problem with those particular subsets of data, and discovered the encoding issue. It may be that you had something similar and TCPDF was not making it clear what your problem was.
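To illustrate the escaping step described above, here is a minimal sketch. The helper names `escapeCell` and `buildRow` are made up for this example, and it assumes the database values are UTF-8; adjust the charset if yours are not.

```php
<?php
// Hypothetical helper: escape one raw database value for use inside a table cell.
// Assumes UTF-8 data. The charset is passed explicitly because before PHP 5.4
// htmlentities() defaulted to ISO-8859-1.
function escapeCell($value, $charset = 'UTF-8')
{
    return htmlentities((string) $value, ENT_QUOTES, $charset);
}

// Hypothetical helper: build one <tr> from a row of raw values.
function buildRow(array $cells)
{
    $html = '<tr>';
    foreach ($cells as $cell) {
        $html .= '<td>' . escapeCell($cell) . '</td>';
    }
    return $html . '</tr>';
}
```

Building every row through a helper like this means no unescaped `&`, `<`, or accented character reaches TCPDF, whichever record it happens to come from.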
Are you using the writeHTML method?
I went through the performance recommendations here: http://www.tcpdf.org/performances.php
It says "Split large HTML blocks in smaller pieces;".
I found that if my blocks of HTML went over 20,000 characters the PDF would take well over 2 minutes to generate.
I simply split my HTML into blocks and called writeHTML for each block, and it improved dramatically. A file that wouldn't generate in 2 minutes before now takes 16 seconds.
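A rough sketch of that chunking approach might look like the following. The helper names, the table attributes, and the default of 100 rows per chunk are assumptions for illustration, not values from TCPDF's documentation; `$pdf` is expected to be a TCPDF instance, and only its `writeHTML()` method is used.

```php
<?php
// Hypothetical helper: wrap a slice of already-built <tr> strings in its own table,
// so each chunk is a complete, well-formed HTML fragment.
function buildTableChunk(array $rowsHtml)
{
    return '<table border="1" cellpadding="2">' . implode('', $rowsHtml) . '</table>';
}

// Write the rows in chunks so that no single writeHTML() call receives a huge string.
// $rowsPerChunk is an assumed tuning value; measure and adjust for your own data.
// Returns the number of writeHTML() calls made.
function writeRowsInChunks($pdf, array $rowsHtml, $rowsPerChunk = 100)
{
    $calls = 0;
    foreach (array_chunk($rowsHtml, $rowsPerChunk) as $chunk) {
        $pdf->writeHTML(buildTableChunk($chunk));
        $calls++;
    }
    return $calls;
}
```

In use, you would create the TCPDF object, call `AddPage()`, pass the object and the row strings to `writeRowsInChunks()`, and then call `Output()` as usual. Appending rows to an array rather than to one `$html` string also keeps the full 30,000-record markup from ever existing as a single value.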
TCPDF seems to be implemented in pure PHP. You may get better performance from a compiled library like PDFlib or a command-line app like htmldoc. The latter probably has the best chance of generating a large PDF.
Also, are you breaking the output PDF into multiple pages? I.e. does TCPDF know to take a single HTML document and cut it into multiple pages, or are you generating multiple HTML files for it to combine into a single PDF document? That may also help.
I would break the PDF into parts, just like pagination.
1) Have a "Get PDF" button on every paginated HTML page and allow downloading only the records shown on that page.
2) Limit the maximum number of records that can be downloaded. If the limit is reached, split the PDF and let the user download multiple PDFs.
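Option 2 could be planned with something like the sketch below. `planPdfParts` is a hypothetical helper, and the default cap of 3,000 records per file is only an assumption borrowed from the size the question reports as working.

```php
<?php
// Hypothetical helper: describe how a report would be split into several PDFs.
// Each entry gives the part number plus the offset and limit of the records it covers.
function planPdfParts($totalRecords, $maxPerPdf = 3000)
{
    $parts = array();
    for ($offset = 0; $offset < $totalRecords; $offset += $maxPerPdf) {
        $parts[] = array(
            'part'   => count($parts) + 1,
            'offset' => $offset,
            // The last part may hold fewer records than the cap.
            'limit'  => min($maxPerPdf, $totalRecords - $offset),
        );
    }
    return $parts;
}
```

Each offset and limit pair could then drive one query against the saved results and one download link, so the user gets "Part 1 of 10", "Part 2 of 10", and so on, instead of a single oversized file.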