How should I serve zipped web pages?
Background:
Our software generates reports for customers in the usual suspect formats (HTML, PDF, etc.), and each report can contain charts and other graphics unique to that report. For PDFs everything is held in one place: the PDF file itself. HTML is trickier, as the report is basically the sum of more than one file. The files are available via HTTP through Tomcat.
Problem:
I really want to have a tidy environment and wrap the HTML reports into a single file. There's MHTML, Data URIs, several formats to consider. This excellent question posits that, given the lack of cross-browser support for these formats, ZIP is a neat solution. This is attractive to me as I can also offer the zip for download as an "HTML report you can email" option. (In the past, users have complained about losing the graphics when emailing HTML reports.)
The solution seems simple. A request comes in, I locate the appropriate zip, unpack it somewhere on the webserver, point the request at the new HTML file, and after a day or so tidy everything up again.
But something doesn't quite seem right about that. I've kind of got a gut feeling that it's not a good solution, that there's something intrinsically wrong with it, or that maybe a better way exists that I can't see at the moment.
Can anyone suggest whether this is good or bad, and offer an alternative solution?
Edit for more background information!
The reports need to persist on the server. Our customers are users at sites, and the visibility of a single report could be as wide as everyone at the site. The creation process involves the user selecting the criteria for the report and submitting it to the server for creation. Data is extracted from the database and a document is built. A placeholder record goes into the database, and the documents themselves get stored on the fileserver somewhere. It's the 'documents on the fileserver' part that I'd like to be tidier, and zipping also means less disk space used! Once a report is created, it is available to anyone who can see it.
Comments (3)
I would have thought the plan would be that the zip file ends up on the client rather than staying on the server.
Without knowing about your architecture, I would guess at an approach like this:
This relies on being able to rerun the report to generate the zip file, of course. You could generate a zip file each time you generate some HTML, but that's wasteful if you don't need to do it, and requires clean-up etc.
Perhaps I've misunderstood you though... if this doesn't sound appropriate, could you update your question?
EDIT: Okay, having seen the update to your question, I'd be tempted to store the files for each report in a separate directory (e.g. using a GUID as the directory name). Many file systems support compression at the file system level, so "premature zipping" probably wouldn't save much disk space, and would make extracting individual files harder. Then if the user requests a zip, you just need to build the zip file at that point, probably just in memory, before serving it.
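As a rough illustration of building the zip on demand from a per-report directory, here is a minimal Java sketch. The class name `ReportZipper` and the method `zipDirectory` are made up for this example; it assumes each report's files sit under one directory and are small enough to hold in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of any existing codebase.
class ReportZipper {

    /** Zips every regular file under reportDir into an in-memory archive. */
    static byte[] zipDirectory(Path reportDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(reportDir)) {
            files = walk.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Path file : files) {
                // Zip entry names use forward slashes regardless of the host OS.
                String name = reportDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return buffer.toByteArray();
    }
}
```

The returned bytes could then be written to the HTTP response when a user asks for the download; nothing is left on disk to clean up afterwards.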
That is quite telling - it means that the reports are sharable, and that you would also like to "cache" reports so that they don't have to be regenerated.
One way to do this would be to work out a way to hash the parameters together, such that different parameter combinations (which result in a different report) hash to different values. Then you can use those hashes as keys into a large cache of reports stored on disk as zips (maybe the name of the file is the hash?).
That way, every time someone requests a report, you hash the parameters and check whether that report was already generated. If it was, serve it up, either as a zip download or by unzipping it and serving the HTML as per normal. If the report doesn't exist, generate it and zip it, making sure you can identify it later as having been produced by these parameters (i.e., record the hash).
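One possible way to turn report parameters into a cache key is sketched below in Java. The name `ReportKey.forParameters` is invented for illustration, and the sketch assumes the parameters arrive as a map of non-null strings; SHA-256 is just one reasonable choice of hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: derives a stable cache key from report parameters.
class ReportKey {

    static String forParameters(Map<String, String> params) {
        // Sort by parameter name so the same criteria always hash the same way,
        // whatever order they were submitted in.
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            // Length prefixes keep ("a", "bc") distinct from ("ab", "c").
            canonical.append(e.getKey().length()).append(':').append(e.getKey());
            canonical.append(e.getValue().length()).append(':').append(e.getValue());
        }
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(ex);
        }
    }
}
```

The resulting hex string is safe to use as a file name, so the cached archive for a given set of criteria could live at something like `<cache dir>/<key>.zip`.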
One thing to be careful of is that file system writes tend to be non-atomic, so if you are not careful, two concurrent requests can both generate the same report, which sucks but luckily, in your case, is not too harmful. To avoid this, you can use a single thread to do the generation (slower) or implement some kind of lock.
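A sketch of the "some kind of lock" option, in Java: one lock object per cache key, plus a write to a temporary file followed by an atomic rename, so readers never see a half-written zip. `ReportCache`, `Generator` and `getOrCreate` are illustrative names, and this only guards against concurrent requests inside a single JVM, which is an assumption about the deployment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache: generates each report at most once per key.
class ReportCache {

    /** Produces the zipped report bytes; stands in for the real report builder. */
    interface Generator {
        byte[] generate() throws IOException;
    }

    private final Path cacheDir;
    // One lock object per key; the map grows with the number of distinct keys.
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();

    ReportCache(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    Path getOrCreate(String key, Generator generator) throws IOException {
        Path target = cacheDir.resolve(key + ".zip");
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            if (!Files.exists(target)) {
                // Write to a temp file in the same directory, then rename atomically.
                Path tmp = Files.createTempFile(cacheDir, key, ".tmp");
                try {
                    Files.write(tmp, generator.generate());
                    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
                } finally {
                    Files.deleteIfExists(tmp); // no-op once the move has succeeded
                }
            }
        }
        return target;
    }
}
```

Requests for different keys do not block each other; only requests for the same report wait while it is being built.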
You don't need to physically create zip files on a file system. There's nothing wrong with creating the zips in memory, streaming them to the browser, and letting the GC take care of releasing the memory taken by the temporary zip. This of course introduces its own problem, as it could be inefficient to continually recreate the zip each time a request is made. Judge these things according to your needs.
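For illustration, a zip can be written straight onto whatever output stream the response exposes, without an intermediate file. The sketch below is Java; `ZipStreamer.writeZip` is a made-up name, and it assumes the report's files are already available as a map of entry name to bytes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: streams a zip archive to an existing output stream.
class ZipStreamer {

    static void writeZip(Map<String, byte[]> files, OutputStream out) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(out);
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            zip.putNextEntry(new ZipEntry(file.getKey()));
            zip.write(file.getValue());
            zip.closeEntry();
        }
        // finish() completes the archive but leaves the caller's stream open.
        zip.finish();
        zip.flush();
    }
}
```

In a servlet, `out` would typically be the response's output stream, with the content type set to `application/zip` before writing.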