Xalan XSLT - Out of Memory Heap Space
My project has a reporting module that gathers data from the database in the form of XML and runs an XSLT on it to generate the user's desired format of report. Options at this point are HTML and CSV.
We use Java and Xalan to do all interaction with the data.
The bad part is that one of the reports a user can request is 143MB (about 430,000 records) for the XML portion alone. When this is transformed into HTML, I run out of heap space even with a maximum of 4096 MB reserved for the heap. This is unacceptable.
It seems that the problem is simply too much data, but I can't help but think there is a better way to deal with this than limiting the customer and not being able to meet functional requirements.
I am glad to give more information as needed, but I cannot disclose too much about the project as I'm sure most of you understand. Also, the answer is yes; I need all of the data at the same time: I cannot paginate it.
Thanks
EDIT
All the transformation classes I am using are in the javax.xml.transform package. The implementation looks like this:
    final Transformer transformer =
        TransformerFactory.newInstance().newTransformer(
            new StreamSource(new StringReader(xsl)));
    final StringWriter outWriter = new StringWriter();
    transformer.transform(
        new StreamSource(new StringReader(xml)), new StreamResult(outWriter));
    return outWriter.toString();
If possible, I would like to leave the XSLT the way it is. The StreamSource method of doing things should allow me to GC some of the data as it is processed, but I'm not sure what limitations on the XSLT (functions, etc.) this might require for it to do proper cleanup. If someone could point me at a resource detailing those limitations, it would be very helpful.
Comments (2)
The problem with XSLT is that you need to have a DOM-style representation of the whole source document (as well as the result document) in memory while doing the transformation. For large XML files this is a serious problem.
You are interested in a system that allows a streaming transformation, where the full documents do not have to reside in memory. Maybe STX is an option:
http://www.xml.com/pub/a/2003/02/26/stx.html
http://stx.sourceforge.net/
It is quite similar to XSLT, so if your XSLT stylesheet is applied to the XML in a straightforward manner, rewriting it to STX could be quite simple.
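To make the idea of a streaming transformation concrete, here is a minimal sketch. It does not use STX. It uses the StAX pull parser that ships with the JDK, which I am swapping in purely to illustrate the approach, and it produces the CSV variant of a report one record at a time, so only the current parser event is held in memory. The element name `record` and the attributes `id` and `name` are hypothetical, since the real report schema is not shown in the question, and CSV quoting is left out for brevity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingCsvSketch {

    /**
     * Reads the XML from 'xml' event by event and writes one CSV line per
     * <record> element to 'csv'. No document tree is ever built.
     * The element and attribute names are assumptions for illustration.
     */
    public static void writeCsv(Reader xml, Writer csv)
            throws XMLStreamException, IOException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "record".equals(reader.getLocalName())) {
                    csv.write(orEmpty(reader.getAttributeValue(null, "id")));
                    csv.write(',');
                    csv.write(orEmpty(reader.getAttributeValue(null, "name")));
                    csv.write('\n');
                }
            }
        } finally {
            reader.close();
        }
    }

    // A missing attribute comes back as null; emit an empty field instead.
    private static String orEmpty(String value) {
        return value == null ? "" : value;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records>"
            + "<record id=\"1\" name=\"a\"/>"
            + "<record id=\"2\" name=\"b\"/>"
            + "</records>";
        StringWriter out = new StringWriter();
        writeCsv(new StringReader(xml), out);
        System.out.print(out);
    }
}
```

In a real report the Reader and Writer would wrap a file or database stream rather than strings. The trade-off is that anything the stylesheet does that needs random access to the document (sorting, look-ahead, keys) has to be restructured, which is the same kind of restriction a streaming language such as STX imposes.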
We were able to improve this by doing two things:
We write the XML source and the transformed output to temporary files. This keeps the initial creation and storage out of RAM, since the data comes from a database and is written back to the database as well. A handle to the data is all that's necessary.
We use the Saxon transformer from Saxonica. This allows a couple of things, including SAX-style transformations and the use of XSLT 2.0, which Xalan does not support.
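The temp-file part of this could look roughly like the sketch below. It is an illustration under assumptions, not our production code: the class and method names are made up, and it goes through the standard JAXP `TransformerFactory.newInstance()` lookup rather than naming a processor directly. With Saxon on the classpath, one way to select it is to set the `javax.xml.transform.TransformerFactory` system property to `net.sf.saxon.TransformerFactoryImpl`. The point is that neither the input XML nor the output report is held as a Java String; whatever tree the processor builds internally is a separate matter.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FileBasedTransform {

    /**
     * Applies the stylesheet in xslFile to xmlFile and returns a temp file
     * containing the result. Input and output stay on disk; the caller only
     * holds File handles. (Hypothetical helper for illustration.)
     */
    public static File transformToTempFile(File xslFile, File xmlFile)
            throws TransformerException, IOException {
        // JAXP chooses the processor (Xalan, Saxon, ...) from configuration.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(xslFile));

        File outFile = File.createTempFile("report-", ".out");
        // Open the stream ourselves so it is reliably flushed and closed.
        try (OutputStream os =
                 new BufferedOutputStream(Files.newOutputStream(outFile.toPath()))) {
            transformer.transform(new StreamSource(xmlFile), new StreamResult(os));
        }
        return outFile;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in stylesheet and data, written to temp files.
        String xsl =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/records\">"
            + "<xsl:for-each select=\"record\">"
            + "<xsl:value-of select=\"@id\"/><xsl:text>&#10;</xsl:text>"
            + "</xsl:for-each></xsl:template></xsl:stylesheet>";
        String xml = "<records><record id=\"1\"/><record id=\"2\"/></records>";

        File xslFile = File.createTempFile("sheet-", ".xsl");
        File xmlFile = File.createTempFile("data-", ".xml");
        Files.write(xslFile.toPath(), xsl.getBytes(StandardCharsets.UTF_8));
        Files.write(xmlFile.toPath(), xml.getBytes(StandardCharsets.UTF_8));

        File result = transformToTempFile(xslFile, xmlFile);
        System.out.print(
            new String(Files.readAllBytes(result.toPath()), StandardCharsets.UTF_8));
    }
}
```

The result file can then be streamed back to the database or to the client without ever being loaded whole.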