Creating and saving large XML with Java
I am working on a Java application whose job is to create and save large XML files. The sample I was given is a 300 MB XML file.
The app was designed to collect bulk data from the database and save it in XML format. Because of its heavy IO and memory usage, it was designed to process at most 3 such requests in parallel.
The new requirement is to process up to 50 such requests in parallel. The current app uses XMLBeans to create the XML and then saves it to the file system. The application is exposed as a web service on a WebLogic server (running on a 64-bit OS, with a Java max heap size of 4 GB).
I need your opinion on:
1) Is there an XML API that works with XSD and can create large XML files (200 MB or more) with minimal overhead? XMLBeans works fine for us, but is there something that can handle this better?
2) What is the best and most memory-efficient way to save the XML to the file system? I am thinking of changing the current writer to a BufferedWriter and having it hold 1024 bytes in memory before each physical write to disk. Would increasing that buffer size have any side effects?
3) If there were no limits on technology choice, server and so on, what would the ideal solution be?
EDIT 1# The DB access is fast (about 5% of the total time). Creating the XML is slow (80% of the time). Saving it takes 15% (but I see a lot of improvements I can make there, so I am not worried about that). - Thanks Luis.
Comments (2)
I had a similar problem. A server was writing data to XML files with JDOM. Over the years the data grew, the server got slower, and the memory usage became huge. The reason was the following:
The server accumulated the data in big hashtables and lists. At the end of a job it created the whole XML document in memory with JDOM and then wrote it to disk.
I changed the XML writing to a streaming approach using an XMLStreamWriter.
The only problem was that the written XML file was not very pretty. This could be solved with an IndentingXMLStreamWriter.
A code example would be:
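A minimal sketch of that streaming approach, not the answerer's original code: the class name `StreamingXmlExport`, the element names, and the 64 KB buffer size are all assumptions made for illustration. It uses only the standard `javax.xml.stream` API. `IndentingXMLStreamWriter` is left out because it comes from a separate library rather than the standard API; where one is available it would wrap the `XMLStreamWriter` created below.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Iterator;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class StreamingXmlExport {

    /** Writes one <record> element per item; only the current item is held in memory. */
    static void writeRecords(Path target, Iterator<String> records)
            throws IOException, XMLStreamException {
        // The buffer size (64 KB) is an arbitrary illustrative choice.
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target), 64 * 1024)) {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            try {
                xml.writeStartDocument("UTF-8", "1.0");
                xml.writeStartElement("records");
                int id = 0;
                while (records.hasNext()) {
                    xml.writeStartElement("record");
                    xml.writeAttribute("id", Integer.toString(++id));
                    xml.writeCharacters(records.next()); // special characters are escaped
                    xml.writeEndElement();
                }
                xml.writeEndElement(); // </records>
                xml.writeEndDocument();
                xml.flush();
            } finally {
                xml.close(); // frees the writer; the try-with-resources closes the stream
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path target = Files.createTempFile("export", ".xml");
        writeRecords(target, Arrays.asList("first", "second & third").iterator());
        System.out.println(new String(Files.readAllBytes(target), "UTF-8"));
    }
}
```

In a real job the iterator would be backed by a database cursor, so that rows are written as they are read instead of being collected in hashtables and lists first.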
I would look into using a streaming XML API such as StAX, so that you do not have to hold the whole XML document in memory before writing it out to disk. That way the memory footprint stays low (you do not need 50x the size of the XML to process 50 documents in parallel).
See "Why StAX?" (Oracle) for more info.
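To illustrate the point about parallel requests, here is a rough sketch in which each request streams its own file, so each task holds only its output buffer and the current row. The class name `ParallelXmlExport`, the element names, and the generated placeholder rows are hypothetical. A plain `ExecutorService` is used only to keep the example self-contained; inside an application server, threads would normally be managed by the container.

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

final class ParallelXmlExport {

    /** Streams rowCount placeholder rows into one file for a single request. */
    static Path exportOne(Path dir, int requestId, int rowCount) throws Exception {
        Path target = dir.resolve("export-" + requestId + ".xml");
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            xml.writeStartDocument("UTF-8", "1.0");
            xml.writeStartElement("export");
            for (int row = 0; row < rowCount; row++) {
                // In a real job each row would come from a database cursor.
                xml.writeStartElement("row");
                xml.writeCharacters("request " + requestId + ", row " + row);
                xml.writeEndElement();
            }
            xml.writeEndElement(); // </export>
            xml.writeEndDocument();
            xml.close(); // flushes the writer; the stream is closed by try-with-resources
        }
        return target;
    }

    /** Runs all requests concurrently and returns the files once every task has finished. */
    static List<Path> exportAll(Path dir, int requests, int rowsPerRequest) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int requestId = i;
                pending.add(pool.submit(() -> exportOne(dir, requestId, rowsPerRequest)));
            }
            List<Path> written = new ArrayList<>();
            for (Future<Path> f : pending) {
                written.add(f.get()); // rethrows any failure from the task
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("xml-export");
        System.out.println(exportAll(dir, 50, 1000));
    }
}
```

With this shape, running 50 exports at once costs roughly 50 output buffers plus 50 current rows, independent of how large each finished file is.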