Packaging large, frequently changing data files with a Maven project
I am in the process of converting a legacy Ant project into a Maven project. Part of the project is a very large (~1.6GB) set of data files in a compressed binary format which are accessed in a random-seek fashion via index tables. The data files are like logarithmic function tables, rainbow tables or similar data tables for massively abbreviating complex computations.
We publish new data tables on a weekly basis, and I want to be able to exploit Maven's dependency management system to help the developers get the latest tables.
The main problem I am having is that I cannot figure out how to bundle the tables up in a way that isn't just a JAR, ZIP or RAR of the whole set of them. Is there a way to write a pom that will result in a directory of data files? Or am I just thinking about the problem in a non-Maven way?
Thanks for any suggestions.
This depends on what the consumer can deal with. Maven dependencies don't deal with directories of files, so you'd need the whole artifact. You probably want to deal with ZIPs, as JAR has an overloaded meaning (it gets put on the classpath) and other compression formats need custom plugins.
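As a rough sketch of that approach (the coordinates, paths and versions below are made up for illustration), the producer can build and attach the data set as a ZIP with the maven-assembly-plugin, and consumers can then declare it as an ordinary dependency by type and classifier:

```xml
<!-- producer pom.xml: build and attach the data set as a ZIP (coordinates are hypothetical) -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <executions>
    <execution>
      <id>package-tables</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
      <configuration>
        <descriptors>
          <descriptor>src/assembly/tables.xml</descriptor>
        </descriptors>
      </configuration>
    </execution>
  </executions>
</plugin>

<!-- src/assembly/tables.xml: zip up the data directory as-is -->
<assembly>
  <id>tables</id>
  <formats>
    <format>zip</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <fileSets>
    <fileSet>
      <directory>src/main/data</directory>
      <outputDirectory>/</outputDirectory>
    </fileSet>
  </fileSets>
</assembly>

<!-- consumer pom.xml: pull the tables in as a normal dependency -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>lookup-tables</artifactId>
  <version>2015.30</version>
  <type>zip</type>
  <classifier>tables</classifier>
</dependency>
```

Maven won't unpack a zip dependency on its own; consumers normally extract it with the maven-dependency-plugin, which also fits the overlay idea below.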
However, if you can break it up into long-lived and short-lived data you may get better behaviour (e.g. a quarterly full release, plus a set of changes on top of it that is re-released weekly). This depends on whether the data can easily be split in this fashion, or overlaid, or patched in some way. That might be difficult with a compressed binary artifact.
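If the data can be split that way, one hypothetical consumer-side setup is to unpack the quarterly base first and then the weekly delta over it using the maven-dependency-plugin; all artifact names and versions here are invented, and the exact overwrite behaviour is worth verifying against the plugin documentation:

```xml
<!-- consumer pom.xml: unpack the quarterly base, then the weekly delta on top of it -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>unpack-tables</id>
      <phase>generate-resources</phase>
      <goals>
        <goal>unpack</goal>
      </goals>
      <configuration>
        <artifactItems>
          <!-- quarterly full release -->
          <artifactItem>
            <groupId>com.example</groupId>
            <artifactId>lookup-tables-full</artifactId>
            <version>2015.Q3</version>
            <type>zip</type>
            <outputDirectory>${project.build.directory}/tables</outputDirectory>
          </artifactItem>
          <!-- weekly delta, unpacked second so its files replace the base copies -->
          <artifactItem>
            <groupId>com.example</groupId>
            <artifactId>lookup-tables-delta</artifactId>
            <version>2015.30</version>
            <type>zip</type>
            <overWrite>true</overWrite>
            <outputDirectory>${project.build.directory}/tables</outputDirectory>
          </artifactItem>
        </artifactItems>
      </configuration>
    </execution>
  </executions>
</plugin>
```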
The other alternative is to keep building the large artifact and discard old ones. This relies on good bandwidth between the build machines and the repository, and enough disk to hold as many builds as you need (a repository manager like Archiva can help purge old builds on a regular schedule if that's appropriate).
One final note - if you are dealing with ZIPs over 2G (which you are approaching), you will need to build them with a different ZIP implementation, such as the truezip-maven-plugin.
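For reference, this is roughly what driving it through the truezip-maven-plugin could look like; TrueZIP treats an archive path as a virtual directory, so copying into a path ending in .zip writes into that archive. Treat the version and parameter names as approximate and check the plugin documentation:

```xml
<!-- illustrative truezip-maven-plugin configuration; exact version and parameters may differ -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>truezip-maven-plugin</artifactId>
  <version>1.2</version>
  <executions>
    <execution>
      <id>zip-tables</id>
      <phase>package</phase>
      <goals>
        <goal>copy</goal>
      </goals>
      <configuration>
        <!-- an outputDirectory ending in .zip is written into as an archive -->
        <fileset>
          <directory>src/main/data</directory>
          <outputDirectory>${project.build.directory}/${project.build.finalName}-tables.zip</outputDirectory>
        </fileset>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The resulting ZIP is not attached to the build automatically, so it would still need something like the build-helper-maven-plugin's attach-artifact goal before it can be deployed to the repository.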