How to parse/unzip/unpack the Maven repository index generated by Nexus
I have downloaded the index generated for Maven Central from http://mirrors.ibiblio.org/pub/mirrors/maven2/dot-index/nexus-maven-repository-index.gz
I would like to list the artifact information in these index files (groupId, artifactId and version, for example). I have read that there is a high-level API for this, and it seems that I have to use the following Maven dependency. However, I don't know which class is the entry point, or how to use it to access those files:
<dependency>
<groupId>org.sonatype.nexus</groupId>
<artifactId>nexus-indexer</artifactId>
<version>3.0.4</version>
</dependency>
Take a peek at the https://github.com/cstamas/maven-indexer-examples project.
In short: you don't need to download the GZ/ZIP (new/legacy format) manually; the indexer will take care of that for you (moreover, it will handle incremental updates for you too, if possible).
GZ is the "new" format. It contains data only and is independent of the Lucene index format (hence, independent of the Lucene version). ZIP is the "old" format, which is actually a plain Lucene 2.4.x index zipped up. The data content of the two does not differ currently, but a change is planned for the future.
As I said, there is no data content difference between the two, but some fields (as you noticed) are indexed but not stored in the index. Hence, if you consume the ZIP format, those fields will be searchable but not retrievable.
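To make "data only" a little more concrete, here is a minimal sketch that peeks at the start of a downloaded `.gz` file using nothing but the JDK. It assumes a layout that is not stated anywhere in this thread: that the decompressed stream begins with a single version byte followed by an 8-byte timestamp in milliseconds. The class name `IndexHeader` is hypothetical. Treat the output as a sanity check only, and use the indexer itself for real work.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: reads the first two header values of a GZ index file. */
final class IndexHeader {
    final int version;
    final long timestampMillis;

    private IndexHeader(int version, long timestampMillis) {
        this.version = version;
        this.timestampMillis = timestampMillis;
    }

    /**
     * ASSUMPTION: after gunzipping, the stream starts with one version byte
     * and then one big-endian long holding a timestamp in milliseconds.
     */
    static IndexHeader read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(new GZIPInputStream(raw));
        int version = in.readByte() & 0xff; // unsigned value of the first byte
        long timestampMillis = in.readLong();
        return new IndexHeader(version, timestampMillis);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java IndexHeader <path-to-index.gz>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            IndexHeader header = read(in);
            System.out.println("version=" + header.version
                    + ", timestamp=" + Instant.ofEpochMilli(header.timestampMillis));
        }
    }
}
```

If the printed values look implausible for your file, the assumed layout probably does not match it.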
The https://github.com/cstamas/maven-indexer-examples project is obsolete, and its build fails (tests do not pass).
The Nexus Indexer has moved on and now includes the examples too:
https://github.com/apache/maven-indexer/tree/master/indexer-examples
That builds, and the code works.
Here is a simplified version if you want to roll your own:
Maven:
Java:
We use this in the Windup project, a JBoss migration tool.
The legacy ZIP index is a plain Lucene index. I was able to open it with Luke
and write some simple Lucene code to dump out the fields of interest ("u" in this case)
Sample output ...
There may be better ways to achieve this, though ...
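Once the "u" values have been dumped, they still need to be split into the coordinates the question asks for. Below is a minimal sketch of that step. It assumes each value is pipe-separated in the order groupId|artifactId|version|classifier|extension, with `NA` standing in for a missing classifier. The thread does not confirm this layout, so check it against your own dump. The class `ArtifactCoordinates` and the sample value are hypothetical and not part of any indexer API.

```java
import java.util.Objects;

/** Hypothetical helper: coordinates decoded from one "u" field value. */
final class ArtifactCoordinates {
    final String groupId;
    final String artifactId;
    final String version;
    final String classifier; // null when absent or stored as the "NA" placeholder
    final String extension;  // null when the value has no fifth part

    private ArtifactCoordinates(String groupId, String artifactId, String version,
                                String classifier, String extension) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
        this.classifier = classifier;
        this.extension = extension;
    }

    /** ASSUMPTION: layout is groupId|artifactId|version|classifier|extension. */
    static ArtifactCoordinates parse(String uinfo) {
        Objects.requireNonNull(uinfo, "uinfo");
        // A limit of -1 keeps trailing empty parts instead of silently dropping them.
        String[] parts = uinfo.split("\\|", -1);
        if (parts.length < 3) {
            throw new IllegalArgumentException(
                    "Expected at least groupId|artifactId|version but got: " + uinfo);
        }
        String classifier = null;
        if (parts.length > 3 && !parts[3].isEmpty() && !"NA".equals(parts[3])) {
            classifier = parts[3];
        }
        String extension = null;
        if (parts.length > 4 && !parts[4].isEmpty()) {
            extension = parts[4];
        }
        return new ArtifactCoordinates(parts[0], parts[1], parts[2], classifier, extension);
    }

    @Override
    public String toString() {
        return groupId + ":" + artifactId + ":" + version;
    }

    public static void main(String[] args) {
        // Illustrative value only, not taken from a real index dump.
        System.out.println(parse("org.apache.maven|maven-core|3.0.4|NA|jar"));
        // prints "org.apache.maven:maven-core:3.0.4"
    }
}
```

Keeping the classifier as `null` rather than the literal `NA` means callers do not need to know about the placeholder.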
For the record, there is now a tool that extracts Maven indexes and exports them as text files: the Maven index exporter. It is available as a Docker image, and no code is required.
It basically downloads all the .gz index files, extracts the indexes using the maven-indexer CLI, and exports them to a text file with clue. It has been tested on Maven Central and works with many other Maven repositories.