How to parse / unzip / unpack a Maven repository index generated by Nexus

Posted on 2024-11-03 14:38:31

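I have downloaded the indexes generated for Maven Central from http://mirrors.ibiblio.org/pub/mirrors/maven2/dot-index/nexus-maven-repository-index.gz

I would like to list the artifact information in these index files (for example groupId, artifactId, and version). I have read that there is a high-level API for this, and it seems I need the following Maven dependency. However, I don't know which class is the entry point or how to use it to read these files: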

<dependency>
    <groupId>org.sonatype.nexus</groupId>
    <artifactId>nexus-indexer</artifactId>
    <version>3.0.4</version>
</dependency>



薄荷梦 2024-11-10 14:38:31

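Take a look at the https://github.com/cstamas/maven-indexer-examples project.

In short: you do not need to download the GZ/ZIP file (new/legacy format) by hand. The indexer does that for you, and where possible it also handles incremental updates.

GZ is the "new" format. It contains data only and is independent of the Lucene index format (and therefore of the Lucene version). ZIP is the "old" format, which is really just a plain Lucene 2.4.x index zipped up. The data content has not changed so far, but changes are planned for the future.

As I said, there is no difference in data content between the two, but some fields (as you noticed) are indexed but not stored in the index. So if you consume the ZIP format, those fields are searchable but not retrievable.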
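If you already have the .gz file on disk and only want to peek inside, the outer layer can be removed with the JDK alone. The following is a minimal sketch, assuming the file is a standard gzip stream as its extension suggests. What comes out is the indexer's own binary record data, not text, so decoding the records is still best left to the indexer library. The class and method names (`GzUnpack`, `gunzip`, `gzip`) are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzUnpack {
    /** Streams the decompressed content of a gzip stream to target; returns the byte count. */
    static long gunzip(InputStream compressed, OutputStream target) throws IOException {
        long total = 0;
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                target.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    /** Demo helper only: gzip a byte array in memory so the example needs no real file. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in payload; a real run would open the downloaded .gz with a FileInputStream.
        byte[] payload = "stand-in for index data".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long bytes = gunzip(new ByteArrayInputStream(gzip(payload)), sink);
        System.out.println(bytes + " bytes: " + new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The method streams rather than buffering everything, because a full Maven Central index is large. For a real file, pass a `FileOutputStream` as the target.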


凹づ凸ル 2024-11-10 14:38:31

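https://github.com/cstamas/maven-indexer-examples is obsolete, and its build fails (the tests do not pass).

The Nexus Indexer has since moved on and now includes the examples itself:
https://github.com/apache/maven-indexer/tree/master/indexer-examples

That builds, and the code works.

Here is a simplified version if you want to roll your own:

Maven: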

<dependencies>
    <dependency>
        <groupId>org.apache.maven.indexer</groupId>
        <artifactId>indexer-core</artifactId>
        <version>6.0-SNAPSHOT</version>
        <scope>compile</scope>
    </dependency>

    <!-- For ResourceFetcher implementation, if used -->
    <dependency>
        <groupId>org.apache.maven.wagon</groupId>
        <artifactId>wagon-http-lightweight</artifactId>
        <version>2.3</version>
        <scope>compile</scope>
    </dependency>

    <!-- Runtime: DI, but using Plexus Shim as we use Wagon -->
    <dependency>
        <groupId>org.eclipse.sisu</groupId>
        <artifactId>org.eclipse.sisu.plexus</artifactId>
        <version>0.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.sonatype.sisu</groupId>
        <artifactId>sisu-guice</artifactId>
        <version>3.2.4</version>
    </dependency>
</dependencies>

Java:

public IndexToGavMappingConverter(File dataDir, String id, String url)
    throws PlexusContainerException, ComponentLookupException, IOException
{
    this.dataDir = dataDir;

    // Create Plexus container, the Maven default IoC container.
    final DefaultContainerConfiguration config = new DefaultContainerConfiguration();
    config.setClassPathScanning( PlexusConstants.SCANNING_INDEX );
    this.plexusContainer = new DefaultPlexusContainer(config);

    // Lookup the indexer components from plexus.
    this.indexer = plexusContainer.lookup( Indexer.class );
    this.indexUpdater = plexusContainer.lookup( IndexUpdater.class );
    // Lookup wagon used to remotely fetch index.
    this.httpWagon = plexusContainer.lookup( Wagon.class, "http" );

    // Files where local cache is (if any) and Lucene Index should be located
    this.centralLocalCache = new File( this.dataDir, id + "-cache" );
    this.centralIndexDir = new File( this.dataDir,   id + "-index" );

    // Creators we want to use (search for fields it defines).
    // See https://maven.apache.org/maven-indexer/indexer-core/apidocs/index.html?constant-values.html
    List<IndexCreator> indexers = new ArrayList<>();
    // https://maven.apache.org/maven-indexer/apidocs/org/apache/maven/index/creator/MinimalArtifactInfoIndexCreator.html
    indexers.add( plexusContainer.lookup( IndexCreator.class, "min" ) );
    // https://maven.apache.org/maven-indexer/apidocs/org/apache/maven/index/creator/JarFileContentsIndexCreator.html
    //indexers.add( plexusContainer.lookup( IndexCreator.class, "jarContent" ) );
    // https://maven.apache.org/maven-indexer/apidocs/org/apache/maven/index/creator/MavenPluginArtifactInfoIndexCreator.html
    //indexers.add( plexusContainer.lookup( IndexCreator.class, "maven-plugin" ) );

    // Create context for central repository index.
    this.centralContext = this.indexer.createIndexingContext(
            id + "Context", id, this.centralLocalCache, this.centralIndexDir,
            url, null, true, true, indexers );
}


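    // Later, in a method of the same class: walk every live document in the index.
    // `out` is assumed to be an Appendable (e.g. a Writer or StringBuilder) declared elsewhere;
    // StringUtils is assumed to come from Apache Commons Lang.
    final IndexSearcher searcher = this.centralContext.acquireIndexSearcher();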
    try
    {
        final IndexReader ir = searcher.getIndexReader();
        Bits liveDocs = MultiFields.getLiveDocs(ir);
        for ( int i = 0; i < ir.maxDoc(); i++ )
        {
            if ( liveDocs == null || liveDocs.get( i ) )
            {
                final Document doc = ir.document( i );
                final ArtifactInfo ai = IndexUtils.constructArtifactInfo( doc, this.centralContext );

                if (ai == null)
                    continue;
                if (ai.getSha1() == null)
                    continue;
                if (ai.getSha1().length() != 40)
                    continue;
                if ("javadoc".equals(ai.getClassifier()))
                    continue;
                if ("sources".equals(ai.getClassifier()))
                    continue;

                out.append(StringUtils.lowerCase(ai.getSha1())).append(' ');
                out.append(ai.getGroupId()).append(":");
                out.append(ai.getArtifactId()).append(":");
                out.append(ai.getVersion()).append(":");
                out.append(StringUtils.defaultString(ai.getClassifier()));
                out.append('\n');
            }
        }
    }
    finally
    {
        this.centralContext.releaseIndexSearcher( searcher );
    }

We use this in the Windup project, the JBoss migration tool.
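The filtering and formatting rules inside the loop above can be pulled out of the Lucene code and checked on their own. Here is a small sketch of that idea, with plain strings standing in for the `ArtifactInfo` getters. `GavLineFormatter` and `formatLine` are hypothetical names and not part of the indexer API, and the SHA-1 in `main` is just an illustrative 40-character value.

```java
import java.util.Locale;

class GavLineFormatter {
    /**
     * Builds "sha1 groupId:artifactId:version:classifier" using the same rules
     * as the loop: returns null when the entry should be skipped, i.e. when the
     * SHA-1 is missing or not 40 characters long, or the classifier is
     * "javadoc" or "sources". A null classifier becomes an empty string.
     */
    static String formatLine(String sha1, String groupId, String artifactId,
                             String version, String classifier) {
        if (sha1 == null || sha1.length() != 40) {
            return null;
        }
        if ("javadoc".equals(classifier) || "sources".equals(classifier)) {
            return null;
        }
        String cls = classifier == null ? "" : classifier;
        return sha1.toLowerCase(Locale.ROOT) + ' '
                + groupId + ':' + artifactId + ':' + version + ':' + cls;
    }

    public static void main(String[] args) {
        String sha1 = "DA39A3EE5E6B4B0D3255BFEF95601890AFD80709"; // illustrative value only
        // Kept: main artifact without classifier
        System.out.println(formatLine(sha1, "org.nutz", "nutz", "1.b.37", null));
        // Skipped (prints "null"): sources artifact
        System.out.println(formatLine(sha1, "org.nutz", "nutz", "1.b.37", "sources"));
    }
}
```

Keeping these rules in a pure function means they can be unit-tested without building an index first.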



西瑶 2024-11-10 14:38:31

The legacy ZIP index is a plain Lucene index. I was able to open it with Luke
and write some simple Lucene code to dump the field of interest ("u" in this case):

import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;

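// Written against the Lucene 2.x API; Lucene 3.0 and later no longer accept a String path here.
public class Dumper {
    public static void main(String[] args) throws Exception {
        // Directory containing the unzipped legacy index
        IndexSearcher searcher = new IndexSearcher("c:/PROJECTS/Test/index");
        try {
            for (int i = 0; i < searcher.maxDoc(); i++) {
                Document doc = searcher.doc(i);
                // "u" holds the artifact coordinates; documents without it are skipped
                String metadata = doc.get("u");
                if (metadata != null) {
                    System.out.println(metadata);
                }
            }
        } finally {
            searcher.close();
        }
    }
}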

Sample output:

org.ioke|ioke-lang-lib|P-0.4.0-p11|NA
org.jboss.weld.archetypes|jboss-javaee6-webapp|1.0.1.CR2|sources|jar
org.jboss.weld.archetypes|jboss-javaee6-webapp|1.0.1.CR2|NA
org.nutz|nutz|1.b.37|javadoc|jar
org.nutz|nutz|1.b.37|sources|jar
org.nutz|nutz|1.b.37|NA
org.openengsb.wrapped|com.google.gdata|1.41.5.w1|NA
org.openengsb.wrapped|openengsb-wrapped-parent|6|NA

There may be better ways to achieve this, though.
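Judging by the sample lines above, the "u" value looks pipe-separated: groupId|artifactId|version|classifier, optionally followed by an extension, with NA standing in for a missing classifier. Below is a small sketch that splits such a line under that assumption. The layout is inferred from the output shown here, not taken from the indexer's documentation, and `UinfoParser`, `Gav`, and `parseUinfo` are hypothetical names.

```java
class UinfoParser {
    /** Simple value holder; classifier and extension may be null. */
    static final class Gav {
        final String groupId;
        final String artifactId;
        final String version;
        final String classifier;
        final String extension;

        Gav(String groupId, String artifactId, String version,
            String classifier, String extension) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.classifier = classifier;
            this.extension = extension;
        }

        @Override
        public String toString() {
            return groupId + ":" + artifactId + ":" + version
                    + (classifier == null ? "" : ":" + classifier);
        }
    }

    /** Splits a "u" value into its parts; "NA" in the classifier slot maps to null. */
    static Gav parseUinfo(String uinfo) {
        String[] parts = uinfo.split("\\|", -1);
        if (parts.length < 4 || parts.length > 5) {
            throw new IllegalArgumentException("Unexpected field count: " + uinfo);
        }
        String classifier = "NA".equals(parts[3]) ? null : parts[3];
        String extension = parts.length == 5 ? parts[4] : null;
        return new Gav(parts[0], parts[1], parts[2], classifier, extension);
    }

    public static void main(String[] args) {
        System.out.println(parseUinfo("org.nutz|nutz|1.b.37|sources|jar")); // org.nutz:nutz:1.b.37:sources
        System.out.println(parseUinfo("org.nutz|nutz|1.b.37|NA"));          // org.nutz:nutz:1.b.37
    }
}
```

This could be called on each `metadata` string printed by the dumper to get groupId, artifactId, and version as separate values.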

